Question Bank Dbms
(AUTONOMOUS)
QUESTION BANK
Academic Year : 2023-2024 (Even Semester)
Year/Sem : II/IV
Course Code & Title : AI2214402 & DATABASE MANAGEMENT SYSTEMS
Regulation : R2022
Prepared By :    Approved By :
Mahendra Institute of Technology- Dept of AI&ML- AI2214402 & Database Management Systems Page 1
Unit-I
INTRODUCTION
Part-B ( 13 Marks)
S.No Questions BTL CO
1 Explain three different groups of data models with suitable examples. K2 CO1
Definition: It is a collection of conceptual tools for describing data, relationships
among data, semantics (meaning) of data and constraints.
• A data model is the underlying structure of a database.
• Data model provides a way to describe the design of database at physical, logical
and view level.
• There are various data models used in database systems and these are as follows -
(1) Relational model:
• The relational model consists of a collection of tables which store data and also
represent the relationships among the data.
• A table is also known as a relation.
• A table contains one or more columns, and each column has a unique name.
• Each table contains records of a particular type, and each record type defines a fixed
number of fields or attributes.
• For example - The following figure shows the relational model through the
relationship between the Student and Result tables. Student Ram lives in the city
Chennai and his marks are 78; the relationship between the two tables is maintained
by the common SeatNo column.
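The linked tables above can be sketched with Python's built-in sqlite3 module. The table and column names (Student, Result, SeatNo) follow the example; the exact schema is an assumption for illustration.

```python
import sqlite3

# In-memory database with two relations linked by the common SeatNo column
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (SeatNo INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
cur.execute("CREATE TABLE Result (SeatNo INTEGER, Marks INTEGER)")
cur.execute("INSERT INTO Student VALUES (1, 'Ram', 'Chennai')")
cur.execute("INSERT INTO Result VALUES (1, 78)")

# The relationship between the two tables is maintained by SeatNo
row = cur.execute(
    "SELECT s.Name, s.City, r.Marks "
    "FROM Student s JOIN Result r ON s.SeatNo = r.SeatNo"
).fetchone()
```

Here `row` recovers Ram's city and marks by following the SeatNo link between the two relations.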
Advantages:
(i) Structural Independence: Structural independence is the ability to make changes
in one part of the database structure without affecting others. The relational model
has structural independence, hence making required changes in the database
is convenient in the relational database model.
(ii)Conceptual Simplicity: The relational model allows the designer to simply focus
on logical design and not on physical design. Hence relational models are
conceptually simple to understand.
(iii) Query Capability: Using a simple query language (such as SQL), a user can
retrieve information from the database, or a designer can manipulate the database
structure.
(iv) Easy design, maintenance and usage: Relational models can be designed
logically, hence they are easy to maintain and use.
Disadvantages:
(i) Relational model requires powerful hardware and large data storage devices.
(ii) May lead to slower processing time.
(iii) Poorly designed systems lead to poor implementation of database systems.
(2) Entity relationship model:
• As the name suggests the entity relationship model uses collection of basic objects
called entities and relationships.
• The entity is a thing or object in the real world.
• The entity relationship model is widely used in database design.
For example - Following is a representation of the Entity Relationship model in
which the relationship works_for is between the entities Employee and Department.
Advantages:
i) Simple: It is simple to draw ER diagram when we know entities and relationships.
ii) Easy to understand: The design of ER diagram is very logical and hence they
are easy to design and understand.
iii) Effective: It is an effective communication tool.
iv) Integrated: The ER model can be easily integrated with Relational model.
v) Easy conversion: ER model can be converted easily into other type of models.
Disadvantages:
i) Loss of information: While drawing ER model some information can be hidden
or lost.
ii) Limited relationships: The ER model can represent limited relationships as
compared to other models.
iii) No Representation for data manipulation: It is not possible to represent data
manipulation in ER model.
iv) No industry standard: There is no industry standard for notations of ER
diagram.
(3) Object Based Data Model:
• Object oriented languages like C++, Java and C# are becoming dominant in
software development.
• This led to the object based data model.
• The object based data model combines object oriented features with the
relational data model.
Advantages:
i) Enriched modelling: The object based data model has capability of modelling the
real world objects.
ii) Reusability: There are certain features of object oriented design such as
inheritance, polymorphism which help in reusability.
iii) Support for schema evolution: There is a tight coupling between data and
applications, hence there is strong support for schema evolution.
iv) Improved performance: Using the object based data model there can be a
significant improvement in performance.
Disadvantages:
i) Lack of universal data model: There is no universally agreed data model for an
object based data model, and most models lack a theoretical foundation.
ii) Lack of experience: Compared with relational database management, use of the
object based data model is limited. This model is more dependent on skilled
programmers.
iii) Complex: More functionalities present in object based data model make the
design complex.
(4) Semi-structured data model:
• The semi-structured data model permits the specification of data where individual
data items of same type may have different sets of attributes.
• The Extensible Markup Language (XML) is widely used to represent the
semi-structured data model.
Advantages
i) Data is not constrained by fixed schema.
ii) It is flexible.
iii) It is portable.
Disadvantages
i) Queries are less efficient than other types of data model.
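The flexibility of the semi-structured model can be sketched with XML parsed in Python: two student records of the same type carry different sets of attributes, something a fixed relational schema would not allow directly. The field names and values are assumed sample data.

```python
import xml.etree.ElementTree as ET

# Two <student> records of the same type with different attribute sets
doc = """
<students>
  <student><name>Ram</name><city>Chennai</city></student>
  <student><name>Sita</name><email>sita@example.com</email><phone>12345</phone></student>
</students>
"""

# Each record becomes a dict with its own set of attributes
students = [{child.tag: child.text for child in s} for s in ET.fromstring(doc)]
```

The first record carries name and city; the second carries name, email and phone - no fixed schema constrains them.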
2 Elaborate in detail the overall architecture of a database system. K2 CO1
The architecture shows application interfaces used by naïve users, application
programs created by application programmers, query tools used by sophisticated
users and administration tools used by the database administrator.
• The lowest part of the architecture is for disk storage.
• The two important components of database architecture are - Query processor and
storage manager.
Query processor:
The interactive query processor helps the database system to simplify and facilitate
access to data. It consists of DDL interpreter, DML compiler and query evaluation
engine.
With the following components of query processor, various functionalities are
performed -
i) DDL interpreter: This is basically a translator which interprets DDL
statements and records the definitions in the data dictionary.
ii) DML compiler: It translates DML statements in a query language into an evaluation
plan. This plan consists of the instructions which query evaluation engine
understands.
iii) Query evaluation engine: It executes the low-level instructions generated by the
DML compiler.
• When a user issues a query, the parsed query is presented to a query optimizer,
which uses information about how the data is stored to produce an efficient
execution plan for evaluating the query. An execution plan is a blueprint for
evaluating a query. It is evaluated by query evaluation engine.
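The idea of an execution plan can be observed directly in SQLite, which exposes its plan through the EXPLAIN QUERY PLAN statement; the table and query below are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT)")

# Ask the query processor for its evaluation plan instead of running the query.
# Each returned row describes one step the evaluation engine will execute;
# the last column is a human-readable description of the step.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT sname FROM Student WHERE sid = 1"
).fetchall()
```

For a primary-key lookup like this, the plan typically reports a SEARCH on Student using the integer primary key rather than a full scan.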
Storage manager:
• Storage manager is the component of database system that provides interface
between the low level data stored in the database and the application programs and
queries submitted to the system.
• The storage manager is responsible for storing, retrieving, and updating data in the
database. The storage manager components include -
i) Authorization and integrity manager: Validates the users who want to access
the data and tests for integrity constraints.
ii) Transaction manager: Ensures that the database remains in a consistent state
despite system failures, and that concurrent transactions execute without conflicts.
iii) File manager: Manages allocation of space on disk storage and representation of
the information on disk.
iv) Buffer manager: Manages the fetching of data from disk storage into main
memory. The buffer manager also decides what data to cache in main memory.
Buffer manager is a crucial part of database system.
• Storage manager implements several data structures such as -
i) Data files: Used for storing database itself.
ii) Data dictionary: Used for storing metadata, particularly schema of database.
iii) Indices: Indices are used to provide fast access to data items present in the
database
Abstract view of the system is a view in which the system hides certain details of
how the data are stored and maintained.
The main purpose of database systems is to provide users with an abstract view of the
data.
This view of the system helps the user to retrieve data efficiently.
To simplify the user's interaction with the system there are several levels of
abstraction - the physical level, the logical level and the view level.
Data Abstraction
Data abstraction: Data abstraction means providing only the required amount of
information to the user and hiding the background details.
There are several levels of abstraction that simplify the user interactions with the
system. These are
1) Physical level: This is the lowest level of abstraction, which describes how the
data are actually stored.
2) Logical level:
• This is the next higher level, which describes what data are stored in the database.
• The logical level thus describes the entire database in terms of a small number of
relatively simple structures.
• The database administrators use logical level of abstraction for deciding what
information to keep in database.
3) View level:
• This is the highest level of abstraction, which describes only part of the entire database.
• The view level can provide the access to only part of the database.
• A clerk at a reservation system can see only part of the database and can access the
required information of the passenger.
type employee = record
    empID : numeric(10);
    empname : char(20);
    dept_no : numeric(10);
    salary : numeric(8,2);
end;
This code defines a new record type employee with four fields. Each field is associated
with a field name and its type. There are several other record types, such as customer
and department.
• At the physical level, records such as customer, employee and department can be
described as blocks of consecutive storage locations. Many database systems hide the
lowest-level storage details from database programmers.
• The type definition of records is decided at the logical level. Programmers work
with records at this level, and database administrators also work at this level of
abstraction.
• Only a specific view of the record is allowed at the view level. For instance, a
customer can view the name of an employee, or the id of the employee, but cannot
access the employee's salary.
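The view-level restriction just described can be sketched with an SQL view that exposes only the employee's id and name while hiding the salary. The table follows the record example above; the view name emp_public is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (empID INTEGER, empname TEXT, salary REAL)")
cur.execute("INSERT INTO employee VALUES (1, 'Ram', 50000.0)")

# The view exposes only empID and empname; salary stays hidden
cur.execute("CREATE VIEW emp_public AS SELECT empID, empname FROM employee")
cols = [d[0] for d in cur.execute("SELECT * FROM emp_public").description]
```

A user querying emp_public sees only the two permitted columns.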
Instances: When information is inserted into or deleted from the database, the
database changes. The collection of information stored in the database at a particular
moment is called an instance. For example - following is an instance of a student database
Types of Schema: The database has several schema based on the levels of
abstraction.
(1) Physical Schema: The physical schema is a database design described at the
Mahendra Institute of Technology- Dept of AI&ML- AI2214402 & Database Management Systems Page 9
physical level of abstraction.
(2) Logical Schema: The logical schema is a database design at the logical level of
abstraction.
(3) Subschema: A database may have several views at the view level, which are
called subschemas.
Database Languages
There are two types of languages supported by database systems. These are
(1) DDL
• It is a language which is used for creating and modifying the structures of tables,
views, indexes and so on.
• Some of the common commands used in DDL are CREATE, ALTER and DROP.
The main use of the CREATE command is to build a new table. Using the ALTER
command, users can add additional columns or drop existing ones.
Using the DROP command, the user can delete a table or view.
(2) DML
• Procedural DML - Requires a user to specify what data are needed and how to get
those data.
• Declarative DML - Requires a user to specify what data are needed without
specifying how to get those data.
• Query is a statement used for requesting the retrieval of information. This retrieval
of information using some specific language is called query language.
Part-C ( 15 Marks)
S.No Questions BTL CO
1 Demonstrate E-R Model in detail with example. K3 CO1
Introduction to Entity Relationship Model
The Entity Relationship model is a model for identifying the entities to be represented
in the database and representing how those entities are related.
Symbols used in ER Model
Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among Entities in a
Database System.
ER Model
The ER data model specifies enterprise schema that represents the overall logical
structure of a database.
The E-R model is very useful in mapping the meanings and interactions of real-world
entities onto a conceptual schema.
The ER model consists of three basic concepts -
1) Entity Sets
• Entity: An entity is an object that exists and is distinguishable from other objects.
For example - a student named "Poonam" is an entity and can be identified by her
name. An entity can be concrete or abstract: a concrete entity can be a person, book
or bank; an abstract entity can be a holiday or a concept. An entity is represented as
a box.
• Entity set: An entity set is a set of entities of the same type. For example - all
students studying in class X of a school. Entity sets need not be disjoint. Each
entity in an entity set has the same set of attributes, and that set of attributes
distinguishes it from other entity sets; no other entity set has exactly the same set
of attributes.
2) Relationship Sets
Relationship is an association among two or more entities.
The relationship set is a collection of similar relationships. For example - the following
Fig. 2.1.2 shows the relationship works_for between the two entities Employee and
Department.
The association between entity sets is called participation. That is, the entity sets
E1, E2, ..., En participate in relationship set R.
The function that an entity plays in a relationship is called that entity's role.
3) Attributes
Attributes define the properties of an entity. For example, if student is an entity,
its ID, name, address, date of birth and class are its attributes. The attributes help
in uniquely identifying an entity. Refer Fig. 2.1.3 for the Student entity set with
attributes ID, name and address. Note that an entity is shown as a rectangular box
and attributes are shown in ovals. The primary key is underlined.
Types of Attributes
1) Simple and composite attributes:
• Simple attributes are attributes drawn from atomic value domains.
For example - Name = {Parth}; Age = {23}
• Composite attributes consist of a hierarchy of attributes. For
example - Address may consist of "Number", "Street" and "Suburb" → Address =
{'59' + 'JM Road' + 'Shivaji Nagar'}
2) Single-valued and multi-valued attributes: A single-valued attribute holds a single
value for an entity (e.g. Age), while a multi-valued attribute can hold more than one
value (e.g. PhoneNumber).
3) Derived attribute:
Derived attributes are the attributes that contain values that are calculated from other
attributes. To represent derived attribute there is dotted ellipse inside the solid ellipse.
For example Age can be derived from attribute DateOfBirth. In this situation,
DateOfBirth might be called Stored Attribute.
3. n-ary Relationship: When there are n entities set participating in a relation, the
relationship is called an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is
known as cardinality. Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in the
relationship, the cardinality is one-to-one. Let us assume that a male can marry only
one female and a female can marry only one male; the relationship is then one-to-one.
A minimum of two tables is needed to represent it.
2. One-to-Many: When an entity in one entity set can take part in the relationship
more than once, while each entity in the other set can take part only once, the
cardinality is one-to-many.
3. Many-to-One: When entities in one entity set can take part only once in the
relationship set and entities in the other entity set can take part more than once, the
cardinality is many-to-one.
Using Sets, it can be represented as:
4. Many-to-Many: When entities in all entity sets can take part more than once in
the relationship, the cardinality is many-to-many. A minimum of three tables is
needed to represent it.
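The three-table claim for many-to-many cardinality can be sketched with sqlite3: a junction table (here called Enrolls, an assumed name) links Student and Course so that each side may participate many times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Three tables: the two entity sets plus a junction table for the relationship
cur.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Course  (cid INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE Enrolls (sid INTEGER REFERENCES Student(sid),
                      cid INTEGER REFERENCES Course(cid));
INSERT INTO Student VALUES (1, 'Ram'), (2, 'Sita');
INSERT INTO Course  VALUES (10, 'DBMS'), (20, 'AI');
INSERT INTO Enrolls VALUES (1, 10), (1, 20), (2, 10);
""")

# Ram takes two courses, and DBMS is taken by two students: many-to-many
ram_courses = [r[0] for r in cur.execute(
    "SELECT cname FROM Course JOIN Enrolls USING (cid) "
    "WHERE sid = 1 ORDER BY cname")]
```

Without the junction table, neither entity table alone could record multiple participations on both sides.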
UNIT II
DATABASE DESIGN
Introduction to relational databases -Relational Algebra – E-R Diagrams–SQL
fundamentals – Advanced SQL features – Functional Dependencies –Normal forms:
First, Second, Third Normal Forms, Dependency Preservation – Boyce/Codd Normal
Form – Multi-valued Dependencies and Fourth Normal Form – Join Dependencies and
Fifth Normal Form.
Part-A (2 Marks)
1) There is no redundancy of data (all data is stored in only one place), and
2) Data dependencies are logical (all related data items are stored together)
3 Boyce-Codd normal form is found to be stricter than third normal form. Justify the statement. K3 CO2
(i) Every relation which is in BCNF is also in 3NF but every relation which is in 3NF is not
necessarily be in BCNF.
(ii) In BCNF, the determinant of every non-trivial functional dependency must be a
candidate key; 3NF has no such requirement, since it also permits dependencies whose
right-hand side is a prime attribute.
The relationship set is a collection of similar relationships. For example - the following
Fig. 2.1.2 shows the relationship works_for between the two entities Employee and
Department.
The association between entity sets is called participation. That is, the entity sets E1,
E2, ..., En participate in relationship set R.
The function that an entity plays in a relationship is called that entity's role.
It is a language which is used for creating and modifying the structures of tables, views,
indexes and so on.
Some of the common commands used in DDL are CREATE, ALTER and DROP.
Part-B ( 13 Marks)
S.No Questions BTL CO
1 Classify various operations in relational algebra with examples. K2 CO2
There are two formal query languages associated with relational model and those are
relational algebra and relational calculus.
Definition: Relational algebra is a procedural query language which is used to access
database tables to read data in different ways.
• The queries present in the relational algebra are denoted using operators.
• Every operator in relational algebra accepts relational instances (tables) as input and
returns relational instance as output. For example-
Relational algebra is procedural: each relational query describes a step-by-step
procedure for computing the desired answer, based on the order in which operators
are applied in the query.
A sequence of relational algebra operations forms a relational algebra expression,
whose result is also a relation representing the result of a database query. By
composing operators in relational expressions, complex relations can be defined.
Relational Operations
Various types of relational operations are as follows-
(1) Selection:
This operation is used to fetch rows (tuples) from a table (relation).
Syntax: The syntax is
σpredicate (relation)
where σ represents the select operation, and predicate denotes the condition using
which the data are selected from the relation (table).
For example - Consider the relation student as follows-
Query: Fetch students with age more than 18
We can write it in relational algebra as
σage>18 (Student)
The output will be -
(2) Projection:
The project operation is used to display only a certain set of attributes of a relation.
That means if you want to see only the names of all the students in the Student table,
you can use the project operation. Thus, to display particular columns from a relation,
the projection operator is used. It will only project the columns (attributes) asked
for, and will also remove duplicate rows from the result.
Syntax:
ΠC1, C2, ... (r)
where C1, C2 etc. are attribute names (column names).
For example - Consider the Student table given in Fig. 1.13.2.
Query: Display the name and age of all the students
This can be written in relational algebra as
Пsname, age (Student)
Above statement will show us only the Name and Age columns for all the rows of data
in Student table.
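Selection and projection can be sketched over an in-memory relation in Python; the Student rows are assumed sample data, and the helper names select/project are illustrative only.

```python
# A relation is a list of tuples, each represented as a dict
students = [
    {"sid": 1, "sname": "Anu",   "age": 17},
    {"sid": 2, "sname": "Ravi",  "age": 19},
    {"sid": 3, "sname": "Meena", "age": 21},
]

def select(predicate, relation):
    # sigma_predicate(relation): keep tuples satisfying the predicate
    return [t for t in relation if predicate(t)]

def project(attrs, relation):
    # pi_attrs(relation): keep only the named columns, removing duplicates
    seen = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in seen:
            seen.append(row)
    return seen

adults = select(lambda t: t["age"] > 18, students)   # sigma_age>18(Student)
names = project(["sname", "age"], students)          # pi_sname,age(Student)
```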
(3) Cartesian product:
This is used to combine data from two different relations(tables) into one and fetch data
from the combined relation.
Syntax: A x B
For example: Suppose there are two tables named Student and Reserve as follows
Query: Find the names of all the students who have reserved isbn = 005. To satisfy this
query we need to extract data from two tables, hence the Cartesian product operator is
used as
σStudent.sid = Reserve.sid ∧ Reserve.isbn = 005 (Student × Reserve)
As an output we will get
(4) Set operations:
(i) Union: This operation fetches the combined data of two relations.
• Syntax: A ∪ B, where A and B are relations.
• For example: If there are two tables Student and Book as follows -
• Query: To display both the student names and the book names from the two tables:
Πsname (Student) ∪ Πbname (Book)
(ii) Intersection:
• This operation is used to fetch data from both tables which is common in both the
tables.
• Syntax: A ∩ B
where A and B are relations.
• Example - Consider two tables - Student and Worker
• Query: If we want to find out the names of the students who are working in a
company, then:
Πname (Student) ∩ Πname (Worker)
(iii) Set-Difference: The result of set difference operation is tuples, which are present
in one relation but are not in the second relation.
Syntax: A - B
For example: Consider two relations Full_Time_Employee and Part_Time_Employee.
If we want to find all the employees working only full time, the set-difference
operator is used -
ΠEmpName (Full_Time_Employee) − ΠEmpName (Part_Time_Employee)
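The three set operations map directly onto Python's set operators; the name lists below are assumed sample data.

```python
# Name columns of two relations, modelled as Python sets
student_names = {"Ram", "Sita", "Arjun"}
worker_names = {"Sita", "Karan"}

both = student_names | worker_names            # union: names in either table
working = student_names & worker_names         # intersection: students who work
only_students = student_names - worker_names   # set-difference: students only
```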
(5) Join: The join operation is used to combine information from two or more relations.
Formally, a join can be defined as a cross-product followed by selections and projections;
joins arise much more frequently in practice than plain cross-products.
A) Inner Join
There are three types of inner joins used in relational algebra:
i) Conditional join: This is an operation in which information from two tables is
combined using some condition, and this condition is specified along with the join
operator:
A ⋈c B = σc (A × B)
The join is thus defined as a cross-product followed by a selection. Note that the
condition c can refer to attributes of both A and B, and can be specified using the
<, ≤, >, ≥ or = operators.
For example, consider two tables Student and Reserve as follows -
If we want the names of students with sid(Student) = sid(Reserve) and isbn = 005, then
we can write it using the Cartesian product as -
σ(Student.sid = Reserve.sid) ∧ (Reserve.isbn = 005) (Student × Reserve)
Here there are two conditions:
i) Student.sid = Reserve.sid and ii) Reserve.isbn = 005, which are joined by the
∧ operator.
Using the join operator ⋈c, this can instead be written as -
Student ⋈(Student.sid = Reserve.sid) ∧ (Reserve.isbn = 005) Reserve
The result will be-
ii) Equijoin: This is a kind of join in which there is an equality condition between two
attributes (columns) of the relations (tables). For example - if there are two tables Book
and Reserve, and we want to find the book which is reserved by the student, having
isbn 005 and book name 'DBMS', then:
iii) Natural Join: When there are common columns and we have to equate these
common columns, we use the natural join. The symbol for natural join is simply ⋈,
without any condition. For example, consider two tables -
Now if we want to list the books that are reserved, we want to match Books.isbn
with Reserve.isbn. Hence it is simply Books ⋈ Reserve.
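A natural join can be sketched in Python: tuples are combined wherever all common attributes (here only isbn) agree. The rows are assumed sample data.

```python
# Two relations sharing the common attribute isbn
books = [{"isbn": "005", "bname": "DBMS"}, {"isbn": "007", "bname": "AI"}]
reserve = [{"sid": 1, "isbn": "005"}]

def natural_join(r, s):
    # Combine tuples that agree on every attribute common to both relations
    common = set(r[0]) & set(s[0])
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]

reserved_books = natural_join(books, reserve)   # Books ⋈ Reserve
```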
B) Outer Join
There are three types of outer joins - Left outer join, Right outer join and Full outer
join.
(1) Left Outer join
• This is a type of join in which all the records from left table are returned and the
matched records from the right table gets returned.
• The result is NULL from the right side, if there is no match.
• The symbol used for left outer join is ⟕.
• This can be graphically represented as follows
• For example - Consider two tables Student and Course as follows –
(2) Right Outer join
• This is a type of join in which all the records from the right table are returned
along with the matched records from the left table; the result is NULL on the left
side when there is no match. The symbol used for right outer join is ⟖.
The result of Right outer join will be
(3) Full Outer join combines the results of the left and right outer joins; the symbol
used is ⟗.
(6) Rename operation: This operation is used to rename the output relation for any
query operation which returns result like Select, Project etc. Or to simply rename a
relation(table). The operator ρ(rho) is used for renaming.
Syntax: ρ (RelationNew, RelationOld)
For example: To create a relation Student_names with sid and sname from Student,
the rename operator is used as:
ρ(Student_names, Πsid, sname (Student))
(7) Divide operation:The division operator is used when we have to evaluate queries
which contain the keyword ALL.
It is denoted by A/B where A and B are instances of relation.
Formal definition of the division operation: The operation A/B is defined as the set of
all x values (in the form of unary tuples) such that for every y value in (a tuple of) B,
there is a tuple <x, y> in A.
For example - Find all the customers having accounts in all the branches. For that
consider two tables - Customer and Account as
Here we check all the branches from the Account table against each name from the
Customer table. We then find that only customer A has accounts in all the
branches.
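The division operation can be sketched in Python: a customer qualifies only if the set of branches it appears with covers every branch in the divisor. The rows are assumed sample data.

```python
# A: (custname, branch) pairs from Account; B: the set of all branches
account = [("A", "b1"), ("A", "b2"), ("B", "b1")]
branches = {"b1", "b2"}

def divide(a, b):
    # A/B: all x such that for every y in B, the pair (x, y) is in A
    xs = {x for x, _ in a}
    return {x for x in xs if b <= {y for x2, y in a if x2 == x}}

all_branch_customers = divide(account, branches)
```

Customer A appears with both branches, so only A survives the division.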
2 Describe the six clauses in the syntax of an SQL query and show what type of constructs can be specified in each of the six clauses. Which of the six clauses are required and which are optional? K2 CO2
Structured Query Language (SQL) is a database query language used for storing and
managing data in a relational DBMS.
Various parts of SQL are -
• Data Definition Language (DDL): It consists of a set of commands for defining
relation schemas, deleting relations, and modifying relation schemas.
• Data Manipulation Language (DML): It consists of a set of SQL commands for
inserting tuples into a relation schema, and deleting tuples from or modifying tuples
in the database.
• Integrity: The SQL DDL includes commands for specifying integrity constraints.
These constraints must be satisfied by the databases.
• View definition: The SQL DDL contains the commands for defining views for
database.
• Transaction control: The SQL also includes the set of commands that indicate
beginning and ending of the transactions.
• Embedded SQL and Dynamic SQL: There is a facility for including SQL commands
in programming languages like C, C++, COBOL or Java.
• Authorization: The SQL DDL includes the commands for specifying access rights to
relations and views.
Basic Data Types
The Basic data types used in SQL are -
(1) char(n): For representing the fixed length character string this data type is used. For
instance to represent name, designation, coursename, we use this data type. Instead of
char we can also use character. The n is specified by the user.
(2) varchar(n): The varchar means character varying. That means - for denoting the
variable length character strings this data type is used. The n is user specified maximum
character length.
(3) int: For representing the numeric values without precision, the int data type is used.
(4) numeric(p, d): For representing a fixed-point number with user-specified precision.
The number consists of p digits in total (plus a sign), of which d digits are to the right
of the decimal point. For instance, numeric(3,2) allows 4.45 but does not allow 444.5
or 0.456.
(5) smallint: It is used to store small integer value. It allows machine dependent subset
of integer type.
(6) real: It allows the floating point, double precision numbers.
(7) float(n): For representing the floating point number with precision of at least n
digits this data type is used.
Basic Schema Definition
In this section, we will discuss various SQL commands for creating the schema
definition.
There are three categories of SQL commands.
1. Creation
• A database can be considered as a container for tables and a table is a grid with rows
and columns to hold data.
• Individual statements in SQL are called queries.
• We can execute SQL queries for various tasks such as creation of tables, insertion of
data into the tables, deletion of record from table, and so on.
In this section we will discuss how to create a table.
Step 1: We normally create a database using the following SQL statement.
Syntax
CREATE DATABASE database_name;
Example
CREATE DATABASE Person_DB;
Step 2: The table can be created inside the database as follows -
CREATE TABLE table_name (
    col1_name datatype,
    col2_name datatype,
    ...
    coln_name datatype
);
Example
CREATE TABLE person_details (
    AdharNo INT,
    FirstName VARCHAR(20),
    MiddleName VARCHAR(20),
    LastName VARCHAR(20),
    Address VARCHAR(30),
    City VARCHAR(10)
);
The blank table will be created with following structure
Person_details
2. Insertion
We can insert data into the table using INSERT statement.
Syntax
INSERT INTO table_name (col1, col2, ..., coln)
VALUES (value1, value2, ..., valuen);
Example
INSERT INTO person_details (AdharNo, FirstName, MiddleName, LastName, Address, City)
VALUES (111, 'AAA', 'BBB', 'CCC', 'M.G. Road', 'Pune');
The above query will result into –
3. Select
• The SELECT statement is used to fetch data from a database table.
• The result is returned in the form of a table; these result tables are called result-sets.
• We can use the optional keyword DISTINCT to indicate that the answer should not
contain duplicates. Without DISTINCT, SQL does not eliminate duplicate rows.
Syntax
SELECT col1, col2, ...,coln FROM table_name;
Example
SELECT AdharNo, FirstName, Address, City FROM person_details;
The result of above query will be
• If we want to select all the records present in the table we make use of * character.
Syntax
SELECT * FROM table_name;
Example
SELECT * FROM person_details;
The above query will result into
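The creation, insertion and selection steps above can be run end-to-end with Python's sqlite3 module. Note that SQLite has no CREATE DATABASE statement; connecting to a file or to :memory: creates the database.

```python
import sqlite3

# Connecting creates the database (Step 1 in SQLite terms)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Step 2: create the table
cur.execute("""CREATE TABLE person_details (
    AdharNo INT, FirstName VARCHAR(20), MiddleName VARCHAR(20),
    LastName VARCHAR(20), Address VARCHAR(30), City VARCHAR(10))""")

# Insert one record, then select everything back
cur.execute("INSERT INTO person_details "
            "VALUES (111, 'AAA', 'BBB', 'CCC', 'M.G. Road', 'Pune')")
rows = cur.execute("SELECT * FROM person_details").fetchall()
```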
4. Where Clause
The WHERE clause is used to specify a condition. Based on this condition, the data
present in the table can be displayed, updated or deleted.
Syntax
SELECT col1,col2, ...,coln
FROM table_name
WHERE condition;
Example
Consider following table-
If we execute the following query
SELECT AdharNo
FROM person_details
WHERE city='Pune';
The result will be
If we want the records of all those persons who live in the city Pune then we can write the query
using the WHERE clause as
SELECT *
FROM person_details
WHERE city='Pune';
The result of above query will be
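The WHERE filtering above can be sketched as follows (a Python sqlite3 illustration with assumed sample rows, not the question bank's own data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person_details (AdharNo INT, FirstName TEXT, City TEXT)")
cur.executemany(
    "INSERT INTO person_details VALUES (?, ?, ?)",
    [(111, 'AAA', 'Pune'), (222, 'DDD', 'Mumbai'), (333, 'GGG', 'Pune')],
)

# Only rows satisfying the WHERE condition appear in the result set
pune_ids = [row[0] for row in
            cur.execute("SELECT AdharNo FROM person_details WHERE City = 'Pune'")]
```

Only the AdharNo values of the two Pune rows are returned; the Mumbai row is filtered out.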
5. Update
• For modifying the existing records of a table, the UPDATE query is used.
Syntax
UPDATE table_name
SET col1 = value1, col2 = value2, ...
WHERE condition;
Example
Consider following table
Person_details table
If we execute following query
UPDATE person_details
SET City = 'Chennai'
WHERE AdharNo = 333;
The result will be
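The UPDATE above can be verified with a short sketch (Python sqlite3, assumed sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person_details (AdharNo INT, City TEXT)")
cur.executemany("INSERT INTO person_details VALUES (?, ?)",
                [(111, 'Pune'), (333, 'Pune')])

# Only the row matched by the WHERE condition is modified
cur.execute("UPDATE person_details SET City = 'Chennai' WHERE AdharNo = 333")
city_333 = cur.execute("SELECT City FROM person_details "
                       "WHERE AdharNo = 333").fetchone()[0]
```

After the statement, the record with AdharNo 333 carries the new city while the other row is untouched.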
6. Deletion
We can delete one or more records based on some condition. The syntax is as follows -
Syntax
DELETE FROM table_name WHERE condition;
Example
DELETE FROM person_details
WHERE AdharNo=333
The result will be –
We can also delete all the records from a table. In this deletion all the records get deleted
but the table itself is not deleted. For that purpose the SQL statement will be
DELETE FROM person_details;
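Both deletion forms, with and without a WHERE clause, can be sketched together (Python sqlite3, assumed sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person_details (AdharNo INT, City TEXT)")
cur.executemany("INSERT INTO person_details VALUES (?, ?)",
                [(111, 'Pune'), (222, 'Mumbai'), (333, 'Pune')])

# Conditional delete removes only the matching row
cur.execute("DELETE FROM person_details WHERE AdharNo = 333")
remaining = cur.execute("SELECT COUNT(*) FROM person_details").fetchone()[0]

# DELETE without WHERE empties the table but the table itself survives
cur.execute("DELETE FROM person_details")
after_delete_all = cur.execute("SELECT COUNT(*) FROM person_details").fetchone()[0]
```

The second COUNT succeeds because the table still exists; only its rows are gone.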
7. Logical Operators
• Using WHERE clause we can use the operators such as AND, OR and NOT.
• AND operator displays the records if all the conditions that are separated using AND
operator are true.
• OR operator displays the records if any one of the condition separated using OR
operator is true.
• NOT operator displays a record if the condition is NOT TRUE.
Consider following table
Syntax of AND
SELECT col1, col2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3...;
Example of AND
If we execute following query-
SELECT AdharNo, FirstName, City
FROM person_details
WHERE AdharNo=222 AND City= 'Pune';
The result will be –
Syntax of OR
SELECT col1, col2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
Example of OR
SELECT AdharNo, FirstName, City
FROM person_details
WHERE City='Pune' OR City='Mumbai';
The result will be –
Syntax of NOT
SELECT col1, col2, ...
FROM table_name
WHERE NOT condition;
Example of NOT
SELECT AdharNo, FirstName, City
FROM person_details
WHERE NOT City='Pune';
The result will be
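The three logical operators above can be exercised in one sketch (Python sqlite3, assumed sample rows; note the Nashik row stands in for any non-Pune city):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person_details (AdharNo INT, FirstName TEXT, City TEXT)")
cur.executemany("INSERT INTO person_details VALUES (?, ?, ?)",
                [(111, 'AAA', 'Pune'), (222, 'DDD', 'Pune'), (333, 'GGG', 'Nashik')])

# AND: every condition must hold
and_rows = cur.execute("SELECT AdharNo FROM person_details "
                       "WHERE AdharNo = 222 AND City = 'Pune'").fetchall()
# OR: any one condition may hold
or_rows = cur.execute("SELECT AdharNo FROM person_details "
                      "WHERE City = 'Pune' OR City = 'Mumbai'").fetchall()
# NOT: the condition must be false
not_rows = cur.execute("SELECT AdharNo FROM person_details "
                       "WHERE NOT City = 'Pune'").fetchall()
```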
8. Order By Clause
• Many times we need the records in the table to be in sorted order.
• If the records are arranged in increasing order of some column then it is called
ascending order.
• If the records are arranged in decreasing order of some column then it is called
descending order.
• For getting the sorted records in the table we use ORDER BY command.
• The ORDER BY keyword sorts the records in ascending order by default.
Syntax
SELECT col1, col2,...,coln
FROM table_name
ORDER BY col1, col2, ... ASC | DESC;
Here ASC is for ascending order display and DESC is for descending order display.
Example
Consider following table
SELECT *
FROM person_details
ORDER BY AdharNo DESC;
The above query will result in
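The descending sort above can be sketched as follows (Python sqlite3, assumed sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person_details (AdharNo INT, FirstName TEXT)")
cur.executemany("INSERT INTO person_details VALUES (?, ?)",
                [(111, 'AAA'), (333, 'GGG'), (222, 'DDD')])

# ORDER BY ... DESC returns rows in decreasing order of AdharNo,
# regardless of insertion order
desc_ids = [r[0] for r in
            cur.execute("SELECT AdharNo FROM person_details ORDER BY AdharNo DESC")]
```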
9. Alteration
There are SQL commands for alteration of a table. That means we can add a new column to the
table or delete a column from it using these alteration commands.
Syntax for Adding columns
ALTER TABLE table_name
ADD column_name datatype;
Example
Consider following table
10. Defining Constraints
• We can specify rules for data in a table.
• When the table is created at that time we can define the constraints.
• The constraint can be at column level, i.e. we can impose a constraint on a column, or at
table level, i.e. we can impose a constraint on the entire table.
The various types of constraints that can be defined are as follows -
1) Primary key: The primary key constraint is defined to uniquely identify the records
from the table.
The primary key must contain unique values. Hence the database designer should choose the
primary key very carefully.
For example
Consider that we have to create a person_details table with AdharNo, FirstName,
MiddleName, LastName, Address and City.
Now making AdharNo the primary key is helpful here, as using this field it becomes
easy to identify the records correctly.
The table can be created as -
CREATE TABLE person_details (
AdharNo int,
FirstName VARCHAR(20),
MiddleName VARCHAR(20),
LastName VARCHAR(20),
Address VARCHAR(30),
City VARCHAR(10),
PRIMARY KEY(AdharNo)
);
We can create a composite key as a primary key using CONSTRAINT keyword. For
example
CREATE TABLE person_details (
AdharNo int NOT NULL,
FirstName VARCHAR(20),
MiddleName VARCHAR(20),
LastName VARCHAR(20) NOT NULL,
Address VARCHAR(30),
City VARCHAR(10),
CONSTRAINT PK_person_details PRIMARY KEY(AdharNo, LastName)
);
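The uniqueness guarantee of a primary key can be demonstrated directly: inserting a second row with the same key value fails (a Python sqlite3 sketch; SQLite reports the violation as an IntegrityError):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person_details (AdharNo INT PRIMARY KEY, FirstName TEXT)")
cur.execute("INSERT INTO person_details VALUES (111, 'AAA')")

try:
    # Same AdharNo again: violates the PRIMARY KEY constraint
    cur.execute("INSERT INTO person_details VALUES (111, 'XXX')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```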
(2) Foreign Key
• Foreign key is used to link two tables.
• Foreign key for one table is actually a primary key of another table.
• The table containing the foreign key is called the child table and the table containing
the referenced primary key is called the parent table.
• Consider
Employee Table
Dept Table:
• Notice that the "EmpID" column in the "Dept" table points to the "EmpID" column in
the "Employee" table.
• The "EmpID" column in the "Employee" table is the PRIMARY KEY in the
"Employee" table.
• The "EmpID" column in the "Dept" table is a FOREIGN KEY in the "Dept" table.
• The FOREIGN KEY constraint is used to prevent actions that would destroy links
between tables.
• The FOREIGN KEY constraint also prevents invalid data from being inserted into the
foreign key column, because it has to be one of the values contained in the table it
points to.
• The purpose of the foreign key constraint is to enforce referential integrity but there
are also performance benefits to be had by including them in your database design.
The table Dept can be created as follows with foreign key constraint.
CREATE TABLE DEPT (
DeptID int,
DeptName VARCHAR(20),
EmpID int,
PRIMARY KEY(DeptID),
FOREIGN KEY (EmpID)
REFERENCES EMPLOYEE(EmpID)
);
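The referential-integrity behaviour described above can be sketched as follows. This uses Python's sqlite3; the PRAGMA line is SQLite-specific, since SQLite enforces foreign keys only when that pragma is enabled (other RDBMSs enforce them by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
cur = conn.cursor()

cur.execute("CREATE TABLE Employee (EmpID INT PRIMARY KEY, Name TEXT)")
cur.execute("""
    CREATE TABLE Dept (
        DeptID INT PRIMARY KEY,
        DeptName TEXT,
        EmpID INT,
        FOREIGN KEY (EmpID) REFERENCES Employee(EmpID)
    )
""")

cur.execute("INSERT INTO Employee VALUES (1, 'Ram')")
cur.execute("INSERT INTO Dept VALUES (10, 'Sales', 1)")   # valid: EmpID 1 exists

try:
    # Invalid: EmpID 99 does not exist in the parent (Employee) table
    cur.execute("INSERT INTO Dept VALUES (20, 'HR', 99)")
    fk_violation_caught = False
except sqlite3.IntegrityError:
    fk_violation_caught = True
```

The second insert is rejected because the foreign key value must be one of the values contained in the table it points to, exactly as stated above.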
(3) Unique
The UNIQUE constraint is used to prevent duplicate values in a column. In the EMPLOYEE table,
for example, you might want to prevent two or more employees from having an
identical designation. In that case we must use the UNIQUE constraint.
We can set the constraint as UNIQUE at the time of creation of the table, or if the table is
already created and we want to add the unique constraint then we can use the ALTER
command.
For example -
CREATE TABLE EMPLOYEE(
EmpID INT NOT NULL,
Name VARCHAR (20) NOT NULL,
Designation VARCHAR(20) NOT NULL UNIQUE,
Salary DECIMAL (12, 2),
PRIMARY KEY (EmpID)
);
If the table is already created, we can still add the unique constraint as follows -
ALTER TABLE EMPLOYEE
MODIFY Designation VARCHAR(20) NOT NULL UNIQUE;
(4) NOT NULL
• By default the column can have NULL values.
• NULL means unknown values.
• We can set the column values as non NULL by using the constraint NOT NULL.
• For example
CREATE TABLE EMPLOYEE(
EmpID INT NOT NULL,
Name VARCHAR (20) NOT NULL,
Designation VARCHAR(20) NOT NULL,
Salary DECIMAL (12, 2) NOT NULL,
PRIMARY KEY (EmpID)
);
(5) CHECK
The CHECK constraint is used to limit the value range that can be placed in a column.
For example
CREATE TABLE parts (
Part_no int PRIMARY KEY,
Description VARCHAR(40),
Price DECIMAL(10, 2) NOT NULL CHECK(Price > 0)
);
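The CHECK constraint can be seen rejecting an out-of-range value in a short sketch (Python sqlite3; the sample parts are assumed data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE parts (
        Part_no INT PRIMARY KEY,
        Description VARCHAR(40),
        Price DECIMAL(10, 2) NOT NULL CHECK (Price > 0)
    )
""")

cur.execute("INSERT INTO parts VALUES (1, 'Bolt', 2.50)")      # satisfies Price > 0

try:
    cur.execute("INSERT INTO parts VALUES (2, 'Nut', -1.00)")  # violates Price > 0
    check_violation_caught = False
except sqlite3.IntegrityError:
    check_violation_caught = True
```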
(6) IN operator
The IN operator is similar to a series of OR conditions.
It allows us to specify multiple values in a WHERE clause.
Syntax
SELECT col1,col2,....
FROM table_name
WHERE column_name IN (value1, value2, ...);
Example
Consider following table
Employee
SELECT * FROM Employee
WHERE empID IN (1, 3);
The result will be
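The IN query above can be sketched as follows (Python sqlite3; the employee names are assumed sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employee (EmpID INT, Name TEXT)")
cur.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(1, 'Ram'), (2, 'Shyam'), (3, 'Mona')])

# IN (1, 3) behaves like EmpID = 1 OR EmpID = 3
in_ids = [r[0] for r in
          cur.execute("SELECT EmpID FROM Employee WHERE EmpID IN (1, 3)")]
```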
Query: Find the names of students who have reserved books, along with the ISBN of each reserved book.
SELECT Student.sname, Reserve.isbn
FROM Student, Reserve
WHERE Student.sid = Reserve.sid;
Use of SQL Join
The SQL Joins clause is used to combine records from two or more tables in a database.
A JOIN is a means for combining fields from two tables by using values common to
each.
Example: Consider two tables for using the joins in SQL. Note that cid is common
column in following tables.
1) Inner Join:
• The most important and frequently used of the joins is the INNER JOIN. It is
also known as an EQUIJOIN.
• The INNER JOIN creates a new result table by combining column values of two
tables (Table1 and Table2) based upon the join-predicate.
• The query compares each row of Table1 with each row of Table2 to find all pairs of
rows which satisfy the join-predicate.
• When the join-predicate is satisfied, column values for each matched pair of rows of
A and B are combined into a result row. It can be represented as:
Syntax: The basic syntax of the INNER JOIN is as follows.
SELECT Table1.column1, Table2.column2...
FROM Table1
INNER JOIN Table2
ON Table1.common_field = Table2.common_field;
Example: For the above given two tables, namely Student and City, we can apply an inner
join. It will return the records that match in both tables using the common column
cid. The query will be
SELECT *
FROM Student Inner Join City on Student.cid=City.cid
The result will be
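The inner join over the common column cid can be sketched as follows (Python sqlite3; the Student and City rows are assumed sample data, since the original tables are shown only as figures):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (sid INT, sname TEXT, cid INT)")
cur.execute("CREATE TABLE City (cid INT, cname TEXT)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(1, 'Ram', 11), (2, 'Shyam', 12), (3, 'Mona', 13)])
cur.executemany("INSERT INTO City VALUES (?, ?)",
                [(11, 'Pune'), (12, 'Mumbai')])

# Only rows whose cid matches in BOTH tables appear in the result;
# Mona (cid 13) has no matching city and is dropped
inner = cur.execute(
    "SELECT Student.sname, City.cname "
    "FROM Student INNER JOIN City ON Student.cid = City.cid"
).fetchall()
```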
2) Left Join:
• The SQL LEFT JOIN returns all rows from the left table, even if there are no matches
in the right table. This means that if the ON clause matches 0 (zero) records in the right
table; the join will still return a row in the result, but with NULL in each column from
the right table.
• This means that a left join returns all the values from the left table, plus matched
values from the right table or NULL in case of no matching join predicate.
• It can be represented as –
Syntax: The basic syntax of the LEFT JOIN is as follows -
SELECT Table1.column1, Table2.column2...
FROM Table1
LEFT JOIN Table2
ON Table1.common_field = Table2.common_field;
The result will be
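The NULL-filling behaviour of a left join can be seen with the same Student and City data used for the inner join (a Python sqlite3 sketch with assumed sample rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (sid INT, sname TEXT, cid INT)")
cur.execute("CREATE TABLE City (cid INT, cname TEXT)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(1, 'Ram', 11), (2, 'Shyam', 12), (3, 'Mona', 13)])
cur.executemany("INSERT INTO City VALUES (?, ?)",
                [(11, 'Pune'), (12, 'Mumbai')])

# Every Student row survives; Mona has no matching city,
# so her cname column is NULL (None in Python)
left = cur.execute(
    "SELECT Student.sname, City.cname "
    "FROM Student LEFT JOIN City ON Student.cid = City.cid"
).fetchall()
```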
3) Right Join:
• The SQL RIGHT JOIN returns all rows from the right table, even if there are no
matches in the left table.
• This means that if the ON clause matches 0 (zero) records in the left table; the join
will still return a row in the result, but with NULL in each column from the left table.
• This means that a right join returns all the values from the right table, plus matched
values from the left table or NULL in case of no matching join predicate.
• It can be represented as follows:
• Syntax: The basic syntax of the RIGHT JOIN is as follows -
SELECT Table1.column1, Table2.column2...
FROM Table1
RIGHT JOIN Table2
ON Table1.common_field = Table2.common_field;
4) Full Join:
• The SQL FULL JOIN combines the results of both left and right outer joins.
• The joined table will contain all records from both the tables and fill in NULLS for
missing matches on either side.
• It can be represented as
Syntax: The basic syntax of a FULL JOIN is as follows
SELECT Table1.column1, Table2.column2...
FROM Table1 FULL JOIN Table2 ON Table1.common_field = Table2.common_field;
The result will be -
Example: For the above given two tables, namely Student and City, we can apply a full join.
It returns rows when there is a match in either of the tables, using the common
column cid. The query will be -
SELECT *
FROM Student Full Join City on Student.cid=City.cid
The result will be –
3 Demonstrate database normalization and explain about 1NF, 2NF and 3NF. K3 CO2
Normalization is the process of reorganizing data in a database so that it meets two
basic requirements:
1) There is no redundancy of data (all data is stored in only one place), and
2) data dependencies are logical (all related data items are stored together)
Need for normalization
1) It eliminates redundant data.
2) It reduces chances of data error.
3) The normalization is important because it allows database to take up less disk space.
4) It also helps in increasing the performance.
5) It improves the data integrity and consistency.
First Normal Form
A table is said to be in 1NF if it follows the following rules -
i) It should only have single (atomic) valued attributes/columns.
ii) Values stored in a column should be of the same domain
iii) All the columns in a table should have unique names.
iv) And the order in which data is stored, does not matter.
Consider following Student table
Student
As there are multiple values of phone number for sid 1 and 3, the above table is not in
1NF. We can make it in 1NF. The conversion is as follows -
Second Normal Form
A table is in 2NF if it is in 1NF and has no partial functional dependency. Consider a
Student table with columns sid, sname, cid and cname. This table is not in 2NF. For
converting the above table to 2NF we must follow these steps -
Step 1: The above table is in 1NF.
Step 2: Here sname and sid are associated similarly cid and cname are associated with
each other. Now if we delete a record with sid=2, then automatically the course C++
will also get deleted. Thus,
sid->sname or cid->cname is a partial functional dependency, because {sid,cid} should
be essentially a candidate key for above table. Hence to bring the above table to 2NF
we must decompose it as follows:
Student:
Course:
For example, for a table with attributes RegID, RollNo and Sname -
Superkeys
• {RegID}
• {RegID, RollNo}
• {RegID,Sname}
• {RollNo,Sname}
• {RegID, RollNo,Sname}
Candidate Keys
• {RegID}
• {RollNo}
Third Normal Form
A table is said to be in the Third Normal Form when,
i) It is in the Second Normal form.(i.e. it does not have partial functional dependency)
ii) It doesn't have transitive dependency.
In other words, 3NF can be defined as: a table is in 3NF if it is in 2NF and for each
functional dependency
X → Y
at least one of the following conditions holds:
i) X is a super key of the table
ii) Y is a prime attribute of the table
For example: Consider following table Student_details as follows -
Here
Super keys: {sid}, {sid,sname}, {sid,sname,zipcode}, {sid,zipcode,cityname}... and so
on.
Candidate keys:{sid}
Non-Prime attributes: {sname,zipcode,cityname,state}
The dependencies can be denoted as,
sid->sname
sid->zipcode
zipcode->cityname
cityname->state
The above denotes the transitive dependency. Hence above table is not in 3NF. We can
convert it into 3NF as follows:
Student
The table can be converted to 2NF as,
R1 = (A, B, C)
R2 = (A, D, E, I, J)
R3 = (B, F, G, H)
The above 2NF relations can be converted to 3NF as follows:
R1 = (A, B, C)
R2 = (A, D, E)
R3 = (D, I, J)
R4 = (B, E)
R5 = (E, G, H)
Part-C ( 15 Marks)
S.No Questions BTL CO
1 Elaborate in detail about (i) Boyce/Codd Normal Form (BCNF) and (ii) join dependency and the fifth normal form. K2 CO2
Boyce and Codd Normal Form is a stricter version of the Third Normal Form. This form
deals with a certain type of anomaly that is not handled by 3NF.
A 3NF table which does not have multiple overlapping candidate keys is said to be in
BCNF.
Or in other words,
For a table to be in BCNF, following conditions must be satisfied:
i) R must be in 3rd Normal Form
ii) For each functional dependency (X → Y), X should be a super key. In simple words,
if Y is a prime attribute then X cannot be a non-prime attribute.
For example - Consider following table that represents that a Student enrollment for the
course -
Enrollment Table
In the Enrollment table, teacher is not a superkey; in other words, teacher is a non-prime
attribute while course is a prime attribute, so a non-prime attribute determines a prime attribute.
• To convert the above table to BCNF we must decompose above table into Student and
Course tables
Student
Course
• If the join of R1 and R2 over C is equal to relation R, then we can say that a Join
Dependency (JD) exists.
• *((A, B, C, D), (C, D)) will be a JD of R if the join of the projections on these
attribute sets is equal to the relation R.
• Here, *(R1, R2, R3) is used to indicate that relation R1, R2, R3 and so on are a JD of
R.
ii) If we can decompose a table further to eliminate redundancy and anomalies, then when
we rejoin the decomposed tables we should not lose the original data or get any new
spurious record (the join dependency principle).
The fifth normal form is also called the project-join normal form (PJ/NF).
Seller →→ {Company, Product}. Hence the table is not in 4th Normal Form. To bring the
above table into 4th normal form we decompose it into two tables as
The above table is in 4th Normal Form as there is no multivalued dependency. But it is
not in 5th normal form, because if we join the above two tables we may get spurious rows.
To avoid this problem we can decompose the tables into three tables:
the Seller_Company, Seller_Product, and Company_Product tables.
Thus the table is in 5th normal form.
UNIT III
TRANSACTION PROCESSING
Transaction Concepts – ACID Properties– Serializability – Concurrency Control –Need
for Concurrency – Locking Protocols- Time Stamp and Optimistic Concurrency
Control Algorithm – Deadlock- SQL Facilities for Concurrency and Recovery
Part-A (2 MARKS)
S.No Questions BTL CO
1 State ACID properties. K1 CO3
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee database
transactions are processed reliably. In the context of databases, a single logical operation
on the data is called a transaction. For example, a transfer of funds from one bank account
to another, even though that might involve multiple changes (such as debiting one account
and crediting another), is a single transaction.
2 Draw the states of Transaction. K4 CO3
3 Mention about deadlock prevention and deadlock avoidance. K1 CO3
Deadlock Prevention:
Preventing deadlocks by constraining how requests for resources can be made in
the system and how they are handled (system design).
The goal is to ensure that at least one of the necessary conditions for
deadlock can never hold.
Deadlock Avoidance:
The system dynamically considers every request and decides whether it is
safe to grant it at this point.
The system requires additional a priori information regarding the overall
potential use of each resource for each process.
Allows more concurrency.
4 Outline Concurrency Control. K1 CO3
Part-B ( 13 Marks)
S.No Questions BTL CO
1 Discuss about ACID properties with suitable example. K2 CO3
1) Atomicity:
This property states that each transaction must be considered as a single unit and
must be completed fully or not completed at all.
• No transaction in the database is left half completed.
• Database should be in a state either before the transaction execution or after the
transaction execution. It should not be in a state 'executing'.
• For example, in the above-mentioned withdrawal-of-money transaction, all five
steps must be completed fully or none of the steps is completed. Suppose the
transaction fails after step 3; then the customer will get the money but the
balance will not be updated accordingly. The state of the database should be either
as before the ATM withdrawal (i.e. customer without withdrawn money) or as after the ATM
withdrawal (i.e. customer with money and account updated). This keeps the
system in a consistent state.
2) Consistency:
• The database must remain in consistent state after performing any transaction.
• For example: In the ATM withdrawal operation, the balance must be updated
appropriately after performing the transaction. Thus the database remains in a
consistent state.
3) Isolation:
• In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each
transaction will be carried out and executed as if it were the only transaction in the
system.
• No transaction will affect the existence of any other transaction.
• For example: If a bank manager is checking the account balance of particular
customer, then manager should see the balance either before withdrawing the money
or after withdrawing the money. This will make sure that each individual transaction
is completed and any other dependent transaction will get the consistent data out of
it. Any failure to any transaction will not affect other transaction in this case. Hence
it makes all the transactions consistent.
4) Durability:
• The database should be strong enough to handle any system failure.
• If there is any set of insert/update operations, the system should be able to handle
them and commit the changes to the database.
• If there is any failure, the database should be able to recover it to the consistent
state.
• For example: In the ATM withdrawal example, if a system failure happens after the
customer gets the money, then the system should be strong enough to update the
database with the new balance after the system recovers. For that purpose the system
has to keep a log of each transaction and its failure. So when the system recovers,
it should be able to know when it failed, and if there is any pending
transaction, that transaction should be applied to the database.
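The atomicity property can be sketched with a fund-transfer transaction that fails halfway (a Python sqlite3 illustration; the account names, balances and the simulated crash are all assumptions, not part of the question bank):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INT)")
cur.executemany("INSERT INTO account VALUES (?, ?)", [('A', 1000), ('B', 500)])
conn.commit()

simulate_failure = True
try:
    # Debit one account...
    cur.execute("UPDATE account SET balance = balance - 200 WHERE name = 'A'")
    if simulate_failure:
        # ...but "crash" before the matching credit is applied
        raise RuntimeError("simulated failure between debit and credit")
    cur.execute("UPDATE account SET balance = balance + 200 WHERE name = 'B'")
    conn.commit()
except RuntimeError:
    # Atomicity: the half-done transfer is rolled back as a single unit
    conn.rollback()

balances = dict(cur.execute("SELECT name, balance FROM account"))
```

After the rollback both balances are exactly as they were before the transaction began, so the database is never left in the half-executed state the text warns about.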
2 Illustrate the concepts of Concurrency control mechanism. K2 CO3
The purposes of concurrency control -
• To ensure isolation
• To resolve read-write or write-write conflicts
• To preserve consistency of database
• Concurrent execution of transactions over shared database creates several data
integrity and consistency problems - these are
(1) Lost update problem: This problem occurs when two transactions that access
the same database items have their operations interleaved in a way that makes the
value of some database item incorrect.
For example - Consider following transactions
(1) Salary of Employee is read during transaction T1.
(2) Salary of Employee is read by another transaction T2.
The result of the above sequence is that the update made by transaction T1 is
completely lost. Therefore this problem is called the lost update problem.
(2) Dirty read or Uncommitted read problem: A dirty read is a situation in
which one transaction reads data immediately after the write operation of a
previous uncommitted transaction.
(1) At the time t1, the transaction T2 updates the salary to 1200
(2) This salary is read at time t2 by transaction T1. Obviously it is 1200
(3) But at the time t3, the transaction T2 performs a rollback, undoing the update it
made at time t1.
(4) Thus the salary again becomes 1000. This situation is called a Dirty Read or
Uncommitted Read, because the read made at time t2 (immediately after the
update by another transaction) turns out to be a dirty read.
(3) Non-repeatable read problem
This problem is also known as inconsistent analysis problem. This problem occurs
when a particular transaction sees two different values for the same row within its
lifetime. For example-
(1) At time t1, the transaction T1 reads the value of salary as 1000
(2) At time t2, the transaction T2 reads the value of the same salary as 1000
(3) At time t3, the transaction T1 deletes the variable salary.
(4) Now at time t4, when T2 again reads the salary it gets an error. Transaction
T2 cannot identify the reason why it is not getting the salary value which it read just
a short time back.
This problem, which occurs due to changes in the database between two reads, is also
called the phantom read problem.
3 Describe briefly about deadlock handling. K2 CO3
The two phase locking is a protocol in which there are two phases:
i) Growing phase (Locking phase): It is a phase in which the transaction may
obtain locks but does not release any lock.
ii) Shrinking phase (Unlocking phase): It is a phase in which the transaction may
release the locks but does not obtain any new lock.
• Lock Point: The last lock position or first unlock position is called lock point. For
example
The important rule for being a two phase locking is - All Lock operations precede
all the unlock operations.
In the above transactions, T1 is in two phase locking mode but transaction T2 is not.
In T2, the shared lock is acquired on data item B, then data item B is read and the
lock is released. Again a lock is acquired on data item A, then data item A is read
and the lock is then released. Thus we get a lock-unlock-lock-unlock sequence.
Clearly this is not possible in two phase locking.
Example :-Prove that two phase locking guarantees serializability.
Solution:
• Serializability is mainly an issue of handling write operation. Because any
inconsistency may only be created by write operation.
• Multiple reads on a database item can happen parallely.
• 2-Phase locking protocol restricts this unwanted read/write by applying exclusive
lock.
• Moreover, when there is an exclusive lock on an item it will only be released in
shrinking phase. Due to this restriction there is no chance of getting any inconsistent
state.
The serializability using two phase locking can be understood with the help of
following example
Consider two transactions
Step 1: Now we will apply two phase locking. That means we will apply locks in
growing and shrinking phase
(2) Cascading Rollback: Cascading rollback is a situation in which transaction
failure leads to a series of transaction rollback. For example -
When T1 writes value of C then only T2 can read it. And when T2 writes the value of
C then only transaction T3 can read it. But if the transaction T1 gets failed then
automatically transactions T2 and T3 gets failed.
The simple two phase locking does not solve the cascading rollback problem. To
solve the problem of cascading Rollback two types of two phase locking
mechanisms can be used.
Types of Two Phase Locking
(1) Strict two phase locking: The strict 2PL protocol is the basic two phase protocol,
but all the exclusive mode locks are held until the transaction commits. In other
words, all the exclusive locks are unlocked only after the transaction is
committed. That also means that if T1 holds an exclusive lock, then T1 will release the
exclusive lock only after the commit operation; only then is another transaction allowed
to read or write. For example, consider two transactions
If we apply the locks then
Thus only after commit operation in T1, we can unlock the exclusive lock. This
ensures the strict serializability.
Thus compared to basic two phase locking protocol, the advantage of strict 2PL
protocol is it ensures strict serializability.
(2) Rigorous two phase locking: This is a stricter two phase locking protocol. Here
all locks are to be held until the transaction commits. The transactions can be
serialized in the order in which they commit.
For example - Consider transactions
Thus the above transaction uses rigorous two phase locking mechanism.
Example :-Consider the following two transactions:
T1:read(A)
Read(B);
If A=0 then B=B+1;
Write(B)
T2:read(B); read(A)
If B=0 then A=A+1
Write(A)
The following lock-unlock instruction sequence helps to satisfy the requirements of
strict two phase locking for the given transactions.
The execution of these transactions result in deadlock. Consider following partial
execution scenario which leads to deadlock.
Lock Conversion
Lock conversion is a mechanism in two phase locking mechanism - which allows
conversion of shared lock to exclusive lock or exclusive lock to shared lock.
Method of Conversion :
First Phase:
• can acquire a lock-S on item
• can acquire a lock-X on item
• can convert a lock-S to a lock-X (upgrade)
Second Phase:
• can release a lock-S
• can release a lock-X
• can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability. But still relies on the programmer to insert the
various locking instructions.
For example - Consider following two transactions -
Here if we start applying locks, then we must apply the exclusive lock on data item
A, because we have to read as well as write on data item A. Another transaction
T2 does not get shared lock on A until transaction T1 performs write operation on A.
Since transaction T1 needs exclusive lock only at the end when it performs write
operation on A, it is better if T1 could initially lock A in shared mode and then later
change it to exclusive mode lock when it performs write operation. In such situation,
the lock conversion mechanism becomes useful.
Part-C ( 15 Marks)
S.No Questions BTL CO
1 Describe briefly about Serializability and its types with relevant example. K2 CO3
When multiple transactions run concurrently, then it may lead to inconsistency of data
(i.e. change in the resultant value of data from different transaction).
Serializability is a concept that helps us identify which non-serial schedules are
correct, by finding schedules equivalent to a serial schedule.
For example:
In the above transaction, initially T1 will read the values from the database as A = 100,
B = 100 and modify the values of A and B; transaction T2 will then read the modified
value, i.e. 90, modify it to 80 and perform a write operation. Thus at the end of
transaction T1 the value of A will be 90, but at the end of transaction T2 the value of
A will be 80. Thus a conflict or inconsistency occurs here. This sequence can be
converted to a sequence which may give us a consistent result. This process is called
serializability.
• There are two types of serializability: conflict serializability and view serializability.
• A precedence graph is a directed graph G = (V, E), where V is the set of vertices
and E is the set of edges. The set of vertices consists of all the transactions
participating in the schedule. The set of edges consists of all edges Ti→Tj for which
one of three conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
• A serializability order of the transactions can be obtained by finding a linear order
consistent with the partial order of the precedence graph. This process is called
topological sorting.
Testing for serializability
The following method is used for testing serializability: to test conflict
serializability we can draw a graph G = (V, E), where V = vertices, which represent
the transactions, and E = edges for conflicting pairs.
Step 1: Create a node for each transaction.
Step 2: Find the conflicting pairs(RW, WR, WW) on the same variable(or data item)
by different transactions.
Step 3: Draw edge for the given schedule. Consider following cases
1. Ti executes write(Q) before Tj executes read(Q), then draw edge from Ti to Tj.
2. Ti executes read(Q) before Tj executes write(Q), then draw edge from Ti to Tj.
3. Ti executes write(Q) before Tj executes write(Q), then draw edge from Ti to Tj.
Step 4: Now, if the precedence graph is cyclic then the schedule is not conflict
serializable, and if the precedence graph is acyclic then the schedule is conflict
serializable.
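Steps 1 to 4 can be sketched directly in code (a minimal illustration; the schedule is hard-coded as (transaction, operation, data item) triples):

```python
# A minimal sketch of the conflict-serializability test (Steps 1-4).
schedule = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
            ("T2", "R", "B"), ("T1", "R", "B"), ("T1", "W", "B")]

# Steps 2-3: add an edge Ti -> Tj for every conflicting pair
# (different transactions, same data item, at least one write).
edges = set()
for i, (ti, op_i, x) in enumerate(schedule):
    for tj, op_j, y in schedule[i + 1:]:
        if ti != tj and x == y and "W" in (op_i, op_j):
            edges.add((ti, tj))

# Step 4: the schedule is conflict serializable iff the graph is acyclic.
def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)

    def visit(node, seen):
        if node in seen:
            return True
        return any(visit(n, seen | {node}) for n in graph.get(node, ()))

    return any(visit(u, set()) for u in graph)

print(sorted(edges))     # edges of the precedence graph
print(has_cycle(edges))  # True -> not conflict serializable
```

For this schedule the graph has edges T1→T2 and T2→T1, so the cycle test reports that the schedule is not conflict serializable, matching the worked example.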
Example:
Consider the following two transactions and schedule (time goes from top to bottom).
Is this schedule conflict-serializable?
Solution :
Step 1: To check whether the schedule is conflict serializable or not we will check
from top to bottom. Thus we will start reading from top to bottom as
T1: R(A) -> T1:W(A) ->T2:R(A) -> T2:R(B) ->T1:R(B)->T1:W(B)
Step 2: We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them-
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation
From above given example in the top to bottom scanning we find the conflict as
T1:W(A)->T2:R(A).
i) Here note that there are two different transactions T1 and T2,
ii) Both work on same data item i.e. A and
iii) One of the operation is write operation.
Step 3: We will build a precedence graph by drawing one node from each transaction.
In above given scenario as there are two transactions, there will be two nodes namely
T1 and T2
Step 4: Draw the edge between conflicting transactions. For example, in the above
scenario the conflict occurs while moving from T1:W(A) to T2:R(A). Hence the edge
must be from T1 to T2.
Step 5: Repeat the step 4 while reading from top to bottom. Finally the precedence
graph will be as follows
Step 6: Check whether any cycle exists in the graph. A cycle is a path along which
we can start from one node and reach the same node again. If a cycle is found then
the schedule is not conflict serializable. In Step 5 we obtained a graph with a cycle,
which means the given schedule is not conflict serializable.
UNIT IV
FILES AND IMPLEMENTATION TECHNIQUES
File operations and Organization –RAID – Indexing and Hashing– B+ tree Index Files –
B tree Index Files – Static Hashing – Dynamic Hashing – Query optimization using
Heuristics and Cost Estimation.
Part-A (2 MARKS)
S.No Questions BTL CO
1 Differentiate Static hashing and Dynamic hashing. K2 CO4
S.No | Static hashing | Dynamic hashing
1. | Number of buckets is fixed. | Number of buckets is not fixed.
2. | As the file grows, performance decreases. | As the file grows, performance does not degrade.
3. | Space overhead is more. | Space overhead is less.
4. | Does not use a bucket address table. | A bucket address table is used.
5. | Open hashing and closed hashing are forms of it. | Extendable hashing and linear hashing are forms of it.
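The first two rows of the table can be illustrated with a minimal sketch (the bucket count and key values are made up): because the bucket count is fixed, keys that hash to the same bucket chain up as the file grows, which is why lookups slow down.

```python
# Static hashing sketch: the number of buckets is fixed at file-creation time.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key):
    buckets[key % NUM_BUCKETS].append(key)  # h(key) = key mod NUM_BUCKETS

for k in (10, 21, 14, 18, 30, 26):
    insert(k)

# 10, 14, 18, 30 and 26 all hash to bucket 2: the chain grows with the file
print(buckets)  # [[], [21], [10, 14, 18, 30, 26], []]
```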
S.No | B+ tree | B tree
1. | Leaf nodes are ordered in a sequential linked list. | Leaf nodes are not linked in a list.
2. | B+ tree stores redundant search keys. | B tree stores non-redundant search keys.
3. | In a B+ tree, data is stored only in leaf nodes. | In a B tree, search keys and data are stored in internal or leaf nodes.
4. | Insertion in a B+ tree is not complicated. | Insertion in a B tree is more complicated.
5 State query execution plan. K1 CO4
The query execution component applies the physical plan generated by the query
optimizer. That plan should ensure a minimal memory footprint and I/O
operations, as well as a fast execution time. Some optimization can be
implemented by this component.
Part-B ( 13 Marks)
S.No Questions BTL CO
1 Summarize the concept of RAID. K2 CO4
RAID
• RAID stands for Redundant Array of Independent Disks. This is a technology in
which multiple secondary disks are connected together to increase the performance,
data redundancy or both.
• For achieving the data redundancy - in case of disk failure, if the same data is also
backed up onto another disk, we can retrieve the data and go on with the operation.
It consists of an array of disks in which multiple disks are connected to achieve
different goals.
• The main advantage of RAID is that, to the operating system, the array of
disks can be presented as a single disk.
Need for RAID
• RAID is a technology that is used to increase the performance.
• It is used for increased reliability of data storage.
• An array of multiple disks accessed in parallel will give greater throughput than a
single disk.
• With multiple disks and a suitable redundancy scheme, your system can stay up
and running when a disk fails, and even while the replacement disk is being installed
and its data restored.
Features
(1) RAID is a technology that contains the set of physical disk drives.
(2) In this technology, the operating system views the separate disks as a single
logical disk.
(3) The data is distributed across the physical drives of the array.
(4) In case of disk failure, the parity information can be used to recover the data.
RAID Levels
Level: RAID 0
• In this level, data is broken down into blocks and these blocks are stored across all
the disks.
• Thus a striped array of disks is implemented in this level. For instance, in the
following figure blocks "A B" form a stripe.
• There is no duplication of data in this level, so once a block is lost there is no
way to recover it.
• The main priority of this level is performance, not reliability.
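The block placement in RAID 0 can be sketched as follows (the disk count and block names are illustrative): logical block i simply goes to disk i mod N, so consecutive blocks land on different disks and can be read in parallel.

```python
# RAID 0 striping sketch: logical block i is placed on disk i mod N_DISKS.
N_DISKS = 3
blocks = ["A", "B", "C", "D", "E", "F"]

layout = [[] for _ in range(N_DISKS)]
for i, b in enumerate(blocks):
    layout[i % N_DISKS].append(b)

print(layout)  # [['A', 'D'], ['B', 'E'], ['C', 'F']]
```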
Level: RAID 1
• This level makes use of mirroring. That means all data in the drive is duplicated to
another drive.
• This level provides 100% redundancy in case of failure.
• Only half space of the drive is used to store the data. The other half of drive is just
a mirror to the already stored data.
• The main advantage of this level is fault tolerance. If some disk fails then the other
automatically takes care of lost data.
Level: RAID 2
• This level makes use of mirroring as well as stores Error Correcting Codes (ECC)
for its data striped on different disks.
• The data is stored on one set of disks and the ECC is stored on another set of disks.
• This level has a complex structure and high cost. Hence it is not used
commercially.
Level: RAID 3
• This level consists of byte-level striping with dedicated parity. In this level, the
parity information is stored for each disk section and written to a dedicated parity
drive.
• We can detect single errors with a parity bit. Parity is a technique that checks
whether data has been lost or written over when it is moved from one place in
storage to another.
• In case of disk failure, the parity disk is accessed and data is reconstructed from
the remaining devices. Once the failed disk is replaced, the missing data can be
restored on the new disk.
Level: RAID 4
• RAID 4 consists of block-level striping with a parity disk.
• Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping.
Level: RAID 5
• RAID 5 is a modification of RAID 4.
• RAID 5 writes whole data blocks onto different disks, but the parity bits generated
for data block stripe are distributed among all the data disks rather than storing them
on a different dedicated disk.
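The role of the parity bits in Levels 3 to 6 can be illustrated with a small sketch (the block contents are made up): parity is the bytewise XOR of the data blocks in a stripe, so any single lost block can be rebuilt by XOR-ing the surviving blocks with the parity.

```python
# Parity sketch: XOR of the data blocks; any one lost block is recoverable.
def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"       # data blocks of one stripe
parity = xor_blocks(xor_blocks(d1, d2), d3)  # distributed across the disks

# Suppose the disk holding d2 fails: rebuild it from the survivors + parity.
rebuilt = xor_blocks(xor_blocks(d1, d3), parity)
print(rebuilt == d2)  # True
```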
Level: RAID 6
• RAID 6 is an extension of Level 5.
• RAID 6 writes whole data blocks onto different disks, but the two independent
parity bits generated for data block stripe are distributed among all the data disks
rather than storing them on a different dedicated disk.
• Two parities provide additional fault tolerance.
• This level requires at least four disks to implement RAID.
Factors in the choice of RAID level:
1. Monetary cost of extra disk-storage requirements.
2. Performance requirements in terms of number of I/O operations.
3. Performance when a disk has failed.
4. Performance during rebuild.
2 Illustrate indexing and hashing techniques with suitable examples. K2 CO4
Ordered Indices
Primary and Clustered Indices
Primary index :
• An index on a set of fields that includes the primary key is called a primary index.
The primary index file should be always in sorted order.
• The primary indexing is always done when the data file is arranged in sorted order
and primary indexing contains the primary key as its search key.
• Consider following scenario in which the primary index consists of few entries as
compared to actual data file.
• Once you locate the first entry of the block containing the record, the other
entries are stored contiguously. For example, if we want to search for the record
with RegNo 11AS32 we need not search the entire data file. With the help of the
primary index structure we learn the location of the record containing RegNo
11AS30, and once the first entry of that block is located we can easily locate the
entry for 11AS32.
• We can apply the binary search technique. Suppose there are n = 300 blocks in the
main data file; then the number of accesses required to search the data file directly
will be (log2 300) + 1 ≈ 9.
• If we use a primary index file which contains at most n = 3 blocks, then using the
binary search technique the number of accesses required via the primary index file
will be (log2 3) + 1 ≈ 3 (counting the final access to the data block).
• This shows that using a primary index the access time can be reduced to a great
extent.
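The access counts above can be checked directly (a small sketch; the extra +1 in the index case models the final fetch of the data block itself):

```python
# Binary search over n blocks needs about floor(log2 n) + 1 block accesses.
import math

def accesses(n_blocks):
    return math.floor(math.log2(n_blocks)) + 1

print(accesses(300))    # 9: searching the 300-block data file directly
print(accesses(3) + 1)  # 3: searching the 3-block index, then one data-block fetch
```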
Clustered index:
• In some cases, the index is created on non-primary key columns which may not be
unique for each record. In such cases, in order to identify the records faster, we will
group two or more columns together to get the unique values and create index out of
them. This method is known as clustering index.
• When a file is organized so that the ordering of data records is the same as the
ordering of data entries in some index then say that index is clustered, otherwise it is
an unclustered index.
• Note that the data file needs to be in sorted order.
• Basically, records with similar characteristics are grouped together and indexes are
created for these groups.
• For example, students studying in each semester are grouped together. i.e.; 1st
semester students, 2nd semester students, 3rd semester students etc. are grouped.
2) Sparse index:
• Index records are created only for some of the records.
• To locate a record, we find the index record with the largest search key value less
than or equal to the search key value we are looking for.
• We start at that record pointed to by the index record, and proceed along the
pointers in the file (that is, sequentially) until we find the desired record.
• For example -
• The index file usually occupies considerably less disk blocks than the data file
because its entries are much smaller.
• A binary search on the index yields a pointer to the file record.
• The types of single level indexing can be primary indexing, clustering index or
secondary indexing.
• Example: The following figure represents single-level indexing –
Multilevel indexing:
• There is an immense need to keep the index records in the main memory so as to
speed up the search operations. If single-level index is used, then a large size index
cannot be kept in memory which leads to multiple disk accesses.
• Multi-level Index helps in breaking down the index into several smaller indices in
order to make the outermost level so small that it can be saved in a single disk block,
which can easily be accommodated anywhere in the main memory.
Secondary Indices
• In this technique two levels of indexing are used in order to reduce the mapping
size of the first level.
• Initially, for the first level, a large range of numbers is selected so that the mapping
size is small. Further, each range is divided into sub-ranges.
• It is used to optimize query processing and to access records in a database with
some information other than the usual search key.
For example -
• The B+ tree is called dynamic tree because the tree structure can grow on insertion
of records and shrink on deletion of records.
Characteristics of B+ Tree
Following are the characteristics of B+ tree.
1) The B+ tree is a balanced tree, and the insertion and deletion operations keep the
tree balanced.
2) A minimum occupancy of 50 percent is guaranteed for each node except the root.
3) Searching for a record requires just traversal from the root to appropriate leaf.
Insertion Operation
Algorithm for insertion :
Step 1: Find correct leaf L.
Step 2: Put data entry onto L.
i) If L has enough space, done!
ii) Else, must split L (into L and a new node L2)
• Allocate new node
• Redistribute entries evenly
• Copy up middle key.
• Insert index entry pointing to L2 into parent of L.
Step 3: This can happen recursively
i) To split index node, redistribute entries evenly, but push up middle key. (Contrast
with leaf splits.)
Step 4: Splits "grow" tree; root split increases height.
i) Tree growth: gets wider or one level taller at top.
Construct B+ tree for following data. 30,31,23,32,22,28,24,29, where number of
pointers that fit in one node are 5.
Solution: In B+ tree each node is allowed to have the number of pointers to be 5.
That means at the most 4 key values are allowed in each node.
Step 1: Insert 30,31,23,32. We insert the key values in ascending order.
Step 2: Now if we insert 22, the sequence will be 22, 23, 30, 31, 32. The middle key
30, will go up.
Step 3: Insert 28,24. The insertion is in ascending order.
Step 4: Insert 29. The sequence becomes 22, 23, 24, 28, 29. The middle key 24 will
go up. Thus we get the B+ tree.
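The "copy up the middle key" rule from the split steps can be sketched for this very key sequence (a simplified single-leaf illustration, not a full B+ tree implementation):

```python
# Leaf split sketch for a B+ tree of order 5 (at most 4 keys per node).
import bisect

ORDER = 5  # number of pointers per node -> at most ORDER - 1 keys

def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf; on overflow split it and
    return (copied_up_key, new_right_leaf), else (None, None)."""
    bisect.insort(leaf, key)
    if len(leaf) < ORDER:          # still fits: 4 keys at most
        return None, None
    mid = len(leaf) // 2
    right = leaf[mid:]             # middle key stays in the right leaf...
    del leaf[mid:]
    return right[0], right         # ...and a copy of it goes up

leaf = []
for k in (30, 31, 23, 32):         # Step 1: these four keys fit in one leaf
    insert_into_leaf(leaf, k)

up, right = insert_into_leaf(leaf, 22)  # Step 2: 22,23,30,31,32 overflows
print(up, leaf, right)  # 30 [22, 23] [30, 31, 32] -> middle key 30 goes up
```

This reproduces Step 2 of the worked example: inserting 22 makes the sequence 22, 23, 30, 31, 32, and the middle key 30 is copied up.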
Merits of B+ Index Tree Structure
1. In a B+ tree the data is stored in the leaf nodes, so searching for any data requires
scanning the leaf nodes only.
2. Data is ordered in linked list.
3. Any record can be fetched in equal number of disk accesses.
4. Range queries can be performed easily as leaves are linked up.
5. Height of the tree is less as only keys are used for indexing.
6. Supports both random and sequential access.
Demerits of B+ Index Tree Structure
1. Extra insertion of non-leaf nodes is required.
2. There is space overhead.
Part-C ( 15 Marks)
S.No Questions BTL CO
1 Describe query optimization with an example. K2 CO4
Query processing is a collection of activities that are involved in extracting data from
a database.
• During query processing, high-level database language queries are translated into
expressions that can be used at the physical level of the file system.
• There are three basic steps involved in query processing and those are -
1. Parsing and Translation
• In this step the query is translated into its internal form and then into relational
algebra.
• Parser checks syntax and verifies relations.
• For instance - If we submit the query as,
SELECT RollNo, name
FROM Student
HAVING RollNo=10
Then it will issue a syntax error message, as the correct query should be
SELECT RollNo, name
FROM Student
WHERE RollNo=10
Thus during this step the syntax of the query is checked so that only correct and
verified query can be submitted for further processing.
2. Optimization
• During this process the query evaluation plan is prepared from all the relational
algebraic expressions.
• The query cost for all the evaluation plans is calculated.
• Amongst all equivalent evaluation plans the one with lowest cost is chosen.
• Cost is estimated using statistical information from the database catalog, such as
the number of tuples in each relation, size of tuples, etc.
3. Evaluation
• The query-execution engine takes a query-evaluation plan, executes that plan, and
returns the answers to the query.
Of the query evaluation plans given above, (b) is much faster than (a) because in
(a) the join operation is among Branch, Account and Customer, whereas in (b) the
join of (Account and Customer) is made with the tuples already selected for
City = "Pune". The output of a join over the entire tables is much larger than a join
over a few selected tuples. Thus we choose the optimized query.
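The comparison between the two plans can be sketched with toy relations (all table contents and column names here are made up for illustration): pushing the selection down shrinks the input to the join while producing the same result.

```python
# Toy relations standing in for Account and Customer.
account  = [{"acc": a, "city": c}
            for a, c in [(1, "Pune"), (2, "Mumbai"), (3, "Pune"), (4, "Delhi")]]
customer = [{"acc": a, "name": n}
            for a, n in [(1, "Ram"), (2, "Sita"), (3, "Ravi"), (4, "Gita")]]

# Plan (a): join first, select afterwards -> large intermediate result.
joined = [{**x, **y} for x in account for y in customer if x["acc"] == y["acc"]]
plan_a_intermediate = len(joined)               # 4 tuples built before filtering
result_a = [t for t in joined if t["city"] == "Pune"]

# Plan (b): push the selection down, then join -> smaller intermediate result.
pune = [x for x in account if x["city"] == "Pune"]
plan_b_intermediate = len(pune)                 # only 2 tuples enter the join
result_b = [{**x, **y} for x in pune for y in customer if x["acc"] == y["acc"]]

print(plan_a_intermediate, plan_b_intermediate)  # 4 2
print(result_a == result_b)                      # True: same answer, less work
```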
Cost based Estimation
• A cost based optimizer will look at all of the possible ways or scenarios in which a
query can be executed.
• Each scenario will be assigned a 'cost', which indicates how efficiently that query can
be run.
• Then, the cost based optimizer will pick the scenario that has the least cost and
execute the query using that scenario, because that is the most efficient way to run the
query.
• Scope of query optimization is a query block. Global query optimization involves
multiple query blocks.
• Cost components for query execution
• Access cost to secondary storage
• Disk storage cost
• Computation cost
• Memory usage cost
• Communication cost
• Following information stored in DBMS catalog and used by optimizer
• File size
• Organization
• Number of levels of each multilevel index
• Number of distinct values of an attribute
• Attribute selectivity
• RDBMS stores histograms for most important attributes
UNIT V
SPECIAL DATABASES
Distributed Databases: Architecture, Transaction Processing – Object-based Databases:
Object Database Concepts, Object-Relational features, ODMG Object Model, ODL,
OQL - XML Hierarchical Model, XML Schema.
Part-A (2 MARKS)
S.No Questions BTL CO
1 Define a distributed database management system. K1 CO5
A new software component at each site, logically an extension of the local DBMS,
provides the necessary partnership functionality; the combination of these new
components together with the existing DBMSs constitutes what is usually called
the distributed database management system.
2 State OODBMS. K1 CO5
Object-oriented database management systems (OODBMSs) combine database
capabilities with object-oriented programming language capabilities. OODBMSs
allow object-oriented programmers to develop the product, store them as objects, and
replicate or modify existing objects to make new objects within the OODBMS.
Because the database is integrated with the programming language, the programmer
can maintain consistency within one environment, in that both the OODBMS and the
programming language will use the same model of representation.
3 What is meant by XML Database? K1 CO5
An XML database is a data persistence software system that allows data to be
specified, and sometimes stored, in XML format. These data can then be queried,
transformed, exported and returned to a calling system.
4 Compare object-oriented and object-relational databases. K2 CO5
Feature Object Databases Object-Relational Databases
5 Outline XML hierarchical data model. K1 CO5
XML documents have a hierarchical structure and can conceptually be interpreted as a
tree structure, called an XML tree. XML documents must contain a root element (one
that is the parent of all other elements). All elements in an XML document can
contain sub elements, text and attributes.
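A minimal sketch of such an XML tree (the element and attribute names are made up), parsed with Python's standard xml.etree module, shows one root element with sub-elements, text and attributes hanging off it:

```python
# One root element; every other element, text node and attribute hangs off it.
import xml.etree.ElementTree as ET

doc = """
<students>
  <student id="1"><name>Ram</name><city>Chennai</city></student>
  <student id="2"><name>Sita</name><city>Pune</city></student>
</students>
"""

root = ET.fromstring(doc)
print(root.tag)                 # students: the root, parent of all elements
for s in root:                  # sub-elements of the root
    print(s.attrib["id"], s.find("name").text)
```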
Part-B ( 13 Marks)
S.No Questions BTL CO
1 Explain in detail about ODMG Object Model. K2 CO5
ODBMS, an abbreviation for object-oriented database management system, is a
data model in which data is stored in the form of objects, which are instances of
classes. These classes and objects together make up an object-oriented data
model.
Components of Object-Oriented Data Model:
The OODBMS is based on three major components, namely: Object structure,
Object classes, and Object identity. These are explained below.
1. Object Structure:
The structure of an object refers to the properties that an object is made up of. These
properties of an object are referred to as attributes. Thus, an object is a real-world
entity with certain attributes that makes up the object structure. Also, an object
encapsulates the data code into a single unit which in turn provides data abstraction
by hiding the implementation details from the user.
The object structure is further composed of three types of components: Messages,
Methods, and Variables. These are explained below.
1. Messages –
A message provides an interface or acts as a communication medium
between an object and the outside world. A message can be of two types:
Read-only message: If the invoked method does not change the
value of a variable, then the invoking message is said to be a read-
only message.
Update message: If the invoked method changes the value of a
variable, then the invoking message is said to be an update message.
2. Methods –
When a message is passed then the body of code that is executed is known as
a method. Whenever a method is executed, it returns a value as output. A
method can be of two types:
Read-only method: When the value of a variable is not affected by a
method, it is known as a read-only method.
Update method: When the value of a variable is changed by a method,
it is known as an update method.
3. Variables –
It stores the data of an object. The data stored in the variables makes the
object distinguishable from one another.
2. Object Classes:
An object which is a real-world entity is an instance of a class. Hence first we need
to define a class and then the objects are made which differ in the values they store
but share the same class definition. The objects in turn correspond to various
messages and variables stored in them.
Example –
class CLERK
{
    // Variables
    string name;
    string address;
    int id;
    int salary;
    // Messages
    string get_name();
    string get_address();
    int annual_salary();
};
In the above example, we can see, CLERK is a class that holds the object variables
and messages.
An OODBMS also supports inheritance in an extensive manner as in a database
there may be many classes with similar methods, variables and messages. Thus, the
concept of the class hierarchy is maintained to depict the similarities among various
classes.
The concept of encapsulation that is the data or information hiding is also supported
by an object-oriented data model. And this data model also provides the facility of
abstract data types apart from the built-in data types like char, int, float. ADT’s are
the user-defined data types that hold the values within them and can also have
methods attached to them.
Thus, OODBMS provides numerous facilities to its users, both built-in and user-
defined. It incorporates the properties of an object-oriented data model with a
database management system, and supports the concept of programming paradigms
like classes and objects along with the support for other concepts like encapsulation,
inheritance, and the user-defined ADT’s (abstract data types).
ODBMS stands for Object-Oriented Database Management System, which is a type
of database management system that is designed to store and manage object-
oriented data. Object-oriented data is data that is represented using objects, which
encapsulate data and behavior into a single entity.
An ODBMS stores and manages data as objects, and provides mechanisms for
querying, manipulating, and retrieving the data. In an ODBMS, the data is typically
stored in the form of classes and objects, which can be related to each other using
inheritance and association relationships.
In an ODBMS, the data is managed using an object-oriented programming language
or a specialized query language designed for object-oriented databases. Some of the
popular object-oriented database languages include Smalltalk, Java, and C++. Some
ODBMS also support standard SQL for querying the data.
ODBMS have several advantages over traditional relational databases. One of the
main advantages is that they provide a natural way to represent complex data
structures and relationships. Since the data is represented using objects, it can be
easier to model real-world entities in the database. Additionally, ODBMS can
provide better performance and scalability for applications that require a large
number of small, complex transactions.
However, there are also some disadvantages to using an ODBMS. One of the main
disadvantages is that they can be more complex and harder to use than traditional
relational databases. Additionally, ODBMS may not be as widely used and
supported as traditional relational databases, which can make it harder to find
expertise and support. Finally, some applications may not require the advanced
features and performance provided by an ODBMS, and may be better suited for a
simpler database solution.
Features of ODBMS:
Object-oriented data model: ODBMS uses an object-oriented data model to store
and manage data. This allows developers to work with data in a more natural way,
as objects are similar to the objects in the programming language they are using.
Complex data types: ODBMS supports complex data types such as arrays, lists,
sets, and graphs, allowing developers to store and manage complex data structures
in the database.
Automatic schema management: ODBMS automatically manages the schema of
the database, as the schema is defined by the classes and objects in the application
code. This eliminates the need for a separate schema definition language and
simplifies the development process.
High performance: ODBMS can provide high performance, especially for
applications that require complex data access patterns, as objects can be retrieved
with a single query.
Data integrity: ODBMS provides strong data integrity, as the relationships between
objects are maintained by the database. This ensures that data remains consistent
and correct, even in complex applications.
Concurrency control: ODBMS provides concurrency control mechanisms that
ensure that multiple users can access and modify the same data without conflicts.
Scalability: ODBMS can scale horizontally by adding more servers to the database
cluster, allowing it to handle large volumes of data.
Support for transactions: ODBMS supports transactions, which ensure that
multiple operations on the database are atomic and consistent.
Advantages:
Supports Complex Data Structures: ODBMS is designed to handle complex data
structures, such as inheritance, polymorphism, and encapsulation. This makes it
easier to work with complex data models in an object-oriented programming
environment.
Improved Performance: ODBMS provides improved performance compared to
traditional relational databases for complex data models. ODBMS can reduce the
amount of mapping and translation required between the programming language and
the database, which can improve performance.
Reduced Development Time: ODBMS can reduce development time since it
eliminates the need to map objects to tables and allows developers to work directly
with objects in the database.
Supports Rich Data Types: ODBMS supports rich data types, such as audio,
video, images, and spatial data, which can be challenging to store and retrieve in
traditional relational databases.
Scalability: ODBMS can scale horizontally and vertically, which means it can
handle larger volumes of data and can support more users.
Disadvantages:
Limited Adoption: ODBMS is not as widely adopted as traditional relational
databases, which means it may be more challenging to find developers with
experience working with ODBMS.
Lack of Standardization: ODBMS lacks standardization, which means that
different vendors may implement different features and functionality.
Cost: ODBMS can be more expensive than traditional relational databases since it
requires specialized software and hardware.
Integration with Other Systems: ODBMS can be challenging to integrate with
other systems, such as business intelligence tools and reporting software.
Scalability Challenges: ODBMS may face scalability challenges due to the
complexity of the data models it supports, which can make it challenging to partition
data across multiple nodes.
2 Describe about OODBMS and XML Database. K2 CO5
Object Oriented Databases
Object oriented databases are also called Object Database Management Systems
(ODBMS). Object databases store objects rather than data such as integers, strings
or real numbers. Objects are used in object oriented languages such as Smalltalk,
C++, Java, and others. Objects basically consist of the following:
Attributes - Attributes are data which defines the characteristics of an
object. This data may be simple such as integers, strings, and real numbers or it
may be a reference to a complex object.
Methods - Methods define the behavior of an object and are what was
formally called procedures or functions.
Therefore objects contain both executable code and data. There are other
characteristics of objects such as whether methods or data can be accessed from
outside the object. We don't consider this here, to keep the definition simple and
to apply it to what an object database is. One other term worth mentioning is
classes. Classes are used in object oriented programming to define the data and
methods the object will contain. The class is like a template to the object. The
class does not itself contain data or methods but defines the data and methods
contained in the object. The class is used to create (instantiate) the object. Classes
may be used in object databases to recreate parts of the object that may not
actually be stored in the database. Methods may not be stored in the database and
may be recreated by using a class.
Comparison to Relational Databases
Relational databases store data in tables that are two dimensional. The tables have
rows and columns. Relational database tables are "normalized" so data is not
repeated more often than necessary. Each table row is identified by a primary key (a
unique value), and all other columns depend on it. Once the specific row is
identified, data from one or more columns associated with that row may be
obtained or changed.
To put objects into relational databases, they must be described in terms of simple
string, integer, or real number data. For instance, in the case of an airplane, the
wing may be placed in one table with rows and columns describing its dimensions
and characteristics; the fuselage may be in another table, the propeller in another
table, the tires, and so on.
Breaking complex information out into simple data takes time and is labor
intensive. Code must be written to accomplish this task.
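A minimal sketch of this decomposition, with a hypothetical `airplane` object and made-up table names:

```python
# Sketch: flattening one complex object into simple per-part rows,
# as a relational database requires. All names here are hypothetical.
airplane = {
    "id": 1,
    "wing": {"span_m": 35.8, "area_m2": 122.6},
    "fuselage": {"length_m": 37.6},
}

# Each complex sub-object becomes a row in its own table,
# linked back to the parent by a foreign key (airplane_id).
wing_table = [{"airplane_id": airplane["id"], **airplane["wing"]}]
fuselage_table = [{"airplane_id": airplane["id"], **airplane["fuselage"]}]

print(wing_table[0]["span_m"])        # 35.8
print(fuselage_table[0]["length_m"])  # 37.6
```

This write-side flattening, plus the matching reassembly code on read, is the extra work the passage above refers to.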
Object Persistence
With traditional databases, data manipulated by the application is transient and
data in the database is persisted (Stored on a permanent storage device). In object
databases, the application can manipulate both transient and persisted data.
When to Use Object Databases
Object databases should be used when there is complex data and/or complex data
relationships, including many-to-many object relationships. Object databases
should not be used when there would be few join tables and there are large
volumes of simple transactional data.
Object databases work well with:
CASE/CAD/CAM applications (CASE - computer-aided software engineering,
CAD - computer-aided design, CAM - computer-aided manufacturing)
Multimedia Applications
Object projects that change over time.
Commerce
Object Database Advantages over RDBMS
Objects don't require assembly and disassembly, saving the coding time and
execution time needed to assemble or disassemble objects.
Reduced paging
Easier navigation
Better concurrency control - A hierarchy of objects may be locked.
Data model is based on the real world.
Works well for distributed architectures.
Less code required when applications are object oriented.
Object Database Disadvantages compared to RDBMS
Lower efficiency when data is simple and relationships are simple.
Relational tables are simpler.
Late binding may slow access speed.
More user tools exist for RDBMS.
Standards for RDBMS are more stable.
Support for RDBMS is more certain and change is less likely to be required.
ODBMS Standards
Object Data Management Group
Object Database Standard ODMG 2.0
Object Query Language
OQL support of SQL92
How Data is Stored
Two basic methods are used to store objects by different database vendors.
Each object has a unique ID and is defined as a subclass of a base class,
using inheritance to determine attributes.
Virtual memory mapping is used for object storage and management.
Data transfers are either done on a per object basis or on a per page (normally 4K)
basis.
XML Database
An XML database is used to store large amounts of information in the XML format. As
the use of XML is increasing in every field, a secure place to store XML
documents is required. The data stored in the database can be queried using XQuery,
serialized, and exported into a desired format.
XML Database Types
There are two major types of XML databases −
XML- enabled
Native XML (NXD)
XML - Enabled Database
An XML-enabled database is a relational database extended to convert XML documents
to and from its relational structures. Data is stored in tables
consisting of rows and columns. The tables contain sets of records, which in turn
consist of fields.
Native XML Database
A native XML database is based on containers rather than a table format. It can store
large amounts of XML documents and data. A native XML database is queried by
XPath expressions.
A native XML database has an advantage over an XML-enabled database: it is more
capable of storing, querying and maintaining XML documents than an XML-enabled
database.
Example
Following example demonstrates XML database −
<?xml version = "1.0"?>
<contact-info>
<contact1>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact1>
<contact2>
<name>Manisha Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 789-4567</phone>
</contact2>
</contact-info>
Here, a contact-info document is created that holds the records of contacts (contact1 and
contact2), each of which consists of three elements − name, company and phone.
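As a sketch of how such a document might be queried, the snippet below uses Python's standard-library ElementTree (standing in for a native XML database's XPath engine) against the contact-info document above:

```python
# Querying the contact-info document with an XPath-style expression.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<contact-info>
  <contact1><name>Tanmay Patil</name><company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone></contact1>
  <contact2><name>Manisha Patil</name><company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone></contact2>
</contact-info>"""

root = ET.fromstring(doc)
# Select every <name> element anywhere under the root.
names = [n.text for n in root.findall(".//name")]
print(names)  # ['Tanmay Patil', 'Manisha Patil']
```

A real native XML database would evaluate the same kind of XPath expression over documents stored in its containers.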
In a distributed system, a transaction initiated at one site can access or update data at
other sites. Let us discuss various basic concepts used during transaction processing
in distributed systems -
The site at which the transaction is initiated is called coordinating site. The
participating sites are those sites at which the sub-transactions are executing. For
example - If site S1 initiates the transaction T1 then it is called coordinating site.
Now assume that transaction T1 (initiated at S1) can access site S2 and S3. Then
sites S2 and S3 are called participating sites.
To access the data on site S2, transaction T1 spawns another transaction T12 on
site S2; similarly, to access the data on site S3, T1 spawns some
transaction, say T13, on site S3. Transactions T12 and T13 are called sub-
transactions. The scenario described above can be represented by Fig. 5.1.6.
• Transaction manager : manages the execution of those transactions (or sub-transactions) that access data stored at its local site.
• Transaction coordinator: coordinates the execution of the various transactions (both local and global) initiated at its site.
Let TC denotes the transaction coordinator and TM denotes the transaction manager,
then the system architecture can be represented as,
Failure Modes
There are four types of failure modes,
1. Failure of a site
2. Loss of messages
3. Failure of a communication link
4. Network partition
• If two nodes are not directly connected, messages from one to another must be
routed through sequence of communication links. If the communication link fails,
the messages are rerouted by alternative links.
• A system is partitioned if it has been split into two (or more) subsystems, called
partitions, that cannot communicate with each other. Loss of connection between the
subsystems can also cause failure in a distributed system.
Commit Protocols
Two Phase Commit Protocol
• The two phase commit protocol ensures the atomicity property. This property means
either the transaction will execute completely or it won't execute at all.
• The commit protocol ensures atomicity across the sites in the following ways -
i) A transaction which executes at multiple sites must either be committed at all the
sites, or aborted at all the sites.
ii) It is not acceptable to have a transaction committed at one site and aborted at another.
This protocol works in two phases - i) Voting phase and ii) Decision phase.
Step 1: Coordinator site Ci asks all participants to prepare to commit transaction Ti.
• Ci adds the records <prepareT> to the log and writes the log to stable storage.
• It then sends prepare T messages to all participating sites at which T will get
executed.
• When a participating site receives the prepare T message, its transaction manager
determines whether it can commit T. If so, it adds a record <ready T> to its log,
forces the log onto stable storage, and sends a ready T message to Ci.
• If not, it adds a record <no T> to the log and sends an abort T message to the coordinating
site Ci.
• Coordinator adds a decision record, <commit T> or <abort T>, to the log and
forces the record onto stable storage. Once the record reaches stable storage, the
decision is irrevocable (even if failures occur).
• Coordinator sends a message to each participant informing it of the decision
(commit or abort)
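The voting and decision phases above can be sketched as follows; the in-process `Participant` objects stand in for remote sites, and no real network or stable storage is involved:

```python
# Minimal sketch of two-phase commit decision logic.
def two_phase_commit(participants, log):
    # Phase 1 (voting): coordinator logs <prepare T> and collects votes.
    log.append("<prepare T>")
    votes = [p.prepare() for p in participants]

    # Phase 2 (decision): commit only if every site voted "ready T".
    decision = "commit" if all(v == "ready T" for v in votes) else "abort"
    log.append(f"<{decision} T>")   # would be forced to stable storage
    for p in participants:
        p.receive(decision)         # inform every participant
    return decision

class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = None
    def prepare(self):
        return "ready T" if self.can_commit else "no T"
    def receive(self, decision):
        self.state = decision

sites = [Participant(True), Participant(True), Participant(False)]
print(two_phase_commit(sites, []))  # abort, since one site voted "no T"
```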
Failure of site
• If any of the participating sites fails, then when participating site Si recovers,
it examines its log entries to decide how to handle transaction T.
• If the log contains <commit T> record: participating site executes redo (T)
• If the log contains <abort T> record: participating site executes undo (T)
• If the log contains <ready T> record: participating site must consult Coordinating
site to take decision about execution of transaction T.
• If the log of participating site Si contains no relevant record, then Si failed
before responding to the prepare T message from the coordinating site. In this case it
must abort T.
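The recovery rules above can be sketched as a small decision function; the string log records are a simplification of real log entries:

```python
# Sketch of a participant's recovery decision from its own log.
def participant_recovery_action(log):
    if "<commit T>" in log:
        return "redo(T)"
    if "<abort T>" in log:
        return "undo(T)"
    if "<ready T>" in log:
        return "consult coordinator"
    # No record at all: the site failed before answering prepare T.
    return "abort T"

print(participant_recovery_action(["<ready T>", "<commit T>"]))  # redo(T)
print(participant_recovery_action([]))                           # abort T
```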
• If the coordinator fails while the commit protocol for T is executing, then the
participating sites must decide the fate of transaction T:
i) If an active participating site contains a <commit T> record in its log, then T
must be committed.
ii) If an active participating site contains an <abort T> record in its log, then T must
be aborted.
iii) If some active participating site does not contain a <ready T> record in its log,
then the failed coordinator Ci cannot have decided to commit T. The active sites can
therefore abort T.
iv) If none of the above cases holds, then all participating active sites must have a
<ready T> record in their logs, but no additional control records (such as <abort T>
or <commit T>). In this case the active sites must wait for coordinator site Ci to
recover, to learn the decision.
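Cases i)-iv) can be sketched as follows, with each active site's log simplified to a list of record strings:

```python
# Sketch of the decision taken by active participants when the
# coordinator fails. `logs` holds one log per active participating site.
def decide_without_coordinator(logs):
    if any("<commit T>" in log for log in logs):
        return "commit"               # case i
    if any("<abort T>" in log for log in logs):
        return "abort"                # case ii
    if any("<ready T>" not in log for log in logs):
        return "abort"                # case iii: Ci cannot have committed
    return "wait for coordinator"     # case iv: the blocking situation

print(decide_without_coordinator([["<ready T>"], ["<ready T>"]]))
# wait for coordinator
```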
• This is the blocking problem of two phase commit: active participating sites may
have to wait for the failed coordinator site to recover.
• The three phase commit protocol is an extension of the two phase commit protocol
which eliminates the blocking problem.
• Various assumptions made for the three phase commit protocol are -
• No network partitioning.
• At most K participating sites may fail (K is a predetermined number).
• Phase 1: This phase is similar to phase 1 of the two phase commit protocol: coordinator
site Ci asks all participants to prepare to commit transaction Ti. The
coordinator then makes the decision about commit or abort based on the responses
from all the participating sites.
• Under the three phase protocol, the knowledge of the pre-commit decision can be used
to commit despite coordinator site failure. That means if the coordinating site
fails, one of the participating sites becomes the new coordinating site and
consults the other participating sites for the pre-commit message they
possess. Using this pre-commit message, the decision about commit/abort is
taken by the new coordinating site.
• This protocol avoids blocking problem as long as less than k sites fail.
(1) Query Decomposition: During the translation, some important actions such as
normalization, analysis for semantic errors, and simplification are carried out;
then the input query is restructured into an algebraic query.
(2) Localization: In this step the distributed query is mapped onto the fragments;
data distribution information is used to localize the data involved in the query.
(3) Global Query Optimization: Optimization means selecting, from a list of
candidate strategies, the one closest to optimal. For optimization, the cost of each
strategy is computed. The total cost is a combination of CPU cost, I/O cost and
communication cost.
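A minimal sketch of this cost model; the unit costs and the two candidate strategies are illustrative assumptions:

```python
# Total cost = CPU cost + I/O cost + communication cost.
# The per-unit costs below are made-up constants for illustration.
def total_cost(cpu_insts, io_ops, bytes_sent,
               c_cpu=1e-7, c_io=1e-3, c_msg=1e-6):
    return cpu_insts * c_cpu + io_ops * c_io + bytes_sent * c_msg

# The optimizer picks the candidate strategy with the lowest total cost.
candidates = {
    "ship-whole-relation": total_cost(1e6, 200, 5_000_000),
    "semijoin-first": total_cost(2e6, 300, 400_000),
}
print(min(candidates, key=candidates.get))  # semijoin-first
```

With these numbers the communication term dominates, so the strategy that ships fewer bytes wins, which is the typical trade-off in distributed query optimization.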
(4) Local Query Optimization: This step is common to all sites in distributed
database. The techniques of local query optimization are similar to those used in
centralized systems.
Part-C ( 15 Marks)
S.No Questions BTL CO
1 Explain various features of distributed database and centralized database system. K2 CO5
Definition of distributed databases:
• A distributed database system consists of loosely coupled sites (computers) that share
no physical components, and each site is associated with a database system.
• The software that maintains and manages the working of distributed databases is
called distributed database management system.
• The database system that runs on each site is independent of each other.
The transactions can access data at one or more sites.
(1) There is fast data processing, as several sites participate in request processing.
(3) The system requires deadlock handling during transaction processing; otherwise
the entire system may enter an inconsistent state.
(1) Often distributed databases are used by organizations that have numerous offices in
different geographical locations. Typically, an individual branch interacts
primarily with the data that pertain to its own operations, and has a much less
frequent need for general company data. In such a situation, distributed systems are useful.
(2) Using a distributed system, one can grant permissions on individual sections of the
overall database, for better internal and external protection.
• Homogeneous database systems are those in which all sites have identical
software running on them.
• In this system, all the sites are aware of the other sites present in the system and they
all cooperate in processing user's request.
• Each site present in the system, surrenders part of its autonomy in terms of right to
change schemas or software.
Heterogeneous database systems are those in which different sites have different
schemas or software.
• The participating sites are not aware of other sites present in the system.
Architecture
• Following is an architecture of distributed databases. In this architecture the local
database is maintained by each site.
When a user makes a request for particular data at site Si, it is first searched in the
local database. If the data is not present in the local database, the request for that
data is passed to all the other sites via the communication network. Each site then
searches its local database. When the data is found at a particular site, say Sj, it is
transmitted to site Si via the communication network.
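The lookup sequence above can be sketched as follows, with each site's database simplified to a dictionary:

```python
# Sketch: search the local database first, then the other sites.
def find_data(key, local_db, remote_sites):
    if key in local_db:              # 1. try local database at Si
        return local_db[key]
    for site_db in remote_sites:     # 2. forward request to other sites
        if key in site_db:           # 3. site Sj returns the data to Si
            return site_db[key]
    return None                      # data not present at any site

s1 = {"ram": 78}   # local database at site S1
s2 = {"shyam": 65} # database at remote site S2
print(find_data("shyam", s1, [s2]))  # 65, found at the remote site
```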
Data Storage
There are two approaches to storing relation r in a distributed database -
(1) Replication: The system maintains multiple copies of the data, stored at different
sites, for faster retrieval and fault tolerance.
(2) Fragmentation: The relation is partitioned into several fragments, each stored at a
different site.
Data Replication
• Concept: Data replication means storing a copy (replica) of a relation or its
fragments at two or more sites.
• There are two methods of data replication: (1) Full replication (2) Partial
replication.
• Full replication: In this approach the entire relation is stored at all the sites.
Fully redundant databases are those in which every site contains a copy of the
entire database.
Advantages:
(3) Faster accessing: The relation r is locally available at each site, hence data
accessing becomes faster.
Disadvantages:
(1) Increased cost of update: The major disadvantage of data replication is the
increased cost of updates: each replica of relation r must be updated at all the
sites whenever a user requests an update to the relation.
Data Fragmentation
• Concept: Data fragmentation is the division of relation r into fragments r1, r2, r3,...,rn
which together contain sufficient information to reconstruct relation r.
• There are two approaches of data fragmentation - (1) Horizontal fragmentation and
(2) Vertical fragmentation.
Horizontal Fragmentation 1:
We will get
Horizontal Fragmentation 2:
SELECT * FROM Student WHERE Marks > 50 AND City = 'Mumbai'
We will get
Vertical Fragmentation 1 :
Vertical Fragmentation 2:
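Both kinds of fragmentation, and reconstruction of r from the fragments, can be sketched in a few lines; the Student rows are illustrative:

```python
# Sketch of horizontal vs. vertical fragmentation of a Student relation.
students = [
    {"seat_no": 1, "name": "Ram",  "city": "Chennai", "marks": 78},
    {"seat_no": 2, "name": "Sita", "city": "Mumbai",  "marks": 55},
]

# Horizontal fragmentation: each fragment is a subset of the rows
# (a selection); reconstruction is the union of the fragments.
h1 = [r for r in students if r["city"] == "Mumbai"]
h2 = [r for r in students if r["city"] != "Mumbai"]
assert sorted(h1 + h2, key=lambda r: r["seat_no"]) == students

# Vertical fragmentation: each fragment keeps a subset of the columns
# plus the key; reconstruction is a join on seat_no.
v1 = [{"seat_no": r["seat_no"], "name": r["name"], "city": r["city"]}
      for r in students]
v2 = [{"seat_no": r["seat_no"], "marks": r["marks"]} for r in students]
rejoined = [{**a, **b} for a in v1 for b in v2
            if a["seat_no"] == b["seat_no"]]
print(len(rejoined))  # 2
```

Note that every vertical fragment must carry the key (seat_no here), or relation r cannot be reconstructed by the join.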