SYLLABUS
TOPIC
Introduction: Distributed Data Processing, Distributed
Database System, Promises of DDBSs, Problem areas.
Distributed DBMS Architecture: Architectural Models for
Distributed DBMS, DDBMS Architecture. Distributed
Database Design: Alternative Design Strategies, Distribution
Design issues, Fragmentation, Allocation.
Query Processing and Decomposition: Query processing
objectives, characterization of query processors, layers of
query processing, query decomposition, localization of
distributed data. Distributed Query Optimization: Query
optimization, centralized query optimization, distributed query
optimization algorithms.
Transaction Management: Definition, properties of
transaction, types of transactions, distributed concurrency
control: Serializability, concurrency control mechanisms &
algorithms, timestamp-based & optimistic concurrency control
algorithms, deadlock management.
Distributed DBMS Reliability: Reliability concepts and
measures, fault tolerance in distributed systems, failures in
Distributed DBMS, local & distributed reliability protocols, site
failures and network partitioning. Parallel Database Systems:
Parallel database system architectures, parallel data
placement, parallel query processing, load balancing,
database clusters.
Distributed Object Database Management Systems:
Fundamental object concepts and models, object distributed
design, architectural issues, object management, distributed
object storage, object query processing. Object Oriented
Data Model: Inheritance, object identity, persistent
programming
INTRODUCTION
Features of the relational data model. (2022-23)
The Relational Model is a database model that
represents data in the form of tables or relations. Each
table consists of rows and columns, where each column
represents an attribute of the entity, and each row
represents a record.
Features of the Relational Model:
(1) Data is represented in rows and columns called
relations.
(2) Data is stored in tables that have relationships between
them; this is called the relational model.
(3) Each column has a distinct name and represents an
attribute.
(4) Data is normalized to eliminate redundancy.
(5) Data is accessed using SQL (Structured Query
Language).
(6) Data is represented in a tabular form.
(7) The Relational Model is widely used in modern
databases because of its simplicity, flexibility, and
ease of use. It is also easy to maintain and modify,
making it a popular choice for many applications.
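The features listed above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the table name, column names and sample rows are assumptions made only for illustration and do not come from the syllabus.

```python
import sqlite3

# Hypothetical "employee" relation; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "eno INTEGER PRIMARY KEY, "   # each column has a distinct name (an attribute)
    "ename TEXT NOT NULL, "
    "salary INTEGER NOT NULL)"
)

# Each inserted tuple becomes one row (record) of the relation.
conn.executemany(
    "INSERT INTO employee (eno, ename, salary) VALUES (?, ?, ?)",
    [(101, "A", 3000), (102, "B", 4000), (103, "C", 5500)],
)

# Data is accessed using SQL.
rows = conn.execute(
    "SELECT ename FROM employee WHERE salary >= 4000 ORDER BY eno"
).fetchall()
names = [name for (name,) in rows]
print(names)
conn.close()
```

The query returns only the rows that satisfy the condition, which is the tabular, declarative style of access that the relational model is known for.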
What are the different strategies for designing
distributed database? (2022-23)
Write and explain design strategies.
The strategies can be broadly divided into replication
and fragmentation.
Data Replication: Data replication is the process of
storing separate copies of the database at two or more
sites. It is a popular fault tolerance technique of
distributed databases.
Advantages of Data Replication:
(1) Reliability: In case of failure of any site, the
database system continues to work since a copy is
available at another site(s).
(2) Reduction in Network Load: Since local copies of data
are available, query processing can be done with reduced
network usage, particularly during prime hours.
(3) Quicker Response: Availability of local copies of data
ensures quick query processing and consequently quick
response time.
(4) Simpler Transactions: Transactions require fewer joins
of tables located at different sites and minimal
coordination across the network. Thus, they become
simpler in nature.
Disadvantages of Data Replication:
(1) Increased Storage Requirements: Maintaining
multiple copies of data is associated with increased
storage costs. The storage space required is in
multiples of the storage required for a centralized
system.
(2) Increased Cost and Complexity of Data
Updating: Each time a data item is updated, the
update needs to be reflected in all the copies of the
data at the different sites. This requires complex
synchronization techniques and protocols.
(3) Undesirable Application-Database Coupling:
If complex update mechanisms are not used,
removing data inconsistency requires complex
co-ordination at application level. This results in
undesirable application-database coupling.
Fragmentation: Fragmentation is the task of dividing a
table into a set of smaller tables. The subsets of the table
are called fragments.
Advantages of Fragmentation:
(1) Since data is stored close to the site of usage,
efficiency of the database system is increased.
(2) Local query optimization techniques are sufficient for
most queries since data is locally available.
(3) Since irrelevant data is not available at the sites,
security and privacy of the database system can be
maintained.
Disadvantages of Fragmentation:
(1) When data from different fragments are required,
access speeds may be very low.
(2) Lack of back-up copies of data in different sites may
render the database ineffective in case of failure of a
site.
Discuss the distributed database system. Also
explain the promises of DDBS. (2022-23)
A Distributed Database Management System (DDBMS) is a
database management system that manages a database
that is spread across multiple computers or sites. In a
DDBMS, each site has its own database, and the databases
are connected to each other to form a single, integrated
system. The main advantage of a DDBMS is that it can
provide higher availability and reliability than a
centralized database system. Here are some of the
promises of DDBS:
(1) Transparent management of distributed, fragmented,
and replicated data.
(2) Improved reliability/availability through distributed
transactions.
(3) Improved performance.
(4) Easier and more economical system expansion.
Distributed databases are used in many applications,
including corporate management information systems,
e-commerce, and online banking. They offer several
advantages over centralized databases, including improved
scalability, availability, performance, flexibility, fault
tolerance, and security.
Discuss various technical problems that need
to be resolved to realize the full potential of
DDBMS. (2022-23)
What are the problem areas of Distributed
Database?
Following are the problem areas of distributed
database:
(1) Distributed Concurrency Control: Distributed
concurrency control specifies the synchronization of
access to the distributed database such that the
integrity of the database is maintained.
(2) Deadlock Management: In a distributed database,
several users request resources from the database. If
the resources are available, the database grants them
to that user; if not, the user has to wait until the
resources are released by some other user. Users
waiting on each other in this way can lead to a
situation known as deadlock. Distributed deadlock is
managed by different algorithms and techniques such
as avoidance and detection algorithms.
(3) Replication Control: Replication is a technique
that only applies to distributed systems. A database
is said to be replicated if the entire database or a
portion of it (a table, some tables, one or more
fragments, etc.) is copied and the copies are stored at
different sites. The issue with having more than one
copy of a database is maintaining the mutual
consistency of the copies, that is, ensuring that all
copies have identical schema and data content.
(4) Operating Environment: A distributed database
environment requires an operating system that suits
the organizational needs. The operating system plays
an important role in managing the distributed
database. Sometimes the operating system does not
support the distributed database.
(5) Transparent Management: Transparent
management of data is one of the major problem
areas in distributed databases. In a distributed
database, data is situated in multiple locations and a
number of users use that database. To maintain
the integrity of the database, transparent management
of data is important.
(6) Security and Privacy: How to apply the security
policies to the interdependent system is a great issue
in distributed systems. Since distributed systems deal
with sensitive data and information, the system
must have strong security and privacy
measures. Protection of distributed system
assets, including base resources (storage,
communications and user-interface I/O) as well as
higher-level composites of these resources
(processes, files, messages, display windows and more
complex objects), is an important issue in distributed
systems.
(7) Resource Management: In distributed systems,
objects consisting of resources are located in different
places. Routing is an issue at the network layer of the
distributed system and at the application layer.
Resource management in a distributed system will
interact with its heterogeneous nature.
Give a brief account of architectural models
for distributed DBMS. (2022-23)
Explain Architectures of Distributed DBMS.
Architecture of Distributed system.
The basic types of distributed DBMS are as follows:
(1) Client-Server Architecture:
(a) A client-server architecture has a number of
clients and a few servers connected in a
network.
(b) A client sends a query to one of the servers. The
earliest available server solves it and replies.
(c) A client-server architecture is simple to
implement and execute due to the centralized server
system.
(Figure: Client-Server Architecture)
(2) Collaborating Server Architecture:
(Figure: Collaborating Server Architecture)
(a) Collaborating server architecture is designed to
run a single query on multiple servers.
(b) Servers break a single query into multiple
small queries and the result is sent to the
client.
(c) Collaborating server architecture has a collection
of database servers. Each server is capable of
executing the current transactions across the
databases.
(3) Middleware Architecture:
(a) Middleware architectures are designed in such a
way that a single query is executed on multiple
servers.
(b) This system needs only one server which is
capable of managing queries and transactions
from multiple servers.
(c) Middleware architecture uses local servers to
handle local queries and transactions.
(d) The software used for execution of queries
and transactions across one or more independent
database servers is called middleware.
6. Why are distributed databases essential?
(2021-22)
What are distributed databases?
In distributed systems, data is distributed across
different database systems of an organization. These
database systems are connected via communication links.
Such links help the end-users to access the data easily.
Examples of distributed databases are Apache
Cassandra, HBase, Ignite, etc.
(Figure: Distributed Database System)
The above diagram is a typical example of a distributed
database system, in which a communication channel is used
to communicate with the different locations and every
system has its own memory and database.
7. Explain briefly about Fragmentation with
examples.
Fragmentation is a process of dividing the whole or
full database into various sub tables or sub relations so
that data can be stored in different systems. The small
pieces of sub relations or sub tables are called fragments.
These fragments are called logical data units and are
stored at various sites. It must be made sure that the
fragments are such that they can be used to reconstruct
the original relation (i.e., there isn't any loss of data).
If a table T is fragmented into a number of fragments
T1, T2, ..., TN, the fragments contain sufficient
information to allow the restoration of the original table
T. This restoration can be done by the use of the UNION or
JOIN operation on the various fragments. This process is
called data fragmentation. All of these fragments are
independent, which means they cannot be derived from
one another. The users needn't be logically concerned
about fragmentation, which means they should not be
concerned that the data is fragmented; this is called
fragmentation independence or fragmentation
transparency.
Advantages:
(1) As the data is stored close to the usage site, the
efficiency of the database system will increase.
(2) Local query optimization methods are sufficient for
some queries as the data is available locally.
(3) In order to maintain the security and privacy of the
database system, fragmentation is advantageous.
Disadvantages:
(1) Access times may be very high if data from different
fragments are needed.
(2) If we are using recursive fragmentation, then it will
be very expensive.
We have three methods for data fragmenting of a table:
(1) Horizontal fragmentation
(2) Vertical fragmentation
(3) Mixed or Hybrid fragmentation
Let's discuss them one by one.
Horizontal Fragmentation: Horizontal fragmentation
refers to the process of dividing a table horizontally by
assigning each row (or a group of rows) of a relation to one or
more fragments. These fragments are then assigned to
different sites in the distributed system. Some of the rows
or tuples of the table are placed in one system and the rest
are placed in other systems. The rows that belong to the
horizontal fragments are specified by a condition on one or
more attributes of the relation. In relational algebra,
horizontal fragmentation on table T can be represented as:
σp(T)
where σ is the relational algebra operator for selection and
p is the condition satisfied by a horizontal fragment.
For example, consider an EMPLOYEE table (T):
Eno | Ename | Design | Salary | Dep
101 | A     | abc    | 3000   | 1
102 | B     | abc    | 4000   | 1
103 | C     | abc    | 5500   | 2
104 | D     | abc    | 5000   | 2
105 | E     | abc    | 2000   | 2
This EMPLOYEE table can be divided into different
fragments like:
EMP1 = σ Dep=1 (EMPLOYEE)
EMP2 = σ Dep=2 (EMPLOYEE)
These two fragments are as follows. The T1 fragment on
the basis of Dep = 1 is:
Eno | Ename | Design | Salary | Dep
101 | A     | abc    | 3000   | 1
102 | B     | abc    | 4000   | 1
Similarly, the T2 fragment on the basis of Dep = 2 will be:
Eno | Ename | Design | Salary | Dep
103 | C     | abc    | 5500   | 2
104 | D     | abc    | 5000   | 2
105 | E     | abc    | 2000   | 2
Now, here it is possible to get back T as
T = T1 ∪ T2 ∪ ... ∪ TN
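The horizontal fragmentation example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows mirror the EMPLOYEE example (the name "E" for employee 105 is assumed, since it is not legible in the source), and the helper name `select` is hypothetical.

```python
# EMPLOYEE rows as (Eno, Ename, Design, Salary, Dep).
# The Ename "E" for employee 105 is an assumption.
EMPLOYEE = [
    (101, "A", "abc", 3000, 1),
    (102, "B", "abc", 4000, 1),
    (103, "C", "abc", 5500, 2),
    (104, "D", "abc", 5000, 2),
    (105, "E", "abc", 2000, 2),
]

def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

# Each horizontal fragment is a selection on the Dep attribute (index 4).
T1 = select(EMPLOYEE, lambda row: row[4] == 1)
T2 = select(EMPLOYEE, lambda row: row[4] == 2)

# The original table is recovered as the union of the fragments.
reconstructed = sorted(set(T1) | set(T2))
print(len(T1), len(T2))
print(reconstructed == sorted(EMPLOYEE))
```

Because the two conditions (Dep = 1 and Dep = 2) cover every row exactly once, the union loses nothing and adds no duplicates.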
Vertical Fragmentation: Vertical fragmentation refers
to the process of decomposing a table vertically by
attributes or columns. In this fragmentation, some of the
attributes are stored in one system and the rest are stored
in other systems. This is because each site may not need all
columns of a table. In order to take care of restoration,
each fragment must contain the primary key field(s) of the
table. The fragmentation should be done in such a manner that
we can rebuild the table from the fragments by taking the
natural JOIN operation; to make this possible we need to
include a special attribute called Tuple-id in the schema.
For this purpose, a user can use any super key. In this
way, the tuples or rows can be linked together. The
projection is as follows:
π a1, a2, ..., an (T)
where π is the relational algebra operator for projection,
a1, ..., an are the attributes of T, and
T is the table (relation).
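A minimal sketch of vertical fragmentation follows, assuming a three-row EMPLOYEE relation and an arbitrary split of the columns (both are assumptions for illustration). The helper names `project` and `natural_join` are hypothetical. Both fragments keep the key Eno so that the join can rebuild the table.

```python
# Hypothetical EMPLOYEE relation; the column split below is illustrative.
EMPLOYEE = [
    {"Eno": 101, "Ename": "A", "Design": "abc", "Salary": 3000, "Dep": 1},
    {"Eno": 102, "Ename": "B", "Design": "abc", "Salary": 4000, "Dep": 1},
    {"Eno": 103, "Ename": "C", "Design": "abc", "Salary": 5500, "Dep": 2},
]

def project(relation, attributes):
    """Projection: keep only the named attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]

def natural_join(left, right, key):
    """Join two fragments on their shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

# Each vertical fragment carries the primary key Eno.
T1 = project(EMPLOYEE, ["Eno", "Ename", "Design"])
T2 = project(EMPLOYEE, ["Eno", "Salary", "Dep"])

rebuilt = natural_join(T1, T2, "Eno")
print(rebuilt == EMPLOYEE)
```

If the key were left out of either fragment, there would be no way to tell which salary belongs to which employee, which is why the primary key (or a tuple-id) must be repeated in every vertical fragment.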
To get back to the original T, we join the two vertical
fragments T1 and T2 as π EMPLOYEE (T1 ⋈ T2).
Mixed or Hybrid Fragmentation: For defining this type
of fragmentation we use the SELECT and the PROJECT
operations of relational algebra. In some situations, the
horizontal and the vertical fragmentation alone isn't
enough to distribute the data, and we need mixed
fragmentation.
Mixed fragmentation can be done in two different ways:
(1) The first method is to first create a set or group of
horizontal fragments and then create vertical
fragments from one or more of the horizontal
fragments.
(2) The second method is to first create a set or group of
vertical fragments and then create horizontal
fragments from one or more of the vertical fragments.
Disadvantages of distributed databases:
(1) Complexity: DBAs may have to do extra work to
ensure that the distributed nature of the system is
transparent. Extra work must also be done to
maintain multiple disparate systems, instead of one
big one. Extra database design work must also be
done to account for the disconnected nature of the
database; for example, joins become prohibitively
expensive when performed across multiple systems.
(2) Economics: Increased complexity and a more
extensive infrastructure mean extra labour costs.
(3) Security: Remote database fragments must be
secured, and they are not centralized so the remote
sites must be secured as well. The infrastructure
must also be secured (for example, by encrypting the
network links between remote sites).
(4) Difficult to Maintain Integrity: In a
distributed database, enforcing integrity over a
network may require too much of the network's
resources to be feasible.
(5) Inexperience: Distributed databases are difficult to
work with, and in such a young field there is not
much readily available experience in "proper"
practice.
(6) Lack of Standards: There are no tools or
methodologies yet to help users convert a centralized
DBMS into a distributed DBMS.
What is Distributed data processing (DDP)?
Distributed data processing is an arrangement in which
multiple computers across different locations share
computer-processing capability. In DDP, specific jobs are
performed by specialized computers, which may be far
removed from the user and from other such computers.
Write some approaches for storing data in
distributed data storage.
(1) Homogeneous DDB: Those database systems
which execute on the same operating system, use
the same application process and carry the same
hardware devices.
(Figure: Homogeneous Distributed System)
(2) Heterogeneous DDB: Those database systems
which execute on different operating systems under
different application procedures, and carry different
hardware devices.
(Figure: Heterogeneous Distributed System)
13. Write the advantages of distributed database.
Some general features of distributed databases are:
Location Independency: Data is physically stored
at multiple sites and managed by an independent
DDBMS.
Distributed Query Processing: Distributed
databases answer queries in a distributed
environment that manages data at multiple sites.
High-level queries are transformed into a query
execution plan for simpler management.
Distributed Transaction Management: Provides
a consistent distributed database through commit
protocols, distributed concurrency control techniques,
and distributed recovery methods in case of many
transactions and failures.
Seamless Integration: Databases in a collection
usually represent a single logical database, and they
are interconnected.
Network Linking: All databases in a collection are
linked by a network and communicate with each
other.
Transaction Processing: Distributed databases
incorporate transaction processing, which is a
program including a collection of one or more
database operations. Transaction processing is an
atomic process that is either entirely executed or not
at all.
‘The, Advantages of Distributed Database is as
Hardware, Operating System,
Network and | wation Independence. It provides
Continuous operation.
Explain the function of DDBMS.
(1) Receive an application's request.
(2) Validate, analyze and decompose the request.
(3) Map the request's logical to physical data components.
(4) Search for, locate, read and validate the data.
(5) Ensure database consistency and security.
What are the components of DDBMS?
Computer Workstations (sites): These form the network
system. The DDBMS must be independent of the computer
system hardware.
Network Hardware & Software Components: These are
present in each workstation. The network components
allow all sites to interact and exchange data.
Because the components (computers, network
hardware and so on) are likely to be supplied by different
vendors, the DDBMS functions must be able to run on
multiple platforms.
Communication Media: These carry the data from one
workstation to another.
The DDBMS must be communications-media
independent, that is, it must be able to support several
types of communication media.
Transaction Processor (TP): It is a software
component found in each computer that requests data.
The TP receives and processes the application's data
requests. The TP is also known as the application
processor (AP) or the transaction manager (TM).
Data Processor (DP): It is a software component found
in each computer that stores and retrieves data located at
the site.
The DP is also known as the data manager (DM).
What is Horizontal Fragmentation?
Horizontal fragmentation refers to the process of
dividing a table horizontally by assigning each row (or
a group of rows) of a relation to one or more fragments.
These fragments are then assigned to different sites in
the distributed system.
What is Vertical Fragmentation?
Vertical fragmentation refers to the process of
decomposing a table vertically by attributes or columns.
In this fragmentation, some of the attributes are stored
in one system and the rest are stored in other systems.
This is because each site may not need all columns
of a table.
18. Write the parameters on which DDBMS
architectures depend.
DDBMS architectures generally depend on
three parameters:
(1) Distribution: It states the physical distribution of
data across the different sites.
(2) Autonomy: It indicates the distribution of control of
the database system and the degree to which each
constituent DBMS can operate independently.
(3) Heterogeneity: It refers to the uniformity or
dissimilarity of the data models, system components
and databases.
Multi-DBMS Architectures:
This is an integrated database system formed by a
collection of two or more autonomous database systems.
Multi-DBMS can be expressed through six levels of
schemas:
(1) Multi-database View Level: Depicts multiple user
views comprising subsets of the integrated
distributed database.
(2) Multi-database Conceptual Level: Depicts the
integrated multi-database that comprises global
logical multi-database structure definitions.
(3) Multi-database Internal Level: Depicts the data
distribution across different sites and multi-database
to local data mapping.
(4) Local Database View Level: Depicts public view of
local data.
(5) Local Database Conceptual Level: Depicts local
data organization at each site.
(6) Local Database Internal Level: Depicts physical
data organization at each site.
There are two design alternatives for multi-DBMS:
(1) Model with multi-database conceptual level.
(2) Model without multi-database conceptual level.
(Figure)
(Figure: Model without Multi-Database Conceptual Level)
What is Hybrid Fragmentation?
In hybrid fragmentation, a combination of horizontal and
vertical fragmentation techniques is used. It generates
fragments with minimal extraneous information. However,
reconstruction of the original table is often an expensive
task.
Hybrid fragmentation can be done in two alternative
ways:
(1) At first, generate a set of horizontal fragments; then
generate vertical fragments from one or more of the
horizontal fragments.
(2) At first, generate a set of vertical fragments; then
generate horizontal fragments from one or more of the
vertical fragments.
DDBMS
(1) Distributed Nature of Organizational Units:
Most organizations in the current times are
subdivided into multiple units that are physically
distributed over the globe. Each unit requires its own
set of local data. Thus, the overall database of the
organization becomes distributed.
(2) Need for Sharing of Data: The multiple
organizational units often need to communicate with
each other and share their data and resources. This
demands common databases or replicated databases
that should be used in a synchronized manner.
(3) Support for Both OLTP and OLAP: Online
Transaction Processing (OLTP) and Online Analytical
Processing (OLAP) work upon diversified systems
which may have common data. Distributed database
systems aid both kinds of processing by providing
synchronized data.
(4) Database Recovery: One of the common techniques
used in DDBMS is replication of data across
different sites. Replication of data automatically
helps in data recovery if the database in any site is
damaged. Users can access data from other sites
while the damaged site is being reconstructed. Thus,
database failure may become almost inconspicuous to
users.
(5) Support for Multiple Application Software:
Most organizations use a variety of application
software, each with its specific database support.
DDBMS provides a uniform functionality for using
the same data among different platforms.
Explain the issues in designing Distributed
Systems.
(1) Heterogeneity: Heterogeneity applies to the
network, computer hardware, operating system and
implementations by different developers. A key
component of the heterogeneous distributed
client-server environment is middleware.
Middleware is a set of services that enables
applications and end-users to interact with each
other across a heterogeneous distributed system.
(2) Openness: The openness of the distributed system
is determined primarily by the degree to which new
resource-sharing services can be made available to
the users. Open systems are characterized by the
fact that their key interfaces are published. An open
system is based on a uniform communication
mechanism and published interfaces for access to
shared resources. It can be constructed from
heterogeneous hardware and software.
(3) Scalability: The system should
remain efficient even with a significant increase in
the number of users and resources connected.
(4) Security: Security of an information system has three
components: confidentiality, integrity and
availability. Encryption protects shared resources and
keeps sensitive information secret when
transmitted.
(5) Failure Handling: When some faults occur in
hardware or the software program, it may produce
incorrect results or stop before completing the
intended computation, so corrective measures should
be implemented to handle such cases.
(6) Transparency: The user should be unaware of where
the services are located, and the transfer from a local
machine to a remote one should be transparent.
QUERY PROCESSING AND
DECOMPOSITION
Explain query processing. Also explain the
layers of query processing.
What is meant by query processing?
Query Processing is the activity performed in
extracting data from the database. Query processing
takes various steps for fetching the data from the database.
The steps involved are:
(1) Parsing and translation
(2) Optimization
(3) Evaluation
There are two types of query processors:
Single-Phase Commit Query Processor:
Accesses and joins information from multiple data sources
and performs updates to a single data source.
Two-Phase Commit Query Processor (CACQPRRS):
Accesses and joins information from multiple data sources
and performs updates to multiple data sources.
There are four phases in a typical query processing.
Parsing and Translation: After scanning the SQL
query, the query is parsed to identify
syntactical errors and to check the data types.
Differentiate between homogeneous and
heterogeneous DDBMS. (2022-23)
Feature | Homogeneous DDBMS | Heterogeneous DDBMS
DBMS | Same DBMS software is used at each node | Different DBMS software can be used at each node
Schema | Identical schema structure is used at each node | Different schema structures can be used at each node
Data Consistency | Changes made to the database on one node are automatically propagated to other nodes, ensuring data consistency | Consistency is not guaranteed, as different DBMS software may use different data models
Complexity | Homogeneous databases are relatively easier to manage as the same DBMS software is employed throughout the system | Heterogeneous databases are more complex to manage as different DBMS software is employed throughout the system
Flexibility | Homogeneous databases offer less flexibility as all nodes must use the same DBMS software and schema structure | Heterogeneous databases offer more flexibility as nodes can use different DBMS software
Briefly explain the objectives of
distributed query processing in detail.
(2021-22)
Discuss the objectives of distributed query
processing. (2022-23)
What are the objectives of query processing?
What is query processing in a relational
database? Explain in detail with an example.
How does it differ from distributed query
processing?
Explain the various phases in query processing.
The main objective of query processing in a
distributed environment is to transform a high-level query
on a distributed database, which is seen as a single
database by the users, into an efficient execution
strategy expressed in a low-level language on local
databases.
Query Processing and its Phases:
Query Processing is the activity performed in
extracting data from the database. Query processing
takes various steps for fetching the data from the database.
The steps involved are:
(1) Parsing and translation
(2) Optimization
(3) Evaluation
Parsing and Translation: Query processing includes
certain activities for data retrieval. Initially, the user
queries are written in a high-level database language
such as SQL. They are translated into expressions that can
be further used at the physical level of the file system.
After this, the actual evaluation of the queries and
a variety of query-optimizing transformations
take place. SQL, or Structured Query Language, is
human-readable and understandable, so it is the most
suitable choice for humans. But it is not perfectly
suitable for the internal representation of the query
within the system. Relational algebra is well suited for
the internal representation of a query. The translation
process in query processing is similar to the parser of a
query. When a user executes any query, for generating the
internal form of the query, the parser in the system
checks the syntax of the query and verifies the name of
the relation in the database, the tuple, and finally the
required attribute value. The parser creates a tree of the
query, known as the 'parse-tree', and then translates it
into the form of relational algebra. In doing so, it also
replaces all the uses of views in the query.
The working of query processing can be understood from
the figure "Steps in Query Processing".
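As a small illustration of the internal form, the query `select emp_name from Employee where salary > 10000` corresponds to a projection applied to a selection. The sketch below is not how a real DBMS is implemented; the relation contents and the helper names `select` and `project` are assumptions chosen only to show that two different relational algebra plans can give the same answer.

```python
# Hypothetical Employee relation; names and salaries are illustrative only.
Employee = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Ravi", "salary": 8000},
    {"emp_name": "Meena", "salary": 15000},
]

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the named attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]

def high_salary(row):
    return row["salary"] > 10000

# Plan 1: select the qualifying rows, then project emp_name.
plan1 = project(select(Employee, high_salary), ["emp_name"])

# Plan 2: first drop unneeded columns, then select, then project emp_name.
narrowed = project(Employee, ["emp_name", "salary"])
plan2 = project(select(narrowed, high_salary), ["emp_name"])

print(plan1)
print(plan1 == plan2)
```

Both plans return the same rows; choosing between such equivalent plans on the basis of cost is the job of the optimization step.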
(Figure: Steps in Query Processing)
Suppose a user executes a query. As we have learned,
there are various methods of extracting the data from
the database. Suppose that, in SQL, a user wants to fetch
the names of the employees whose salary is greater than
10000. For doing this, the following query is undertaken:
select emp_name from Employee where
salary > 10000;
Thus, to make the system understand the user query, it
needs to be translated into the form of relational algebra.
We can bring this query into relational algebra form as:
(1) π emp_name (σ salary>10000 (Employee))
(2) π emp_name (σ salary>10000 (π emp_name, salary (Employee)))
After translating the given query, we can execute
each relational algebra operation by using different
algorithms. So, in this way, query processing begins its
working.
Evaluation: For this, in addition to the relational
algebra translation, it is required to annotate the
translated relational algebra expression with the
instructions used for specifying and evaluating each
operation. Thus, after translating the user query, the
system executes a query evaluation plan.
Optimization:
(1) The cost of the query evaluation can vary for different
types of queries. Since the system is responsible
for constructing the evaluation plan, the user
need not write the query efficiently.
(2) Usually, a database system generates an efficient
query evaluation plan, which minimizes its cost. This
type of task is performed by the database system and is
known as Query Optimization.
(3) For optimizing a query, the query optimizer should
have an estimated cost analysis of each operation. It
is because the overall operation cost depends on the
memory allocations to several operations, execution
costs, and so on.
Finally, after selecting an evaluation plan, the
system evaluates the query and produces the output of the
query.
Discuss the query optimization. Explain
distributed cost model with an example.
(2022-23)
What is query optimization? List distributed
query optimization algorithms and explain
any one algorithm.
Query optimization is the process of selecting the
most efficient query-evaluation plan from among the many
strategies usually possible for a given query. It does not
expect users to write their queries so that they can be
processed efficiently. Rather, it expects the system to
construct a query evaluation plan that minimizes the cost
of query evaluation.
Distributed Query Optimization Algorithms:
(1) Semijoin Algorithm
(2) INGRES Algorithm
(3) System R Algorithm
(4) System R* Algorithm
(5) Hill Climbing Algorithm
(6) SDD-1 Algorithm
Hill Climbing Algorithm:
(1) Refinements of an initial feasible solution are
recursively computed until no more cost
improvements can be made.
‘eerscnmtceeIng tho quory. can boo
using tho following formul
‘Total Cont’ Data ‘ranafor Cost + Processing,
b+ Coors
wsfor cost ia the cost of transferring data
teny USO) wo sites, In this case, the data transfor
‘y global execution schedule thi coat is caleulntod us follow
Gucos all intorsite communication Data Transfor Cost = Size of EMPLOYEE table *
(ii) Determine the candidate result sites, jf Coat por byte + Sizo of DEPARTMENT table * Cont
whore a relation referenced in the quer per byto
Sl a (8) ‘Tho processing cost is the cost of processing the query
Gii) Compute the cost of transferring all the at aaah lia play ye case, the processing cost is
other eons relations to ‘each | Processing Cast = Cost por CPU cycle * Number
ww 80s candigate site with mininmam cost of CPU cycles required, to process the query at each
ix) ESO= candidate site w ’ ce
iH) Split 90 into ewo stratepinn + RSE fellowes hy (4) The coordination cost is the cost of coordinating the
we results of the query across all sites. In this case, the
(@_ ESI: send one of the relations involved in ‘coordination cost is calculated as follows:
the join to the other relation’s site — ' Coordination Cost = Size of Result Set * Cost per
Gi) ES2: send the join result to the ‘final result byte
site By estimating the data transfer cost, processing cost,
{c) Replace ESO with the split schedule which gives and, coordination cost, we can estimate the total cost of
(@) Recursively apply steps 2 and 3 on ES1 and ES2 executing the query on the distributed database system.
until no more benefit can be gained j
(e) Check for redundant transmissions in the final What are homogenous and heterogeneous database.
E Give the architecture of heterogeneous database
PEE (rains along with some query processing issues. (2021-22)
In a Distributed Database System (DDBMS), a
distributed cost model is used to estimate the cost of
executing a query on a distributed database. The cost
model takes into account the cost of transferring data
between sites, the cost of processing the query at each site, z
and the cost of coordinating the results of the query across [Patines Dass Ervonnen
all sites.
Here is an example of a distributed cost model in a
DDBMS: ‘
Consider # distributed database system with three
sites: Site 1, Site 2, and Site 8. Suppose we want to execute
4 query that involves joining two tables, EMPLOYEE and
DEPARTMENT, which ate stored at Site 1 and Site 2,
respectively. The query requires transferring data betw: :
the two sites and processing the query at cachalign er.
Distributed databases can be broadly classified into homogeneous and heterogeneous distributed database environments, each with further sub-divisions, as shown in the following illustration.
(Figure : Homogeneous and Heterogeneous Distributed Database Environments)
In a homogeneous distributed database, each site is aware of the other sites and cooperates with them to process user requests. The database is accessed through a single interface.
In a heterogeneous distributed database, different sites have different operating systems, DBMS products and data models. Its properties are :
(1) Different sites use dissimilar schemas and software.
(2) The system may be composed of a variety of DBMSs.
(3) Query processing is complex due to dissimilar schemas.
(4) Transaction processing is complex due to dissimilar software.
(5) A site may not be aware of other sites and so there is limited co-operation in processing user requests.
(Figure : Heterogeneous Distributed System)
A heterogeneous distributed database system is a network of two or more databases with different types of DBMS software, which can be stored on one or more machines. In this system, data can be accessible to several databases in the network with the help of generic connectivity (ODBC and JDBC).
Query processing in a distributed database management system requires the transmission of data between the computers in a network. A distribution strategy for a query is the ordering of data transmissions and local data processing in a database system. Generally, a query in a distributed DBMS requires data from multiple sites, and this need for data from different sites requires the transmission of data, which causes communication costs.
Query processing in a DDBMS is different from query processing in a centralized DBMS due to this communication cost of data transfer over the network. The transmission cost is low when sites are connected through high-speed networks and is quite significant in other networks.
6. Explain the architecture of Distributed Query Processing.
In a distributed database system, processing a query comprises optimization at both the global and the local level. The query enters the database system at the client or controlling site. Here, the user is validated, and the query is checked, translated, and optimized at a global level.
The architecture can be represented as :
(Components shown : optimization at global level, distributed execution manager, local query optimization at each site)
(Figure)
Query optimization is the process of selecting an efficient execution plan for evaluating the query. After parsing of the query, the parsed query is passed to the query optimizer, which generates different execution plans to evaluate the parsed query and selects the plan with the least cost.
8. What is meant by Query Decomposition?
Query decomposition is the first phase of query processing. Its aims are to transform a high-level query into a relational algebra query and to check whether that query is syntactically and semantically correct. Thus, the query decomposition phase starts with a high-level query and transforms it into a query graph of low-level operations (algebraic expressions) which satisfies the query.
10. What is Data Shipping?
In data shipping, the data fragments are transferred to the database server, where the operations are executed. This is used in operations where the operands are distributed at different sites. This is also appropriate in systems where the communication costs are low and local processors are much slower than the client server.
Data localization takes as input the decomposed query on global relations and applies data distribution information to the query in order to localize its data. Data localization determines which fragments are involved in the query and thereby transforms the distributed query into a fragment query.
12. What is Query Optimization in Centralized Systems?
The optimal access path is determined once the alternative access paths are derived for the relational algebra expression. The focus here is on query optimization in a centralized system.
Query processing for a centralized system is done to achieve the following :
(1) The response time of a query is minimized.
(2) The system throughput is maximized.
(3) The memory and storage used for processing is reduced.
(4) Parallelism is increased.
13. What is Query Parsing and Translation?
After scanning the SQL query, the query is parsed to identify syntactical errors and to check that the data types are correct. Once this step is passed, the query is decomposed into several smaller query blocks. Each block is translated into a relational algebra expression.
14. What are the Query Optimization Issues in DDBMS?
In DDBMS, query optimization is a crucial task. The complexity is high since the number of alternative strategies may increase exponentially due to the following factors :
(1) The presence of a number of fragments.
(2) Distribution of the fragments or tables across various sites.
(3) The speed of communication links.
(4) Disparity in local processing capabilities.
Hence, in a distributed system, the target is often to find a good execution strategy for query processing rather than the best one. The time to execute a query is the sum of the following :
(1) Time to communicate queries to databases.
(2) Time to execute local query fragments.
(3) Time to assemble data from different sites.
(4) Time to display results to the application.
Execution of a single query can be parallelized in two ways. What are they?
Execution of a single query can be parallelized in two ways :
(1) Intraoperation Parallelism : It refers to the execution of a single query in parallel on multiple processors and disks.
(2) Interoperation Parallelism : In inter query parallelism, different queries or transactions execute in parallel with one another. This form of parallelism can increase transaction throughput.
Write a short note about Distributed query optimization.
How is a Distributed Query optimized?
Distributed query optimization requires evaluation of a large number of query trees, each of which produces the required results of a query. This is primarily due to the presence of a large amount of replicated and fragmented data. Hence, the target is to find a good solution rather than the best solution.
The main issues for distributed query optimization are :
(1) Optimal utilization of resources in the distributed system.
(2) Query trading.
(3) Reduction of solution space of the query.
Optimal Utilization of Resources in the Distributed System : A distributed system has a number of database servers in the various sites to perform the operations pertaining to a query. Following are the approaches for optimal resource utilization :
Operation Shipping : In operation shipping, the operation is run at the site where the data is stored and not at the client site. The results are then transferred to the client site. This is appropriate for operations where the operands are available at the same site, for example Select and Project operations.
Data Shipping : In data shipping, the data fragments are transferred to the database server, where the operations are executed. This is used in operations where the operands are distributed at different sites. This is also appropriate in systems where the communication costs are low and local processors are much slower than the client server.
Hybrid Shipping : This is a combination of data and operation shipping. Here, data fragments are transferred to the high-speed processors, where the operation runs. The results are then sent to the client site.
(Figure : Operation Shipping, Data Shipping and Hybrid Shipping)
Query Trading : In the query trading algorithm for distributed database systems, the controlling/client site for a distributed query is called the buyer and the sites where the local queries execute are called sellers. The buyer formulates a number of alternatives for choosing sellers and for reconstructing the global results. The target of the buyer is to achieve the optimal cost.
The algorithm starts with the buyer assigning sub-queries to the seller sites. The optimal plan is created from the local optimized query plans proposed by the sellers, combined with the communication cost for reconstructing the final result. Once the global optimal plan is formulated, the query is executed.
Reduction of Solution Space of the Query : An optimal solution generally involves reduction of the solution space so that the cost of query and data transfer is reduced. This can be achieved through a set of heuristic rules, just as
heuristics in centralized systems.the next, how
operations shoul
Step’ 3 iCode Ge
execution, an internal node is exe
operand tables are available. The no
.. This process conti
jot node is executed’
replaced .
For instance, consider the following schemas :
EMPLOYEE :
EmpID | EName | Salary
DEPARTMENT :
DNo | Location
Example 1 : The query considered is as follows :
$$\pi_{EmpID}(\sigma_{EName = \small "ArunKumar"}{(EMPLOYEE)})$$
The query tree appears as follows :
(Figure : query tree with EMPLOYEE as the leaf, the selection EName = "ArunKumar" above it, and the projection on EmpID at the root)
Step 2 Query Plan Generation : After the query tre
uence’ of relational
algebra expression.
esents the result of a
les should be passedaewernenenasnanatd
Projection : Projection operation displays a subset of fields of a table. This gives a vertical partition of the table.
Syntax in Relational Algebra :
$$\pi_{<AttributeList>}{(<Table Name>)}$$
For example, let us consider a STUDENT table. If we want to display the names and courses of all students, we will use the following relational algebra expression :
$$\pi_{Name,Course}{(STUDENT)}$$
Selection : Selection operation displays a subset of tuples of a table that satisfies certain conditions. This gives a horizontal partition of the table.
Syntax in Relational Algebra :
$$\sigma_{<Conditions>}{(<Table Name>)}$$
For example, in the Student table, if we want to display the details of all students who have opted for the BCA course, we will use the following relational algebra expression :
$$\sigma_{Course = \small "BCA"}{(STUDENT)}$$
Combination of Projection and Selection Operations : For most queries, we need a combination of projection and selection operations. There are two ways to write these expressions :
(1) Using a sequence of projection and selection operations.
(2) Using the rename operation to generate intermediate results.
For example, to display names of all female students of the BCA course :
(1) Relational algebra expression using a sequence of projection and selection operations :
$$\pi_{Name}(\sigma_{Gender = \small "Female" AND \small Course = \small "BCA"}{(STUDENT)})$$
(2) Relational algebra expression using the rename operation to generate intermediate results :
$$FemaleBCAStudent \leftarrow \sigma_{Gender = \small "Female" AND \small Course = \small "BCA"}{(STUDENT)}$$
$$Result \leftarrow \pi_{Name}{(FemaleBCAStudent)}$$
Union : If P is a result of an operation and Q is a result of another operation, the union of P and Q ($P \cup Q$) is the set of all tuples that are either in P or in Q or in both, without duplicates.
For example, to display all students who are either in Semester 1 or are in the BCA course :
$$Sem1Student \leftarrow \sigma_{Semester = 1}{(STUDENT)}$$
$$BCAStudent \leftarrow \sigma_{Course = \small "BCA"}{(STUDENT)}$$
$$Result \leftarrow Sem1Student \cup BCAStudent$$
Intersection : If P is a result of an operation and Q is a result of another operation, the intersection of P and Q ($P \cap Q$) is the set of all tuples that are in both P and Q.
For example, given the following two schemas :
EMPLOYEE (with at least the attributes City and Department, as used below)
PROJECT :
PId | City | Department | Status
To display the names of all cities where a project is located and also an employee resides :
$$CityEmp \leftarrow \pi_{City}{(EMPLOYEE)}$$
$$CityProject \leftarrow \pi_{City}{(PROJECT)}$$
$$Result \leftarrow CityEmp \cap CityProject$$
Minus : If P is a result of an operation and Q is a result of another operation, P - Q is the set of all tuples that are in P and not in Q.
For example, to list all the departments which do not have an ongoing project (projects with status = ongoing) :
$$AllDept \leftarrow \pi_{Department}{(EMPLOYEE)}$$
$$ProjectDept \leftarrow \pi_{Department}{(\sigma_{Status = \small "ongoing"}{(PROJECT)})}$$
$$Result \leftarrow AllDept - ProjectDept$$
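The operators described above can be tried out on small in-memory tables. The following is a minimal sketch, assuming relations are modelled as Python lists of dictionaries; the helper names (`select`, `project`, `union`, `intersection`, `minus`) and the sample STUDENT rows are illustrative assumptions, not part of any DBMS.

```python
# Relations are modelled as lists of dicts. This is only an illustration of the
# operators' meaning, not how a DBMS stores or evaluates tables.

def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection (pi): keep only the named attributes and drop duplicates."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def _as_set(relation):
    # Turn each row into a hashable tuple so set operations can be used.
    return {tuple(sorted(row.items())) for row in relation}

def _as_rows(tuples):
    return [dict(t) for t in sorted(tuples)]

def union(p, q):
    return _as_rows(_as_set(p) | _as_set(q))

def intersection(p, q):
    return _as_rows(_as_set(p) & _as_set(q))

def minus(p, q):
    return _as_rows(_as_set(p) - _as_set(q))

# Hypothetical sample data for the STUDENT table.
STUDENT = [
    {"Name": "Asha", "Gender": "Female", "Course": "BCA", "Semester": 1},
    {"Name": "Ravi", "Gender": "Male", "Course": "BCA", "Semester": 3},
    {"Name": "Meena", "Gender": "Female", "Course": "MCA", "Semester": 1},
]

# Names of all female students of the BCA course.
female_bca = project(
    select(STUDENT, lambda r: r["Gender"] == "Female" and r["Course"] == "BCA"),
    ["Name"],
)

# Students who are in Semester 1 or in the BCA course.
sem1 = select(STUDENT, lambda r: r["Semester"] == 1)
bca = select(STUDENT, lambda r: r["Course"] == "BCA")
sem1_or_bca = union(sem1, bca)

print(female_bca)
print([row["Name"] for row in sem1_or_bca])
```

Set semantics are used for union, intersection and minus so that duplicates disappear, which matches the definitions given for these operators.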
Join : Join operation combines related tuples of two different tables (results of queries) into a single table.
For example, consider two schemas, Customer and Branch :
CUSTOMER :
CustID | AccNo | ...
BRANCH :
BranchID | BranchName | IFSCcode | ...
To list the customer details along with branch details :
$$Result \leftarrow CUSTOMER \bowtie_{Customer.BranchID=Branch.BranchID}{BRANCH}$$
What is Transparencies in distributed database? (2022-23)
Transparency in DDBMS refers to the transparent distribution of information to the user from the system. It hides from the user the details of how the system is implemented. For example, in a normal DBMS, data independence is a form of transparency that helps in hiding changes in the definition and organization of the data from the user. All forms of transparency have the same overall target : to make the use of a distributed database the same as the use of a centralized database.
Types of Transparency in Distributed Database Management System : In a Distributed Database Management System, there are four types of transparencies, which are as follows :
(1) Transaction Transparency : This transparency makes sure that all distributed transactions preserve the integrity and consistency of the distributed database. A distributed transaction accesses data stored at multiple locations. The DDBMS is also responsible for maintaining the atomicity of every sub-transaction (by this, we mean that either the whole transaction takes place or it does not happen at all). This is very complex due to the fragmentation, allocation, and replication structure of the DBMS.
(2) Performance Transparency : This transparency requires a DDBMS to work as if it were a centralized database management system. The system should not suffer any drop in performance because its architecture is distributed. Likewise, a DDBMS must have a distributed query processor which can map a data request into an ordered sequence of operations on the local databases. This has another complexity to take into consideration, which is the fragmentation, replication, and allocation structure of the DBMS.
(3) DBMS Transparency : This transparency is only applicable to heterogeneous types of DDBMS (databases that have different sites and use different operating systems, products, and data models), as it hides the fact that the local DBMS may be different. It is one of the most complicated transparencies to make use of as a generalization.
(4) Distribution Transparency : Distribution transparency helps the user to recognize the database as a single thing or a logical entity, and if a DDBMS displays distribution data transparency, then the user does not need to know that the data is fragmented. Distribution transparency has its types, which are discussed below :
(a) Access transparency
(b) Location transparency
(c) Concurrency transparency
Define the Transaction. (2022-23)
Define Transaction?
A transaction can be defined as a group of tasks. A single task is the minimum processing unit which cannot be divided further.
Let's take an example of a simple transaction. Suppose a bank employee transfers ₹ 500 from A's account to B's account. This very simple and small transaction involves several low-level tasks.
A's Account :
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B's Account :
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
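The low-level tasks of the transfer above can be sketched as one unit of work that either completes fully or leaves both balances unchanged. This is a minimal illustration only: the `transfer` function, the `InsufficientFunds` exception and the in-memory `accounts` dictionary are assumptions made for the example, and a real DBMS would rely on its log and lock manager rather than on a copied dictionary.

```python
class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the transfer (assumed rule)."""

def transfer(accounts, src, dst, amount):
    """Debit src and credit dst as one unit; on any error restore the old balances."""
    snapshot = dict(accounts)              # copy kept only for rollback
    try:
        old_balance = accounts[src]        # Old_Balance = A.balance
        new_balance = old_balance - amount # New_Balance = Old_Balance - amount
        if new_balance < 0:
            raise InsufficientFunds(src)
        accounts[src] = new_balance        # A.balance = New_Balance
        accounts[dst] = accounts[dst] + amount  # B.balance = Old_Balance + amount
    except Exception:
        accounts.clear()
        accounts.update(snapshot)          # undo any partial work
        raise

accounts = {"A": 1200, "B": 300}
transfer(accounts, "A", "B", 500)
print(accounts)
```

If the credit step fails after the debit has been applied (for example because the destination account does not exist), the rollback restores A's balance, so the transfer is never left half done.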
2. Differentiate between Shared lock & Exclusive lock.
(2022-23) | |. @ Explain ACID properties with an example.
: = (2022-23)
Eee NT ee RTT PST TT Define ACID properties with suitable examples.
Shared Lock Exclusive Lock
Used Tor read-only|Used for read and wiite a
loperations loperations (1) A-Atomicity
IMutipie transactions can|Oniy one Wansaclon Gan @) C-Consisteney
hold a shared lock on thelhoid an exclusive lock| (3), Flsolation
same data item Jon a data item F (4) D-Durability :
locks alow[Exclusive locks allow] (1) Atomicity : The term stomicity defines that the data
Consistency Jmuttiple transactions [Link] transaction to} remains atomic. It means if. any oferation is
8d @ resource, but nonelaccess a resource for performed on the data, either it should be performed
5 modhty reading or writing or executed completely or should not be executed at
[Shared Tocks are nol all. It further means that the operation should not
with any break in between or execute partially. In the case of
executing operations on the tr
operation should be completely exec
have “bee Tocks have’ al partiall
OTE
time thanllonger wait time. than| Example : Here, the set of operations are
shared locks.
exclusive locks(3) Isolation
w@
amount in Bob
fo add
account, revel
Alice’
ice’s account |
cry attribute in the database:
Tec to ensure the stability of the databai
“ynetraint puts on the data value should
the execution of
fe doing an operation, revert’ back the syste
revious state
Example : The total amount in Alice's and Bob's accounts should be the same before and after the transaction. Since the sum of the money in Alice's and Bob's accounts is the same before and after the transaction, this transaction preserves the consistency property.
(3) Isolation : If you are performing multiple transactions on a single database, an operation from any transaction should not interfere with operations in other transactions. The execution of each transaction should be isolated from the other transactions.
Example : If there is any other transaction (between Mac and Alice) going on, it should not have any effect on the transaction between Alice and Bob. Both the transactions should be isolated.
(4) Durability : All the above three properties should be satisfied while the transaction is in progress. But durability issues can happen even after the completion of the transaction, so this is the ACID property that applies after completion of the transaction. The changes made during the transaction should exist after completion of the transaction. Sometimes it may happen that all the operations of the transaction have completed but the system fails immediately afterwards. In that case, the changes made during the transaction should persist. The system should return to its previous stable state.
Example : A system may crash after completion of all the operations. If the system restarts, it should preserve the stable state. The amount in Alice's and Bob's accounts should be the same before and after the system restart.
Draw transaction state diagram. (2022-23)
Explain Transaction States with diagram.
A transaction may go through a subset of five states : active, partially committed, committed, failed and aborted.
(1) Active : The initial state where the transaction enters is the active state. The transaction remains in this state while it is executing read, write or other operations.
(2) Partially Committed : The transaction enters this state after the last statement of the transaction has been executed.
(3) Committed : The transaction enters this state after successful completion of the transaction, once the system checks have issued the commit signal.
(4) Failed : The transaction goes from the partially committed state or the active state to the failed state when it is discovered that normal execution can no longer proceed or the system checks fail.
(5) Aborted : This is the state after the transaction has been rolled back after failure and the database has been restored to the state it was in before the transaction began.
The following state transition diagram depicts the states in the transaction and the low-level transaction operations that cause a change in states.
(Figure : Transaction state transition diagram, starting at begin_transaction)
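The five transaction states and the transitions between them can also be written down as a small state machine. The sketch below assumes only the transitions named in the text (active to partially committed or failed, partially committed to committed or failed, failed to aborted); the `Transaction` class and the `ALLOWED` table are illustrative names, not part of any DBMS.

```python
# Allowed transitions, taken from the description of the five states.
ALLOWED = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state
}

class Transaction:
    def __init__(self):
        # begin_transaction places the transaction in the active state.
        self.state = "active"
        self.history = [self.state]

    def move_to(self, new_state):
        """Change state, rejecting any transition the diagram does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

# A successful transaction.
t1 = Transaction()
t1.move_to("partially committed")
t1.move_to("committed")

# A transaction that fails while active and is rolled back.
t2 = Transaction()
t2.move_to("failed")
t2.move_to("aborted")

print(t1.history)
print(t2.history)
```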
nn Manager? a 7
rnetions of Transactio a
What are the fi WRITE (Ay
READA(B)
ct
READ
ViRTEZ (6)
EADIE)
CZ |
Non Serial Schudule ¢ Whon w trannaction i cvorlapped
won the transaction 1") and 2,
amply : Com « following oxtmplo
pi Se
READIN,
White 1(r)
TAHT
WHRITE(0) i
READA(b) Te
‘Iypon of petal bility + There aro two types of
lizability if
71 in the re
rondy tho
7. © Dineusn the nertulizability,
© Explain view nertalizubllity, (2021-22)
© What in Dintributed werializubllity?
writo operation
Sorlalizability :unrreney Cong
Timestamp Ordering Protocol :
The main idea for this protocol is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the only equivalent serial schedule permitted has the transactions in the order of their timestamp values. Stated simply, the schedule is equivalent to the particular serial order corresponding to the order of the transaction timestamps. The algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the order in which the item is accessed does not violate the ordering. To ensure this, use two timestamp values relating to each database item X :
(1) W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.
(2) R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully.
Explain deadlock handling. Discuss deadlock prevention and deadlock avoidance. (2022-23)
Compare Distributed Deadlock Prevention and Distributed Deadlock Avoidance. Explain the scheme of Distributed Deadlock Detection and Recovery. (2021-22)
What is deadlock? Write the three classical approaches for deadlock handling.
Deadlock is a state of a database system having two or more transactions, when each transaction is waiting for a data item that is being locked by some other transaction. A deadlock can be indicated by a cycle in the wait-for-graph. This is a directed graph in which the vertices denote transactions and the edges denote waits for data items.
For example, in the following wait-for-graph, transaction T1 is waiting for data item X which is locked by T3. T3 is waiting for Y which is locked by T2, and T2 is waiting for Z which is locked by T1. Hence, a waiting cycle is formed, and none of the transactions can proceed.
(Figure)
There are three classical approaches for deadlock handling, namely :
(1) Deadlock prevention.
(2) Deadlock avoidance.
(3) Deadlock detection and removal.
All of the three approaches can be incorporated in both a centralized and a distributed database system.
Deadlock Prevention : The deadlock prevention approach does not allow any transaction to acquire locks that will lead to deadlocks. The convention is that when more than one transaction requests a lock on the same data item, only one of them is granted the lock.
One of the most popular deadlock prevention methods is pre-acquisition of all the locks. In this method, a transaction acquires all the locks before starting to execute and retains the locks for the entire duration of the transaction. If another transaction needs any of the already acquired locks, it has to wait until all the locks it needs are available. Using this approach, the system is prevented from being deadlocked since none of the waiting transactions are holding any lock.
Deadlock Avoidance : The deadlock avoidance approach handles deadlocks before they occur. It analyzes the transactions and the locks to determine whether or not waiting leads to a deadlock.
When a transaction requests a data item that is locked by some other transaction in an incompatible mode, the lock manager runs an algorithm to test whether keeping the transaction in the waiting state will cause a deadlock or not. Accordingly, the algorithm decides whether the transaction can wait or one of the transactions should be aborted.
There are two algorithms for this purpose, namely wait-die and wound-wait. Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already locked by T2. The algorithms are as follows :
(1) Wait-Die : If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2, T1 is aborted and later restarted.
(2) Wound-Wait : If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is younger than T2, T1 is allowed to wait.
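The two rules can be compared side by side in a few lines of code. The sketch below assumes that the age of a transaction is represented by a start timestamp, with a smaller timestamp meaning an older transaction; the function names and the returned strings are illustrative choices, not a standard API.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-Die: an older requester waits, a younger requester is aborted (dies)."""
    if ts_requester < ts_holder:      # requester is older than the lock holder
        return "requester waits"
    return "requester aborted"        # requester is younger: abort and restart later

def wound_wait(ts_requester, ts_holder):
    """Wound-Wait: an older requester aborts (wounds) the holder, a younger one waits."""
    if ts_requester < ts_holder:      # requester is older than the lock holder
        return "holder aborted"
    return "requester waits"

# T1 (timestamp 5) is older than T2 (timestamp 9); T1 requests an item held by T2.
print(wait_die(5, 9), "|", wound_wait(5, 9))
# T1 (timestamp 9) is younger than T2 (timestamp 5); T1 requests an item held by T2.
print(wait_die(9, 5), "|", wound_wait(9, 5))
```

In both schemes it is always the younger transaction that gets aborted, so an old transaction cannot be restarted indefinitely.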
Deadlock Detection and Removal : The deadlock detection and removal approach runs a deadlock detection algorithm periodically and removes the deadlock in case there is one. It does not check for deadlock when a transaction places a request for a lock. When a transaction requests a lock, the lock manager checks whether it is available. If it is available, the transaction is allowed to lock the data item; otherwise the transaction is allowed to wait.
Since there are no precautions while granting lock requests, some of the transactions may be deadlocked. To detect deadlocks, the lock manager periodically checks if the wait-for-graph has cycles. If the system is deadlocked, the lock manager chooses a victim transaction from each cycle. The victim is aborted and rolled back, and then restarted later. Some of the methods used for victim selection are :
(1) Choose the youngest transaction.
(2) Choose the transaction with fewest data items.
(3) Choose the transaction that has performed the least number of updates.
(4) Choose the transaction having the least restart overhead.
(5) Choose the transaction which is common to two or more cycles.
This approach is primarily suited for systems having a low number of transactions and where fast response to lock requests is needed.
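The periodic check for cycles in the wait-for-graph can be sketched with a depth-first search. This is a minimal illustration under stated assumptions: the graph is a dictionary mapping each transaction to the transactions it waits for, the start times are made-up values, and the victim rule shown is only the first of the listed methods (choose the youngest transaction).

```python
def find_cycle(wait_for):
    """Return a list of transactions that form a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {t: WHITE for t in wait_for}
    for targets in wait_for.values():
        for t in targets:
            colour.setdefault(t, WHITE)
    stack = []

    def visit(t):
        colour[t] = GREY
        stack.append(t)
        for nxt in wait_for.get(t, ()):
            if colour[nxt] == GREY:       # back edge: nxt is on the current path
                return stack[stack.index(nxt):]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        colour[t] = BLACK
        return None

    for t in list(colour):
        if colour[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None

def choose_victim(cycle, start_time):
    """Pick the youngest transaction in the cycle (largest start time)."""
    return max(cycle, key=lambda t: start_time[t])

# T1 waits for T3, T3 waits for T2, and T2 waits for T1.
wait_for = {"T1": ["T3"], "T3": ["T2"], "T2": ["T1"]}
cycle = find_cycle(wait_for)
victim = choose_victim(cycle, {"T1": 10, "T2": 20, "T3": 30})
print(cycle, victim)
```

Aborting the victim removes its outgoing edge from the graph, which breaks the cycle and lets the remaining transactions proceed.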
Deadlock Handling in Distribute :
also distributed, ie. the same transaction may be
processing at more than one site. The two main deadlock
handling concerns in a distributed database on the log recoril at its site. |
(2) a the -coordinator(C;)sends a Prepare ‘ A
miodsage to all the sites where the transaction
executed,yet
response.
so it must.
commit,
sady. T>. and local Transaction manay
T to C,. Once the read
ng can prevent a
of nsaction 7 excef
portion
Coordinator
(Figure : Messaging in Phase- 1st)
Phase- 2nd : The second phase starts as the responses (ready T or abort T) are received by the coordinator (Ci) from all the sites that are collaboratively executing the transaction T. However, it is possible that some site fails to respond; it may be down, or it may have been disconnected from the network. In that case, after a suitable timeout period, the coordinator treats the site as if it had sent abort T. The fate of the transaction depends upon the following points :
(1) If the coordinator receives ready T from all the participating sites of T, then it decides to commit T. Then, the coordinator writes the log record <Commit T> at its site and sends a message commit T to all the sites involved in T.
(2) If a site receives a commit T message, it commits the component of T at that site and writes <Commit T> in its log records.
(3) However, if the coordinator has received abort T from one or more sites, it logs <Abort T> at its site and then sends abort T messages to all sites involved in transaction T.
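The coordinator's decision rule in the second phase can be sketched as follows. This is an illustration only: the function names, the dictionary of votes and the treatment of a missing vote as a timeout are assumptions made for the example, and real implementations also force the log records to stable storage before sending any message.

```python
def coordinator_decision(votes, participants):
    """votes maps site -> "ready" or "abort"; a site with no vote has timed out."""
    for site in participants:
        if votes.get(site) != "ready":    # explicit abort, or no reply before timeout
            return "abort"
    return "commit"

def run_phase_two(votes, participants):
    """Return the decision, the coordinator's log record, and the messages sent."""
    decision = coordinator_decision(votes, participants)
    log = [f"<{decision.capitalize()} T>"]                        # coordinator log record
    messages = {site: f"{decision} T" for site in participants}   # sent to every site
    return decision, log, messages

sites = ["S1", "S2", "S3"]

# Every site voted ready: the transaction commits everywhere.
print(run_phase_two({"S1": "ready", "S2": "ready", "S3": "ready"}, sites))

# S3 never answered, so it is treated as if it had sent abort T.
print(run_phase_two({"S1": "ready", "S2": "ready"}, sites))
```

A single abort vote or a single timeout is enough to abort the transaction at all sites, which is what guarantees that no site commits while another aborts.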
Disadvantages
(1) The major disadvantage of the Two-phase commit protocol is that a failure of the coordinator site may result in blocking, so a decision either to commit or abort transaction T may have to be postponed until the coordinator recovers.
(2) Blocking Problem : Consider a scenario in which a transaction T holds locks on data items of active sites, but amid the execution the coordinator fails, and the active sites keep no additional log record except <ready T>, such as <abort T> or <commit T>. It then becomes impossible to determine what decision has been made (whether to <commit T> or <abort T>). In that case, the final decision is delayed until the coordinator is restored or fixed. In some cases, this may take a day or long hours, and during this time period the locked data items remain inaccessible for other transactions (Ti). This problem is known as the Blocking Problem.
aborts Tfrom ain incorreet oni
backward recovery. It
able Schedule +A sel
wwerable a8 wo eat
rite operation, on aystom to a previous, ch
instanco when it has entored an
3 effort is made to place tho aystom in
state from which it can_con to operat
error
techniques is that potential errors
anticipated in advance, Only then is it feasible to
change those mistakes and transfer to a new state.
no read or
on of transaction: @
salled cascadeless sched
schedule.
12. Define Moss Concurrency protocol?
This is a protocol which is used to control concurrency in the distributed database environment. It is mainly used for handling nested (hierarchical) transactions, and it is based on inheritance of locks.
Consider a transaction (T) that acquires a lock on a data item (X) in some mode (M). The transaction (T) holds the lock in mode (M) until it terminates. When any sub-transaction (T1) of T commits, its parent transaction inherits that lock and retains it until the parent itself finishes. If a transaction holds a lock on data item (X), it has the right to access the locked data item (X) in the corresponding mode. However, this does not hold for a lock that a transaction has merely retained from one of its sub-transactions (descendants). A retained lock is only a kind of placeholder and indicates that sub-transactions outside the corresponding hierarchy cannot acquire the lock, but descendants can acquire the lock. As soon as a transaction becomes a retainer of a lock, it remains a retainer of that lock until it terminates.
Why is recovery in a distributed DBMS more complicated than in a centralized system? (2021-22)
In order to recuperate from database failure, database management systems resort to a number of recovery management techniques. The typical strategies for database recovery are :
(1) In case of soft failures that result in inconsistency of the database, the recovery strategy includes transaction undo or rollback. However, sometimes, transaction redo may also be adopted to recover to a consistent state of the transaction.
(2) In case of hard failures resulting in extensive damage to the database, recovery strategies encompass restoring a past copy of the database from archival backup. A more current state of the database is obtained through redoing operations of committed transactions from the transaction log.
Recovery from Power Failure : Power failure causes loss of information in the non-persistent memory. When power is restored, the operating system and the database management system restart. The recovery manager initiates recovery from the transaction logs.
In case of immediate update mode, the recovery manager takes the following actions
ward Recovery + Moving the system
nto a formerly accurate ep)Se apeemeusetnamiitemtamtaa
fez
ions which are-on the
int are undone or redone.
left. side of the last
committed and: needn't
taken for checkpointing are
Fuzzy Cheekpointing : In fuzzy checkpoi
time’ of checkpoint, all the active transactions are written
te in the log. In case of power failure, the recovery manager
‘e active list and. fat processes only those transactions that were active duising
jist, checkpoint and later. ‘The transactions that have boon
committed before checkpoint are written to the disk and
hence need not be redone.
Transaction Recovery Using UNDO / REDO :
‘Transaction recovery is done to éliminate the adverse
effects of faulty transactions rather than to recover from a
failure. Faulty transactions include all transactions that
have changed the database into undesired state and the
transactions that have used values written by the faulty
transactions.
Transaction recovery in these cases is a two-step
process :
(1) UNDO all faulty transactions and transactions that
may be affected by the faulty transactions.
(2) REDO all transactions that are not faulty but have
been undone due to faulty transactions.
A hard crash causes a total database loss. To recover from
this hard crash, the operating system is restored, and
the database is recovered using the database backup and
transaction log. The recovery method is the same for both
immediate and deferred update modes. The
recovery manager takes the following actions :
(1) The transactions in the commit list and
before-commit list are redone and written onto the
commit list in the transaction log.
(2) The transactions in the active
15. Discuss the issues to achieve atomicity in
distributed transaction management system.
(2021-22)
Distributed transaction refers to the transaction in
which multiple servers are involved. Multiple servers are
called by a client in Simple Distributed Transaction
whereas a server calls another server in Nested
Transaction. The execution of a transaction at many sites
must either be committed at all sites or aborted at all sites.
It should not be the case that a transaction is
committed at one site and aborted at another site.
Distributed site systems use distributed commitment
rule to ensure atomicity across sites. Atomic commitment
is a channel of need for cooperation across a variety of
systems,
No action is taken for transactions in commit or abort lists.
transaction log.
Checkpointing : Checkpoint is a point of time at which a
record is written onto the database from the buffers. As a
consequence, in case of a system crash, the recovery
manager does not have to redo the transactions that
have been committed before checkpoint. Periodic
checkpointing shortens the recovery process.
The two types of checkpointing techniques are
(1) Consistent checkpointing
(2) Fuzzy checkpointing
Consistent Checkpointing : Consistent checkpointing
creates a consistent image of the database at
checkpoint. During recovery, only those transactions
which are on the right side of the last checkpoint are
undone or redone. The transactions to the left side of the
last consistent checkpoint are already committed and
needn't be processed again. The actions taken for
checkpointing are
(1) Atomicity : All data changes are treated as if they
were a single operation. That is, either all of the
modifications are made, or none of them are made.
For example, the atomicity feature assures that if a debit
is successfully made from one account, the matching
credit is made to the other account in an application
that transfers money from one account to another.
(2) Consistency : This property implies that when a
transaction begins and ends, the state of data is
consistent. For example, it ensures that the value
remains consistent at the start and end of a
transaction that transfers money
from one account to another.
(3) Isolation : In this property, concurrently running
transactions appear to be serialized. For example, the
isolation property assures that the transferred funds
between two accounts can be seen by another
transaction in either one of the accounts, but not in
both, nor in neither.
(4) Durability : Changes to data persist when a
transaction completes successfully and are not
undone, even if the system fails. It assures that
modifications made to each account will not be
reversed in an application that transfers money from
one account to another.
Coordination in Distributed Transactions : At the
time of coordination in Distributed Transactions, one of the
servers becomes a coordinator, and the rest of the servers
become workers.
(1) In a simple transaction, the first server acts as the
Coordinator.
(2) In the nested transaction, the top-level server acts as
the Coordinator.
(3) Role of Coordinator : The coordinator keeps track
of participating servers, gathers results from workers, and
makes a decision to ensure transaction consistency.
(4) Role of Workers : Workers are aware of the
coordinator's existence, communicate their outcome to the
coordinator, and follow the coordinator's decision.
Atomic Commit :
(1) All participants that make a decision reach the same
conclusion.
(2) If any participant decides to commit, then all other
participants must have voted yes.
Distributed One-Phase Commit : The one-phase
commitment protocol involves
Distributed Two-Phase Commit : There are two phases.
Phase 1 : Voting
(1) The coordinator sends a "prepare" message to each
participant.
(2) The coordinator must wait until a response whether
ready or not ready is received from each worker, or a
timeout occurs.
(3) Workers must wait until the coordinator sends the
"prepare" message.
(4) If a transaction is ready to commit then a "ready"
message is sent to the coordinator.
(5) If a transaction is not ready to commit then a "no"
message is sent to the coordinator, resulting in
aborting of the transaction.
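The voting phase described above can be illustrated with a small in-memory Python sketch. It is not a real network protocol: the `Worker` class, the `Vote` enum and the functions `voting_phase` and `decide` are hypothetical names introduced only for this example. Treating a missing reply (timeout) as a reason to abort is an assumption, and so is the decision rule of committing only when every worker answers "ready", which follows from the atomic commit condition that a commit requires all participants to have voted yes.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    NO = "no"
    TIMEOUT = "timeout"   # assumption: models a worker that never answers


class Worker:
    """Hypothetical participant; flags simulate its local state."""

    def __init__(self, name, can_commit=True, responds=True):
        self.name = name
        self.can_commit = can_commit
        self.responds = responds

    def on_prepare(self):
        # A worker acts only after it receives the "prepare" message.
        if not self.responds:
            return None
        return Vote.READY if self.can_commit else Vote.NO


def voting_phase(workers):
    """Phase 1: the coordinator sends "prepare" and collects one vote per worker."""
    votes = {}
    for worker in workers:
        reply = worker.on_prepare()
        votes[worker.name] = reply if reply is not None else Vote.TIMEOUT
    return votes


def decide(votes):
    """Commit only if every worker voted "ready"; otherwise abort."""
    return "commit" if all(v is Vote.READY for v in votes.values()) else "abort"


workers = [Worker("S1"), Worker("S2", can_commit=False)]
print(decide(voting_phase(workers)))  # prints abort
```

A single "no" vote or a timeout is enough to abort the transaction at every site, which is what keeps it from committing at one site and aborting at another.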