0 ratings0% found this document useful (0 votes) 97 views34 pagesChapter 3 Distributed Database Design
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content,
claim it here.
Available Formats
Download as PDF or read online on Scribd
Chapter 3
Distributed Database
Design
Table of Contents
Alternative Design Strategies
Distribution Design Issues
.
.
® Fragmentation
.
Allocation1. Alternative Design Strategies
# Two major strategies
¥ Top-down approaches
¥ Bottom-up approaches
1.1 Top-Down Design Process
Requrersent asia
(Sie Renews
co
i Yen Se
rs
(teotniconcepia Schema] [Acme
|“ saroaen ons J
(Geen Concaer seroma )
Physal Deson
(rrr Sener)
seusDetails of Design Process
Requirement Analysi
v Defines the environment of the system.
¥ Elicits both the data and processing needs of all potential
DB users
© System Requirements
¥ Whete the final system is expected to stand?
¥ Performance, Reliability, Availability, Economics,
Flexibility
Details of Design Process (Cont'd)
® Conceptual Design
¥ Determines entity types and relationships among these
entities
¥ Entity analysis:
determines the entities, attributes, and relationships
¥ Functional analysis:
— determines the fandamental functions with which the
modeled enterprise is involved
¥ The process is identical to the centralized database designDetails of Design Process (Cont'd)
iew Design
¥ Defines the interfaces for end users
¥ The conceptual schema can be interpreted as being an
integration of user views.
© Distribution Design
¥ Designs the local conceptual schema by distributing the
entities over the sites of the distributed system
¥ Consists of two steps :
fragmentation and allocation
1.2 Bottom-Up Design Process
© Top-Down Approach:
Suitable when a system is being designed from scratch
© Bottom-Up Approach
Suitable when many DBs exist, and the design task
involves integrating them into-one DB
~> The bottom-up design process consists of integrating
local schemas into the global. conceptual schema,
> Schema Translation & Schema Integrating
> In the context of Heterogeneous Database !2. Distribution Design Issues
© Why fragment at all?
© How should we fragment?
© How much should we fragment?
Is there any way to test the correctness of
decomposition?
© How should we allocate?
© What is necessary information for fragmentation and
allocation?
2.1 Reasons for Fragmentation
Arelation is not an appropriate unit of distribution.
¥ Application views are usually subsets of relations.
¥ Unnecessarily high volume of remote data access or
unnecessary replication
¥ Not support intra-query concurrency
> decompose a relation into fragments
© Disadvantages of fragmentation
¥ applications defined on more than one fragments
performance degradation by snion or join
v¥ semantic data control
integrity checking is very difficult5
SSSUEBE OE
cops 10
2.2 Fragmentation Alternative
® Horizontal Fragmentation or Vertical Fragmentation
es
JINAME
in
R
Instumentaton
Dotabase Develo,
ok
2NO.
JINAME
BUDGET
tor.
B
in
anos
Maintenance
span
‘0000
New vor
Pans
Example of Horizontal Partitioning
Capersh,
BUDGET WO.
130000 aL
138000 2
0000 ay
310000 Mt
NAME
Instrumentation
Detabase Develop
canjcam
Maintenance
Example of Vertical Pastitioning
2.3 Degree of Fragmentation
© Notto fragment at all: relation
© Fragment to the level of individual tuples or
Fragment to the level of individual attributes.
© Suitable level of fragmentation?
¥ Such a level can only be defined with respect to the
applications that run on the database.
> According to the valne of applicatian-specific
parameters, individual fragments cem be ielentified.
Capers2.4 Correctness Rules of Fragmentation
© Completeness
¥ If arelation instance R is decomposed into fragments
R,, Ry, .. .. R,, each data item can be found in R can
also be found in one or more of R,'s.
© Reconstruction
¥ If arelation instance R is decomposed into fragments
R,, R,... «Rye it should be possible to define a
relational operator Vsuch that
R=YR, WR, € Fy
© Disjointess
¥ If arelation instance R is decomposed into fragments
R,,R),.. « R,.and data item d,is in R,, it is not any
other fragment R, (£#/),
cupers ut
2.5 Allocation Alternatives
© Comparison of Replication Alternatives
Fall Replication [Partial Replication | Partitioning
‘Query
cae Easy Difficult Difficult
Directory Easy or
Management | _ponexisient panel ee
"Consureney
Goatrol
Reliability “Very high igh Tow
Moderate Difficult Essy
Reality Possible application] Realistic _| Possible application
Capers2.6 Information Requirements
© Database Information
© Application Information
© Communication Network Information
© Computer System Information
3. Fragmentation
© Design of Horizontal Fragmentation
© Design of Vertical Fragmentation
© Design of Hybrid Fragmentation3.1 Horizontal Fragmentation
© Primary horizontal fragmentation
© Derived horizontal fragmentation
© Information requirements of horizontal
fragmentation
¥ Database Information
¥ Application Information
Database Information
© Coneemis the global conceptual schema
¥ How the DB relations are connected to one another,
especially with joins?
¥ Expression of relationships among relations using liaks
© Example
: dain nail 2
(oonea weeps 7
mes
| 1
ZR AUNE TMF] OWA, BIGOT]
6
FO TuD, WES, OU
Capers 19
10Application Information
© Predicates used in user queries
v The most active 20% of user queries account for 80%
of the total data access.
© Simple Predicate
Vy Ay O Value, 8 € {=< 4.5,>,2}
¥ Pr, : set of all simple predicates defined on relation R,
® Minterm Predicates
¥ m,: the conjunction of simple predicates
¥M, : the set of minterm predicates for relation R,
M,= {m,|m,-Ap,'}.1¢é¢m,1¢/ 30000
py TITLE ="Mechéng” 2 m,:+(TITLE ="“Elect.Eng”) « SAL s 30000
Pa: TITLE = "Programmer" 2m, : “(TITLE = “Elect.Eng”) x SAL > 30000
py: SAL # 30000 im, : TITLE = "Programmer" 4 SAL = 30000
P, : SAL > 30000 my: TITLE = “Programmer” SAL > 30000Application Information (71l 4)
© Quantitative Information
v Minter selectivity: sei(i,)
Number of tuples of relations that would be accessed
by a user query specified according to a given minterm
¥ Access frequency: acely,)
Frequency with which user application access data
Primary Horizontal Fragmentation
© Definition
A ftagmentation generated by a selection operation on
the owner relation of a database schema
¥ Given relation R,, its horizontal fragments are
R, - 0-(R,), 1 ££, F, : the selection formula (m,)
© Example: Sample Relation J
12Example: Minterm Fragments
Name
Instrumentation
Mantenance
Simple Predicate2| BI} 1%
© Completeness
¥Pr°l simple predicate °| 8 484! fragment Oli HH 2
fragment] RESIS APSE SSH t501 SS
# SS, Pris complet
¥ Example
= Pr = (LOC = "Montreal", LOC = “New York", LOC = “Paris” }
— Pr = Pr U (BUDGET s 2a9000, BUDGET > 200000 }
© Minimality
¥ PrOfl °l 2H fragment FI} Fit F.2 2BS OP,
Soe stg Sez
¥ Example
= PrT= Pr’ U { INAME'= “Instrumentation” }
at
13Algerithm COM Hn
input: RL: relation; Pr: set of simple precicates
output : 61: set af simple predicates
decane: F set of minterm fragments
begin
find @p, © Pr such that p partons R aooording to Rte
(the mineerm fragment accorcirg top.)
begin
find ap, « Pr such that p, partons some fof Pr’
Sctording to Pe
Pre prug,
Pra Prop,
FoFul
eng-begin
ntl Bis complete
end {COM_MIN}
P_Rufe : fundamental rule of completeness and minimality, which states,
that a fragment is partitioned “into at Least two parts which are accessed
differently by ai least one applisation.”
cops 28
Aigrithm PHORIZONTAL
Input: R,: relation; Pr, set of spl predicates
utput: Meet oF minterm Fragments
begin
Pe = COM _ MINER, Pr)
etermine the set M, of minterm precicatas
determine the set Lf implications among p< Pr?
far eoen m, = ca
fim is contradictory according to I then
M= Hm,
eo
end far
end {PHORIZONTAL}
Example:
Py: att = valued
1 att ~ valuel) ~ (att
att ~ valued) = (att ~ valued)
Mm: (att = valued) a (att = valuez) contradictory by E
mt (att = valuel) A —fatt = value2)
tg: afatt'= valve) (att = value2)
my {att = valved) » fat = value) contradictary by T
‘Copers- 27
14Example
© S.J: subject of primary horizontal fragmentation
® Assumption for S
wv There is only | application that accesses S,
¥ That application checks the salary information.
¥ Queries for S are issued at two sites.
# Simple Predicates
py: SAL ¢ 30000
P) : SAL > 30000
=P, = {0y, Pz) = complete aod minimal by COM_MIN
Example (Cont'd)
© Minterm Predicates
1 (SAL = 30000) A (SAL » 30000)
im, : (SAL-= 30000) 1 « (SAL ~ 30000)
{SAL £30000) A (SAL » 30000)
SAL = 30000) n «(SAL » 30000)
> m, and m, is contradictory: M = {m,, M3}
Therefore, we define two fragments,
SAL
‘0000
3000
15Example (Cont'd)
© Assumption for J
¥ There are 2 applications that access J
~ The first is issued at three sites and finds the names and
budgets of projects given their number.
¥ The second is issued at two sites and has to do with the
management of the projects.
© Simple Predicates
Pp, = LOC = "New York”
BUDGET > 200000
= Pr {D4, Pz, By, Pa, is complete and minimal : COM_MIN
Tapers
Example (Cont'd)
© Minterm Predicates
m, : LOC = “Montreal” 4 (BUDGET = 20000)
m, : LOC = "Montreal" , (BUDGET » 20000)
img : LOC = "New York" « (BUDGET < 20000)
ma, : LOC = "New York” 4 (BUDGET » 20000)
m, : LOC = "Paris" n (BUDGET = 20000)
m, : LOC = "Paris" A (BUDGET » 20000)
Therefore, we define six fragments,
F, = Quy Jor Jay Jar Joy Je} according to M.
16Derived Horizontal Fragmentation
ead
¥ Defined on a member relation of a link according toa
selection operation on its owner.
¥ Given a link L, ownenL)~ S & member(L) = R
-R,-R semi joinS, 1i£w
# of fragments that will be defined on R
},= a primary horizontal fragment for S
© Example
L, : owner(L,) = S and member(L,)=E
E, : E semi_join §,, where $, = Gea, « scocol)
E+E semi_join $,, where S; = Osa, - gaol)
Potential Complication
e FMS
When there are more than two Links into a relation, there is more
than ene possible horizontal fragmentation of the relation,
Two criteria
¥ Fragmentation used in more applicstioas
Y¥ Fragmentation with better join characteristics
Recall the advantages of the fragmentation
¥ Performing. qpery on smaller relations
1 Perfonning joins ina distibured fashion
Simple Graph
A gph with only one link coming in or going out of a fragment
¥ Effects of storage and join pecformance!
17Example : Fragmentation of G
Assumption
There are two applications which access G.
wv One finds the names of engineers who work at certain places.
sf The other accesses the project that employees work om and how
long they will work oa those projects.
© The first fragmentation according to J,, Jp. J3
¥ G, = Gsemi join Jy, Where J, ! Og —-ryitmar()
JG, = Gsemi_join J, where 3; > O.o¢ onan tore)
4G, = Gsemi join Js, where; + o.oc --peed)
© The second fragmentation according to E,. Ey
¥ G, = Gsemi_join E,
¥G, = Gsemi_joinE,
The final choice af the fragmemation scheme may be a decision
problem adivessed dia mg allocarion.
opens 34
Checking for Correctness
© Completeness
¥ PHF: A set of complete and mivimal predicates, Pr’
¥ DHF: Ensures referential integrity
© Reconstruction
¥ for a relation R with fragments F;
R= UR, WR € Fa
© Disjointness
PHF: Mintenm predicates determining the
fingmentation are nautually exclusive
¥ DHF: Disjointuess can be guaranteed if the join graph
is simple; otherwise investigate actual tuple values
Capers
183.2 Vertical Fragmentation
© Definition
Partitions R to fiagments R,, R,, .... R,, each which
contains a subset of R’s attribute as welll as the primary
key of R
@ Inherently more complicated than horizontal
partitioning
¥ Total number of alternatives
¥ Obtaining optimal solution is very difficult
¥ Resort to heuristics
Two Types of Heuristics
© Grouping
¥ Starts by assigning each attribute to one fragment
¥ Joins some of fiagments until some etitesia is satisfied.
© Splitting
¥ Starts with a relation and decides on beneficial partitioning
¥ Top-down desien methodology dll *{
@ Note
= Replication of the key in the fragments
= Therefore, splng 1 considered only for shoe crates that do
sot participate ithe rier es
19Information Requirements of
Vertical Fragmentation
© What needs to be detesmined about applications?
¥ Affinity of tributes: How closely related the atributes ace?
Y Antribote usage valve: vee{qy Ay) = 1 oF 0
© Example
» SELECT BUDGET FROM 1 WHERE JNO = Vaue;
SELECT JRE, BUDGET FROM J;
STLECT JMAME FROM } WHERE LOC = Value;
SELECT SUM(QUDGET) FROM J WHERE LOC = Va
A= 20
ainisy Matric ‘a= JANE.
Attribute Affinity
© Attribute Affinity
aflfA,.A,)= > Yreila, dace ia.)
ref(g,) : # of accesses to attributes (14, A.) for each
execution of application q, at site S,
ace,(q,): the application access frequency measure
20Example
— Assume that ref(q,) - I for all q, and S,
— Application frequencies
e0,(q,)= 15 ace(g)= 20 ace g,) = 10
aee,(ql- 5 aces DB acey(qy)- 0
neey(qyI= 25 ace 25 aees( ash
acc (ad= 0 ace(qy=
= ace, (q,) +ace, (q,)+ ace ,tq,) = 45
A
attribute affinity matrix (AA)
opr 9
Clustering Algorithm
oe
‘ Find some means of grouping the attributes of a relation based on the
oxcboteeinty valoes in AA.
Net comribution tothe global affinity measure of pacing atirte A
between A, an
~cont(Ay Ay, A} = bond, A,) + bond(, A) ~bond(A, A)
where bond(A,.A, par. Laff (eo)
® Example
cot Ay, Ay. Ay) = bond(A,, Ay) + bondtA,, Ay) —bord{A,, Ag
bondl,. A,
bondi.
Capers
21Algerithm CLUSTERING
input: 4 : attribute afinty matric
output: CA: custered afeity matric
ies remember that AK ip enn — mutes}
a
index = 35
while index = n do {choose the “best” location for attnibaite RA jae, >
begin
Tor | mn 1 to index — 1 1
calulate conta, Rye
ent for
CAICUIADE CONE Aye 2+ Anton Arior sx) (DOUNICERY GONG. F
Joc = placement given by maximum cont value
for jem next ae By 1 0
order the rows according ta the relative ovdering af columns
rd, (CLUSTERING)
boudl Ay. A;)+ bond A;, A,) —bond(Ay, An)
bandia,..a,)=0
A)—45 4545-0453 45430-4410
‘gomt(Acy Ay, Ai) = 4410
Ordering(1-3-2)
cont{A, Ay. ;) = bond(A,, A,)+ bord(A,, A.) —bond(A, A)
bond(A,, A;)—bondA;, A;)~ 4410
bondlA., As) = 890, boudlA,. Ad}
conti. Ay A.
capers
22comt{As, Ay. A.) = bondlA., As)-+ bond(A;.A,)—bond(A, A)
bond(A,, A,)= $50
bomi(Ay. Ag)— bonds. A,)~ 0
conta,
Aad so forth ...: The resulting Clustered affinity Mawix (CA)
A
a
3
15
78
capers 46
Partitioning Algorithm
© The upper left-hand comer of CA: TA
© The lower right-hand comer of CA: BA
{A |use(.A)= 1}
1g, AQig) STAY
19, AQKg,) S BA}
Q-1TQUBQ}
23> 2 ref ,(q,)ace ,(q,)
dtOes,
2 > ref (q, Jace ,(q,)
cBO ze ref (q, ace (q.)
cog = 2 2 ref ,(q, hace ,(g,)
To find the point x such that z is maximized :
z=CTQ - CBQ—COQ?
opens
Algorithm PARTTFION
input = CA: clustered affinity matrix -R: relation
output =F: set of fragments,
begn
4 determine the wali For the Fest colurnn
{the subscripts in the cast equations indicate the salt point }
galcdate CTQ, 1, BQ,~ 11 COR,
Dest = CTR,» BQ,“ (C0, a
for i from n~ 2 to 1 by— do
‘aleuate CTQ, C8, COQ,
2 Cr CB, CoQ?
sf (> best) then
‘255190 Dest to 2 and record the shift positon:
and
endfor
call SHIFT(CA)
‘until ne more SHIFT is possible
seconstruct the matriy according to the shit postion
R= Nyt) UK {K 6 the set of primary key attributes af R }
Capers a7Checking for Correctness
© Completeness
A=TAUBA
@ Reconstruction
R-JOIN,R, VR € Fy,
® Disjointness
snot important as horizontal fragmentation die to the
seplication of peimary key
3.3 Hybrid Fragmentation
© The levels of nesting in most practical applications
do not exceed 2,
Capers
25Correctness of Hybrid Fragmentation
Reconstruction
Starts at the leaves of the partitioning tree and moves,
jpward by performing joins and unions
© Completeness
¥ Fragmentation is complete ifthe intermediate and leat
fragments are complete.
@ Disjoinmess
¥ Fragmentation is disjoint ifthe intermediate and leat
fragments are disjoint,
4. Allocation
© Definition
The allocation problem involves finding the optima
distribution of relations (fragments) to sites.
© Measures of optimality
v¥ Minimal cost
—cost of storing, querying, updating, and
data communication
¥ Performance
—to minimize the response time and
—to maximize the system throughput at each site
Capers
264.1 Some example of data placement
and allocation
(Example 1) Single-relation case
Table Table?
pase [ome [paves] oice#] [page | pave 2| pave 3
apfaolanfas| [am|aolan
a] an | 2s) | ep) (La) | ts) | Gea
eofeaanlaa] fenlenlae
ts) | pd | Aw | a0 20] 2s) | ao
Query: ( (112, 2, 7% pI ae,
@ Distributed placement
— sited: page(l, 4). site 2: pages(2, 3)
= site L: page(i, 2), site 2: pages(3, 4)
= site L: paget, 3), site 2 pages(?, 4)
(Example 2) Multiple-relation case
@olenlen[ealan fen
doo} To.) [dd [toy [ibd [te
@oTanl ica Tas [ms [is
oo fee Tao [oe [ ms) [a
Oej4 =n (Ry) JOIN gy -catt Fea fR)
© Distributed placement
= site L: pageli.2), site 2: pagest3, 4)
= site L: paget!.3), site? pagest2. 4) uepers4.2 A practical combinatorial optimization
approach to the file allocation preblem
© Assumption
¥ Most files are not fragmented.
¥ Tis unlikely that we will wy to exploit parsllelisea in ows fle
allocation.
o Each computing facilities have tight limits on their local mass
stomge capacity.
¥ Starage is considered to be a constant on the optimization, rather
thaaas a cost
The transaction traffic is known in advance,
¥ Roads and updates have equal casts
¥ Remote accesses all have the same unit cos.
of No redundancy is permitted and fragmentation of file is forbidden,
capers st
Notation and Constraints
Nnodes, indexed by j. capacity
M files, indexed by i, size —
T wansactions, indexed by &, frequency rom node j = f,
ny, accesses fiom transaction é to file
x, decision variable 1 —file is allocated to node j, 0— otherwise
yx sbvilisisM
‘The goal of FAP - maximize(Z,, X, V,)
where V,, = E, fy; (1j,) © cost of local retrievals
28Algorithm FAP
|. Caleulate HC) = J" | Vy= mV), 175
2, Anoptimal set of sy is given by xy = 1 for some j € 50)
and xy~ O otherwise
Lf this solutions feasible (i.e, mests the constrains), it is our answer:
goto step7
Otherwise, identify all nodes which cause the constraints to be broken
. For every such aver- subscribed node, solve the corresponding inepsack
problem, thereby eliminating a node and the files allocated to that node
from further consideration,
‘Consider Ha) for any nodes, which remain. If these are such nodes go to
step 2
‘Otherwise, we have finished
opens 58
Example
Allocate 8 files among five sites, each with 20 MB disk.
Access rates of transactions to files (n,,)
— Fle (sel bes}
rarsactons [Tao] 7a) Fae] 4) [5] 67 Oe)
1” [0 30
7
5
‘spies 7
29The frequency of transactions in sites (f,)
Sites
z 2 2 4
w
2 |
z
Transactions
ops se
Ey 7a
a3 | 105 a
m5 173 0
rose [260 [1015 ‘Lo
70
=
130 | 382 | 495
red | a1 | 30
Ji) are the yellow elements for each i
‘If we assign x) ~ 1 for these and 0 for the ether entries we have our first solution,
Site | hns been allocated §5Mbytes of files. This és not a feasible solution,
‘Site L has been allocated too umueh.
‘The maximum value(V,) we can get from storing my files.on site | is obtained
by storing files 1.2, and 8 there.
‘Our new V; table is obtained by eliminating row ! above and column 1. 2. and 8.
‘The new I) are the underlined entries (all allocated to site 3)
apts 0
30‘Assign x, = 1 to these, x, = 0 to the remainder of the entizs,
3°, Site 3 has been allocated 47 Mstes,
4, Site 3 has been overiondes,
‘The suaxinwa value we can get flom storing flee on site 3 is ebtained
by storing files + and 5 there,
6. Ouraew V, table is obtained by eliminating zow'3 aad colurau 4 and
5 fromthe seduced table.
New V,
Fie (sizerMoytes]
ce] 6.0
vai] a fo
95 | ase [30
490 [150 [90
The new j(/) are underlined (all alloested to site 4)
2° Assiga 1 to xy for these entries. 0 forthe rest
3°. Site thas been allocated 29 Mbytes
4”. Site 4 has been overloaded.
Store file 3 at site 4
6°". Our new V, table is obtained by eliminating row 2, coluaan 4 from the
table above
Without spelling out the details, itis clear that the remaining 2 files, 6 and
T,are allocated to site
Total space used
hbytes)
19
So eur sobution is fi
15
18
ul
314.3 Database Allocation Problem
© DAP is different from FAP
¥ The relationship between fragments should be taken into
account
¥ The relationship between the sllocation and query
processing should be properly modeled,
¥ FAP do not take into consideration the cost of integrity
enforcement.
v The cost of enforcing concurrency control mechanisms
should be considered.
© There are no general heuristic models that take as
input a set of fragments and produce a
near-optimal allocation subject to the types of
constraints discussed here.
© We present a relatively general model and then
discuss a number of possible heuristics that
might be employed to solve it,
pers
32Information Requirements
© Database information
~ the selectivity ofa fragment F, with respect to query q: sel(F,)
~ the size ofa fragment F,: size(F,) = cardi) - lengitiF,
© Application information
= # of tad (write) accesses from, to F, : RR,(UR,)
— UM with uy (J 600), RM with ry (1 of 0}, and © with (7)
~ for each query. a maxiniam allowable sesponse time is defined
© Site information
— for each sit, is storage and processing capacity is defined
~ wait cost of storing daiaat site S,: USC,
~ the cost of processing one unit of workat site S,: LPC,
© Network information
~ the communication cost per fame between S, and 5, 3,
~ the size (in bytes) of one frame ; size
capers ot
Allocation Model
© Objectives
‘Mininsize( Total Cost) subject to respense-ime'steage processing
conus
1 AEE, is stored at S,. and x, = @ otherwise
© Total Cost
roc = Forc,+ STC,
STCjp: the cost of fragment Fa ste S,
STC, USC,» sizalF,) “a
QP; : query processing cost of application 4,
QPC,~ processing cost (PC;) + transmission cas TC)
Capers
33PC, =access cast (AC,)+ integrity enforcement cast (IE, }+CC cast (CC,)
ac, (ily URy + 8p BRyVe Ry LPC,
ZR Br
oper
Solution Methods
© The formulation of DAP is NP-complete.
© Thus, one has to look for heuristic methods that yield
suboptimal solutions.
© Heuristic methods
— kuapenci: problem solution
= braneland-bonad
— network flow algorithm:
= There 1s nor enough dara to determme how close the
results are to the oprtmal.
Capers sr