0% found this document useful (0 votes)
83 views3 pages

Distributed Database Management Systems

This document discusses centralized versus distributed database management systems (DBMS). It compares parallel and distributed DBMS, focusing on distributed DBMS issues like data organization, query processing, concurrency control, and recovery across multiple autonomous and heterogeneous sites. Horizontal data partitioning schemes like round-robin, hash, and range partitioning are described. Predicate-based partitioning generates fragments using a set of predicates to divide relations across sites. The choice of partitioning attributes and predicates affects performance and workload distribution.

Uploaded by

Samar Ali Nosser
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
83 views3 pages

Distributed Database Management Systems

This document discusses centralized versus distributed database management systems (DBMS). It compares parallel and distributed DBMS, focusing on distributed DBMS issues like data organization, query processing, concurrency control, and recovery across multiple autonomous and heterogeneous sites. Horizontal data partitioning schemes like round-robin, hash, and range partitioning are described. Predicate-based partitioning generates fragments using a set of predicates to divide relations across sites. The choice of partitioning attributes and predicates affects performance and workload distribution.

Uploaded by

Samar Ali Nosser
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Centralized versus distributed DBMS

Centralized
Processor

Distributed Databases
Memory Disk … Disk

CPS 216 Distributed


Processor Processor
Advanced Database Systems

Memory Disk Memory Disk


Disk
Disk Disk
Disk
2

Parallel versus distributed DBMS Distributed DBMS issues


• Parallel DBMS • Database management with multiple sites that are
– Fast interconnect possibly autonomous and heterogeneous
– Homogeneous hardware/software – Data organization
– Total control over components – Query processing and optimization
• Distributed DBMS – Concurrency control and recovery
– Geographically distributed
• Disconnected operations possible
– Heterogeneous hardware/software
• Performance, data formats, data processing capabilities
– Autonomy of individual sites 3 4

Data organization Partitioning schemes


A1 A2 A3 A4
• Horizontal t1 …
• Top-down approach t2 … Site 1
t3 … Site 2
– Have a database t4 …
………… … … …
– How to partition and/or replicate it across sites … Site k

• Bottom-up approach A1 A2 A3 A4
– Have existing databases at different sites • Vertical t1 …

t2
– How to integrate them together and deal with t3 …
t4 …
heterogeneity and autonomy ………… … … …


• Focus for today
– Data partitioning using a top-down approach • Or hybrid Site 1 Site 2 Site k
5 6

1
Horizontal partitioning schemes Properties of a correct partitioning
• Round-robin partitioning R → { R1, R2, …, Rk }
• Hash partitioning
• Range partitioning • Completeness and reconstructability
• Predicate-based partitioning R = R1 ∪ R2 ∪ … ∪ Rk
• Derived horizontal partitioning
• Disjointness
Ri ∩ Rj = Ø for any i ≠ j

7 8

Round-robin partitioning Hash partitioning


R R0 R1 R2 R R0 R1 R2
t1 t1 t1 hash(k1) = 2 t1
t2 t2 t2 hash(k2) = 0 t2
t3 t3 t3 hash(k3) = 0 t3
t4 t4 t4 hash(k4) = 1 t4
… …

• Evenly distributes data • Evenly distributes data (assuming a good hash function)
• Good for full relation scans • Good for point queries and equijoins on the partitioning
• Not good for range queries attribute
9 • Not good for range queries 10

Range partitioning Predicate-based partitioning


R partitioning vector: <4, 7> R0 R1 R2
t1 k1 = 5 t1
• Fragmentation
t2 k2 = 8 t2
– Decide how to divide a relation horizontally into
t3 k3 = 2 t3
fragments using a set of predicates
t4 k4 = 3 t4

• Allocation
• Good for range queries on the partitioning attribute
– Decide which fragments go to which site
• The choice of partitioning vector is important
– Bad vector may result in both data skew and execution skew

11 12

2
Predicate-based fragmentation Example
• Given a relation R and a set of simple predicates • Say queries use simple predicates:
P = { p1, p2, …, pn } A < 10, A > 5, D = ’CS’, D = ’EE’
• Generate minterm predicates • Generate, simplify, and eliminate minterms
A < 10 ∧ A > 5 ∧ D = ’CS’ ∧ D = ’EE’ eliminated
– M = { m | m = ∧ (1 ≤ k ≤ n) pk* }, where pk* is either pk
or ¬pk A < 10 ∧ A ≤ 5 ∧ D = ’CS’ ∧ D ≠ ’EE’ A ≤ 5 ∧ D = ’CS’

– Simplify minterms in M and eliminate useless ones
• Final set of fragments
• For each m in M, generate a fragment σm R
σ5 < A < 10 ∧ D = ’CS’ R σ5 < A < 10 ∧ D = ’EE’ R
σA ≤ 5 ∧ D = ’CS’ R σA ≤ 5 ∧ D = ’EE’ R
σA ≥ 10 ∧ D = ’CS’ R σA ≥ 10 ∧ D = ’EE’ R
13 14

Choice of simple predicates Allocation of fragments


• Completeness • Tough optimization problem
– There is an equal probability of access by every – Do we replicate fragments?
application to any two tuples in the same minterm – Where we place each copy of each fragment?
fragment • Metrics: minimize query response time; maximize
• If p is used in fragmentation, then σpR either accesses all throughput; minimize network traffic; …
tuples in a fragment or none in a fragment
• Constraints: available storage, bandwidth, processing
• Minimality
power; response time requirement; …
– If a predicate causes a fragment f to be further
fragmented into fi and fj, there should at least one • Issues: origin of queries; selectivity of fragments; query
application that accesses fi and fj differently processing strategies; consistency enforcement; …
» Use all relevant predicates in frequent queries!
15 16

You might also like