0% found this document useful (0 votes)
14 views65 pages

ch2 Parallel DB

The document discusses the need for parallel database systems due to the increasing size of data and the demand for efficient processing of transactions and queries. It outlines various architectures, performance measures, and partitioning techniques for optimizing database operations. Additionally, it addresses challenges such as speedup limitations and handling data skew in parallel processing environments.

Uploaded by

pulokr221
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views65 pages

ch2 Parallel DB

The document discusses the need for parallel database systems due to the increasing size of data and the demand for efficient processing of transactions and queries. It outlines various architectures, performance measures, and partitioning techniques for optimizing database operations. Additionally, it addresses challenges such as speedup limitations and handling data skew in parallel processing environments.

Uploaded by

pulokr221
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Parallel Databases

Motivation
 Almost abandoned before, but nowadays:

 Databases are growing increasingly large


 PB data is collected and stored
 transaction data (12306), SNS
 multimedia data, increasingly stored
 Byte、KB、MB、GB、TB、PB、EB、ZB、YB、DB、NB
 Bank of Holland:20 DC, 7PB disks, 20PB
magnetic tape storage, 50~70% increase / year

 Parallel machines, common and affordable


 Prices dropped sharply: microprocessors, memory,
disks

2 Advanced Database System Principle


Motivation
 Thus, large-scale parallel database systems is needed:
 to store large volumes of data
 to process time-consuming decision-support queries
 to provide high throughput for transaction processing

3 Advanced Database System Principle


Parallel Systems
 Parallel database systems consist of
 multiple processors
 multiple disks
 connected by a fast interconnection network.

 A coarse-grain parallel machine consists of a small number of


powerful processors

 A massively parallel or fine grain parallel machine utilizes


thousands of smaller processors.

4 Advanced Database System Principle


Parallel Systems

 Two main performance measures:


 throughput --- the number of tasks that can be completed
within a given time
 response time --- the amount of time it takes to complete a
single task from the time

5 Advanced Database System Principle


Speed-Up and Scale-Up
 Speedup: a fixed-sized problem executing on a small
system, then is given to a system which is N-times larger.
 Measured by:

speedup = small system elapsed time 小系统要更多时间


large system elapsed time 大系统完成更快
 Speedup is linear if equation equals N.

6 Advanced Database System Principle


Speed-Up and Scale-Up
 Scaleup: increase the size of both the problem and the
system
 N-times larger system, perform N-times larger job
 Measured by:
scaleup = small system small problem elapsed time
big system big problem elapsed time

 Scale up is linear if equation equals 1.

7 Advanced Database System Principle


Transaction Scaleup
 Transaction scaleup:
 Numerous small queries submitted to a shared database

 Problem: N-times (users submitting) requests, N-times


larger database, use an N-times larger computer

 Well-suited to parallel execution.

8 Advanced Database System Principle


Factors Limiting Speedup and Scaleup
 Speedup and scaleup are often sublinear due to:

 Startup costs:
 important when the degree of parallelism is high
 Interference:
 Different processors, etc.

 Skew:

Overall execution time determined by the slowest one

9 Advanced Database System Principle


Parallel Database Architectures
 Several major architectures:
 Shared memory -- processors share common memory
 Shared disk -- processors share common disk
 Shared nothing -- processors share neither common memory
nor common disk
 Hierarchical -- hybrid of the above architectures

10 Advanced Database System Principle


Parallel Database Architectures

11 Advanced Database System Principle


Shared Memory
 Processors & disks access common memory
 via a bus, or through an interconnection network.
 Extremely efficient communication between processors
 data in shared memory can be accessed by any processor
 processor can send message to others thru main memory
 Bottleneck
 not scalable
 <= 32 or 64 processors
 Widely used for lower degrees of parallelism.

12 Advanced Database System Principle


Shared Disk
 Access all disks via interconnection network
 but processors have own memories
 memory bus is not a bottleneck
 +: fault-tolerance — once a processor fails, other
processors continue its tasks (database is accessible from
all processors).
 Bottleneck: occurs at interconnection to the disk

 Can scale up to a good number of processors


 With slow communication

13 Advanced Database System Principle


Shared Nothing
 Each node is complete one
 Communicate thru interconnection network
 Each node is treated as a Server
 Examples:
Teradata, Tandem, Oracle-n CUBE
 Shared-nothing can be scaled up to thousands of processors

 Main drawback:
 cost of communication
 non-local disk access
 sending data involves software interaction at both ends.

14 Advanced Database System Principle


Hierarchical
 Combines :
 shared-memory, shared-disk, shared-nothing
 Top level is a shared-nothing architecture
 Each node of the system, could be:
 a shared-memory system
 a shared-disk system
 a shared-memory system
 To reduce the complexity, use distributed virtual-memory
 Logically, has a shared memory
 Physically, has many independent main memory system
 Also called non-uniform memory architecture
(NUMA)
15 Advanced Database System Principle
Content
 I/O Parallelism
 Interquery Parallelism
 Intraquery Parallelism
 Intraoperation Parallelism
 Interoperation Parallelism

16 Advanced Database System Principle


Parallelism in Databases
 Data partitioned across multiple disks, if parallel I/O
 Individual relational operations is executed in parallel
 E.g., sort, join, aggregation
 each processor works independently

 Queries can be run on different processors


 expressed in high level language (SQL)

 Different queries can be run in parallel


 Concurrency control takes care of conflicts.

17 Advanced Database System Principle


I/O Parallelism
 Partition a big relation to multiple disks
 To Reduce time required to retrieve relations from disk:
 Horizontal, vertical

 Horizontal partitioning, row-wise:


 tuples of a relation are divided to a number of disks

18 Advanced Database System Principle


I/O Parallelism
 Partitioning techniques (number of disks = n):
Round-robin: (roulette)
Send the ith tuple of the relation to disk #=i mod n.

Hash partitioning:
 Choose one or more attributes as the partitioning attributes.
 Choose hash function h with range 0…n - 1
 Let i denote result of hash function h
 then send tuple to disk i.

19 Advanced Database System Principle


I/O Parallelism (Cont.)
 Range partitioning:
 Choose an attribute as the partitioning attribute
 Choose a partitioning vector [vo, v1, ..., vn-2]
 Let v be value of a tuple in the partitioning attribute
 Tuples with v < v0 go to disk 0
 Tuples such that vi  vi+1 , go to disk i + 1
 Tuples with v  vn-2 go to disk n-1.

20 Advanced Database System Principle


I/O Parallelism (Cont.)
E.g., with a partitioning vector [5,11],
a tuple with partitioning attribute value of 2, go to disk 0,
a tuple with value 8, go to disk 1,
a tuple with value 20, go to disk2.

21 Advanced Database System Principle


Comparison of Partitioning Techniques

Round robin:
 Advantages
 Best suited for sequential scan
 All disks have almost an equal number of tuples
 retrieval work is thus well balanced between disks

 Point, range queries are difficult to process


 No clustering -- tuples are scattered across all disks

22 Advanced Database System Principle


Comparison of Partitioning Techniques(Cont.)

Hash partitioning:
 Good for point queries
 lookup single disk, leave others available for
answering other queries.
 Can have index, making lookup & update efficient

 Good for sequential scan


 If hash function is good, and tuples equally
distributed over disks
 Retrieval work is balanced

 No clustering, so difficult to answer range queries

23 Advanced Database System Principle


Comparison of Partitioning Techniques (Cont.)
 Range partitioning:
 Provides data clustering, by partitioning attribute
 Good for sequential access
 Good for point queries: only one disk
 For range queries, a few disks may be accessed
 Remaining disks are available for other queries
 Good if result tuples are distributed to a few blocks
 Disadvantage:
 If data skew, satisfying tuples, only distributed on
several blocks, cause low parallelism !
 However, in Round robin, Hash Partitioning, many
blocks maybe fetched , which is a high parallelism

24 Advanced Database System Principle
Partitioning techniques need to consider
We need to consider:
 Small relations:
 a relation contains only a few tuples
 assign the relation to a single disk.

 Large relations are preferably partitioned across all


available disks.

 For a relation of m disk blocks, given n disks, the


relation should be allocated min(m,n) disks.

25 Advanced Database System Principle


Handling of Skew
 The tuples distribution may be skewed
 some disks have many tuples,
 while others have fewer tuples

 Types of skew:
 Attribute-value skew
 all the tuples have the same value
 occurs in range-partitioning & hash-partitioning
 Partition skew
 for range-partitioning, a bad partition vector
 assign too many tuples to some partitions
 Less likely happened in hash-partitioning
26 Advanced Database System Principle
Handling Skew in Range-Partitioning
 To create a balanced partitioning vector:
 Sort the relation on the partitioning attribute
 Construct partition vector by scanning sorted relation:
 After read every 1/nth of the relation
 partitioning attribute value of next tuple, add to vector

 Still have imbalanced partitions


 if duplicates exist for the same value

 Alternative technique based on histograms is used

27 Advanced Database System Principle


Handling Skew Using
Virtual Processor Partitioning
 Handling skew using virtual processor partitioning:
 create a large number of partitions (say 10 to 20 times
the number of processors)
 Assign virtual processors to partitions either in:
 round-robin fashion, or
 based on estimated cost of processing each virtual partition

 Basic idea:
 the skew might spread over a number of virtual partitions
 gets distributed evenly!

28 Advanced Database System Principle


To summarize
 Partitioning Techniques
 Round-robin
 Hash partitioning
 Range partitioning
 Types of skew
 Attribute-value skew
 Partition skew
 Handling Skew
 Balanced partitioning vector
 virtual processor partitioning

29 Advanced Database System Principle


Content
 I/O Parallelism
 Interquery Parallelism
 Intraquery Parallelism
 Intraoperation Parallelism
 Interoperation Parallelism

30 Advanced Database System Principle


Interquery Parallelism
 Interquery Parallelism:
 Queries/transactions execute in parallel
 Could increase transaction throughput;
 Could scale up a transaction processing system
 Could process a larger number of transactions per
second.

31 Advanced Database System Principle


Interquery Parallelism
 Is the easiest form of parallelism in DB
 particularly in a shared-memory parallel database

 Is complicated, for shared-disk or shared-nothing:


 Locking & logging must be coordinated by passing
messages;
 Data in a local buffer may be updated by other
processor
 Cache-coherency has to be maintained
 reads & writes buffer must use the latest version

32 Advanced Database System Principle


Cache Coherency Protocol
 Intuitive cache coherency protocol:
 Before reading/writing, the page must be locked in
shared/exclusive mode
 After lock, read the page from disk
 Before unlock, write back the modified page
 Release a lock

 Many other more complex protocols

33 Advanced Database System Principle


Content
 I/O Parallelism
 Interquery Parallelism
 Intraquery Parallelism
 Intraoperation Parallelism
 Interoperation Parallelism

34 Advanced Database System Principle


Intraquery Parallelism
 Intraquery Parallelism:
execution of a single query in parallel on multiple processors;
 important to speed up long-running queries.

 Two complementary forms of intraquery parallelism :


 Intraoperation Parallelism
 parallelize each individual operation: sort, selection, join
 Interoperation Parallelism
 execute different operations in parallel

the first form is better when:


# of tuples >> # of operations

35 Advanced Database System Principle


Content
 I/O Parallelism
 Interquery Parallelism
 Intraquery Parallelism
 Intraoperation Parallelism
 Interoperation Parallelism

36 Advanced Database System Principle


Parallel Sort
For intraoperation parallelism
Range-Partitioning Sort
 Choose processors P0, ..., Pm, m  n -1, then sorting
 Then create range-partition vector with m entries

 Step 1: Redistribute the relation using range partitioning


 all tuples in the ith range, sent to Pi
 Pi stores the tuples to disk Di

 Step 2: each processor Pi sorts its partition locally

37 Advanced Database System Principle


Parallel Sort
 Each processor executes same operation (sort) in parallel

 Final merge operation is trivial:


 for 1 ≤ j ≤ m, the key value in processor Pi ≤ that in Pj.

 Piperforms merge on the data stream received, to get a


single sorted run

 The sorted runs on processors P0,..., Pm-1 are


concatenated to get the final result.

38 Advanced Database System Principle


Parallel Join
 The join operation test pairs of tuples, satisfying the join:
 If yes, the pair is added to the join output.

 Parallel join algorithms attempt to split the pairs to be


tested over several processors
 Each processor computes part of the join locally

 In the last step, the results from each processor is


collected to get the final results.

39 Advanced Database System Principle


Parallel Join
 Types of parallel join
 Partitioned join
 Fragment and replicate
 Partitioned Parallel Hash-Join
 Parallel Nested-Loop Join

40 Advanced Database System Principle


Partitioned Join

41 Advanced Database System Principle


Partitioned Join (Cont.)
 For equi-joins and natural joins:

 Equi-join
SELECT * FROM employee INNER JOIN department ON
[Link] = [Link]
 Natural join:
SELECT * FROM employee NATURAL JOIN department

42 Advanced Database System Principle


Partitioned Join (Cont.)
 For equi-joins and natural joins:
 Let r and s be the input relations, to compute r s
r.A=s.B
 r , s is partitioned into n partitions: r0, r1, ..., rn-1 and s0, s1, ..., sn-1

 Can use either range partitioning or hash partitioning.


 r and s must be partitioned on the join attributes r.A and s.B,
 use the same functions: range-partitioning vector, hash function

 Partitions ri and si are sent to Pi,


 Each Pi locally computes ri si ri.A=si.B
 Any of the standard join methods can be used.

43 Advanced Database System Principle


Fragment-and-Replicate Join
 Partitioned-join do not fit for all conditions
 e.g., non-equijoin conditions, such as r.A > s.B
 All tuples in r might participate join

 In this case, parallelization is implemented by


fragment and replicate

44 Advanced Database System Principle


Depiction of Fragment-and-Replicate Joins

45 Advanced Database System Principle


Fragment-and-Replicate Join (Cont.)
 Can reduce the sizes of the relations at each P
 r is partitioned into n partitions: r0, r1, ..., r n-1
s is partitioned into m partitions: s0, s1, ..., sm-1
 use any partitioning technique
 there must be at least m * n processors
 label the processors as
P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1m-1
 ri is replicated to Pi,0, Pi,1, ..., Pi,m-1,
 si is replicated to P0,i, P1,i, ..., Pn-1,i
computes the join of ri with sj
 Pi,j
 Any join technique can be used at each processor Pi,j

46 Advanced Database System Principle


Fragment-and-Replicate Join
 Special case: asymmetric fragment-and-replicate:
 One relation, r, is partitioned;
 The other relation, s, is replicated across all the
processors;
 Pi locally computes the join of ri with all s using any
join technique

47 Advanced Database System Principle


Fragment-and-Replicate Join (Cont.)
 Both versions of fragment-and-replicate fit for any join

 Usually has a higher cost than partitioning


 At least one relation (for asymmetric version) is replicated

 Sometimes asymmetric fragment-and-replicate is preferable


 E.g., s is small, r is large, and is already partitioned.
 It may be cheaper to replicate s to P, rather than repartition r
and s on the join attributes.

48 Advanced Database System Principle


Partitioned Parallel Hash-Join
Parallelizing partitioned hash join:
 Assume s is smaller, then s is chosen as the build relation

 hash function h1 maps tuples in s to one of the processors

 Each Pi reads tuples of si and sends each tuple to the


processor decided by hash function h1.

 On the processors, tuples si are further partitioned by hash


function h2, to compute the hash-join locally (Cont.)

49 Advanced Database System Principle


Partitioned Parallel Hash-Join (Cont.)
 Then relation r is redistributed across m processors by h1
 Let ri denote the tuples of relation r, sent to Pi.
 After ri is received by Pi, repartition ri using h2

 Each Pi executes the hash-join algorithm on partitions ri , si,


to produce part of the final results.

 Note: Hash-join on each processor is independent

50 Advanced Database System Principle


Parallel Index Nested-Loop Join
 Assume that
 relation s << r, and r is stored in partitions
 there is an index, on join attribute of relation r

 Use asymmetric fragment-and-replicate


 replicate s, use the existing partitioning of r

 Pjreads tuples of sj, and replicates to every other Pi.


 s is replicated at all sites, that store tuples of r

 Piperforms an indexed nested-loop join of relation s with


the ith partition of relation r

51 Advanced Database System Principle


Other Relational Operations
Selection (r)
 If  takes the form ai = v
 If r is partitioned on ai , the selection is performed at a single
processor.

 If  takes the form l <= ai <= u (range selection)


 If the relation has been range-partitioned on ai
 Selection is performed at each processor whose partition
overlaps with the specified range of values.

 In all other cases: the selection is performed in parallel at


all processors.

52 Advanced Database System Principle


Other Relational Operations (Cont.)
 Duplicate elimination
 In any parallel sort technique
 eliminate duplicates ASA found during sorting
 Can also partition the tuples (range- or hash-
partitioning)
 Then perform duplicate elimination locally

 Projection

53 Advanced Database System Principle


Grouping/Aggregation
 Step1: Partition the relation, on the grouping attributes,
 Step2: Compute the aggregate values locally, at each P

 Can reduce cost of transferring tuples


 by partly computing aggregate values before partitioning

54 Advanced Database System Principle


Grouping/Aggregation
 E.g., sum aggregation operation:
 Perform sum at each Pi , on disk Di , stores tuples
 Get tuples with partial sum value at each P
 Then, local sum values are partitioned on grouping
attribute
 and sum is performed again at each Pi to get final result
 Fewer tuples need to be transferred
 Save bandwidth as well as the transferring time

55 Advanced Database System Principle


Cost of Parallel Evaluation of Operations
 If no skew: expected speed-up =1/n

 If skew, the time can be estimated as


Tpart + Tasm + max (T0, T1, …, Tn-1)
 Tpart is the time for partitioning the relations
 Tasm is the time for assembling the results
 Ti is the time taken for the operation at processor Pi
 decided by the slowest one, need consider many factor

56 Advanced Database System Principle


Content
 I/O Parallelism
 Interquery Parallelism
 Intraquery Parallelism
 Intraoperation Parallelism
 Interoperation Parallelism

57 Advanced Database System Principle


Interoperation Parallelism
 Pipelined parallelism,Independent parallelism
 Pipelined parallelism
 Consider a join of four relations
 r1 r2 r3 r4
 Set up a pipeline that computes the three joins in parallel
 Let P1 be assigned the computation of
temp1 = r1 r2
 And P2 be assigned: temp2 = temp1 r3
 And P3 be assigned: temp3 = temp2 r4

 Each of these operations can execute in parallel,


sending results to the next operation

58 Advanced Database System Principle


Factors Limiting Utility of
Pipeline Parallelism
 Pipeline parallelism is useful since it avoids writing
intermediate results to disk

 Useful with a small number of processors, but does not


scale up well with more processors.
 One reason: pipeline chains do not attain sufficient length.

 Cannot pipeline operators which do not produce output until


all inputs have been accessed (e.g. aggregate and sort)

 Little speedup when one operator's execution cost is much


higher than the others.

59 Advanced Database System Principle


Independent Parallelism
 Independent parallelism
 Consider a join of four relations
r1 r2 r3 r4
 Let P1 be assigned the computation of
temp1 = r1 r2
 And P2 be assigned: temp2 = r3 r4
 And P3 be assigned: temp1 temp2
 P1 and P2 can work independently in parallel
 P3 has to wait for input from P1 and P2
 Can pipeline output of P1 and P2 to P3, combining
independent parallelism and pipelined parallelism

 Does not provide a high degree of parallelism


 useful with a lower degree of parallelism.
 less useful in a highly parallel system,

60 Advanced Database System Principle


Query Optimization
 Query optimization in parallel databases
 is much more complex than in conventional databases

 Reason: cost models are more complicated, consider:


 partitioning costs, skew, resource competition

 When scheduling execution tree in parallel system, must


decide:
 How to parallelize each operation,
 how many processors,
 What operations to pipeline,
 what operations to execute independently in parallel,
 What operations to execute sequentially.

61 Advanced Database System Principle


Query Optimization (Cont.)
 Determining the amount of resources to allocate for each
operation is a problem.
 E.g., allocating more processors than optimal can result
in high communication overhead.

 Long pipelines should be avoided as the final operation


may wait a lot for inputs, while holding previous resources.

62 Advanced Database System Principle


Query Optimization (Cont.)
 The number of parallel evaluation plans is much larger
 than the number of sequential evaluation plans
 therefore heuristics are needed while optimization

 Two alternative heuristics for choosing parallel plans:


 do not use pipelining
 parallelize every operation across all processors

63 Advanced Database System Principle


Query Optimization (Cont.)
 Two alternative heuristics for choosing parallel plans:
 First, choose the most efficient sequential plan
 Then, parallelize the operations in that plan

64 Advanced Database System Principle


End of Chapter

You might also like