Example on Data Partitioning

The document compares three partitioning schemes (Block, Cyclic, and Block-Cyclic) for distributing workload among four processors, using an array of 16 elements with increasing costs. Cyclic partitioning provides the best load balance, while Block performs the worst due to grouping heavier elements together. Block-Cyclic offers a compromise between locality and balance, with its effectiveness depending on the chosen block size.

Uploaded by

ryandec69


Problem statement

You have an array of N = 16 elements, indexed 1..16, and P = 4 processors: P0, P1, P2, P3.
Each element i requires cost(i) = i units of work, so later indices are heavier.

Compare three partitioning schemes: Block, Cyclic, and Block-Cyclic (block size B = 2). For each scheme:

1. List which elements each processor receives.
2. Compute the total work (sum of element indices) assigned to each processor, showing the arithmetic.
3. Compute the average work per processor and the imbalance metric max_load / average_load.
4. Conclude which scheme is best for this workload.

Solution: step by step

Total work (for verification)

Sum of integers 1 through 16:

1 + 2 + 3 + … + 16 = (16 × 17) / 2 = 136.
Average per processor = 136 / 4 = 34.
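
As a quick sanity check, the total and the average can be confirmed with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the original exercise.

```python
# Values taken from the problem statement.
N, P = 16, 4

# Closed form for 1 + 2 + ... + N, and the same total by direct summation.
total_closed_form = N * (N + 1) // 2
total_by_summing = sum(range(1, N + 1))   # cost(i) = i for i = 1..N

average = total_closed_form / P

print(total_closed_form, total_by_summing, average)  # 136 136 34.0
```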

A. Block partitioning

Rule: divide into contiguous chunks of size N/P = 16/4 = 4.

Assignments:

• P0: indices 1, 2, 3, 4
• P1: indices 5, 6, 7, 8
• P2: indices 9, 10, 11, 12
• P3: indices 13, 14, 15, 16

Compute totals (showing arithmetic):

• P0 total = 1 + 2 + 3 + 4 = (1 + 2) + (3 + 4) = 3 + 7 = 10
• P1 total = 5 + 6 + 7 + 8 = (5 + 6) + (7 + 8) = 11 + 15 = 26
• P2 total = 9 + 10 + 11 + 12 = (9 + 10) + (11 + 12) = 19 + 23 = 42
• P3 total = 13 + 14 + 15 + 16 = (13 + 14) + (15 + 16) = 27 + 31 = 58

Check: 10 + 26 + 42 + 58 = (10 + 26) + (42 + 58) = 36 + 100 = 136 (matches total).
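
The block rule above can be sketched in Python as follows. The helper name block_owner and the dictionaries are assumptions made for illustration, and the sketch assumes P divides N evenly, as it does here.

```python
N, P = 16, 4
chunk = N // P   # contiguous chunk size, 4 in this example (assumes P divides N)

def block_owner(i):
    """Return the processor that owns 1-based index i under block partitioning."""
    return (i - 1) // chunk

# Collect the indices each processor receives, then sum them (cost(i) = i).
assignments = {p: [i for i in range(1, N + 1) if block_owner(i) == p]
               for p in range(P)}
loads = {p: sum(indices) for p, indices in assignments.items()}

for p in range(P):
    print(f"P{p}: {assignments[p]} -> {loads[p]}")
```

The printed loads should match the hand computation: 10, 26, 42 and 58.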

Imbalance metric:

• max_load = 58, average = 34
• ratio = 58 / 34, by long division:
  o 34 × 1 = 34, remainder 24, so the integer part is 1
  o 240 / 34 gives 7 (34 × 7 = 238), remainder 2, so the first decimal digit is 7
  o continuing the division gives 1.705882...
• So ratio ≈ 1.7059 (≈ 170.59% of average).

Interpretation: one processor (P3) does about 70.6% more work than average, which is poor balance for this skewed-cost case.
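
The imbalance metric itself is easy to express as a small function. The sketch below uses Python's fractions module so that the ratio stays exact until it is rounded for display; the function name imbalance is hypothetical.

```python
from fractions import Fraction

def imbalance(loads):
    """Return max_load / average_load as an exact fraction."""
    average = Fraction(sum(loads), len(loads))
    return max(loads) / average

block_loads = [10, 26, 42, 58]   # per-processor totals from the block scheme
ratio = imbalance(block_loads)

print(ratio, round(float(ratio), 4))  # 29/17 1.7059
```

A value of 1 would mean perfect balance; larger values mean the busiest processor is further above the average.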

B. Cyclic partitioning

Rule: assign index i to processor (i − 1) mod 4 (round-robin).

Assignments:

• P0: 1, 5, 9, 13
• P1: 2, 6, 10, 14
• P2: 3, 7, 11, 15
• P3: 4, 8, 12, 16

Compute totals:

• P0 total = 1 + 5 + 9 + 13 = (1 + 5) + (9 + 13) = 6 + 22 = 28
• P1 total = 2 + 6 + 10 + 14 = (2 + 6) + (10 + 14) = 8 + 24 = 32
• P2 total = 3 + 7 + 11 + 15 = (3 + 7) + (11 + 15) = 10 + 26 = 36
• P3 total = 4 + 8 + 12 + 16 = (4 + 8) + (12 + 16) = 12 + 28 = 40

Check: 28 + 32 + 36 + 40 = (28 + 32) + (36 + 40) = 60 + 76 = 136 (matches total).
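
A minimal Python sketch of the round-robin rule is shown below. The name cyclic_owner is an assumption for illustration. The strided range is just a compact way to list every P-th index, and the assertion checks that it agrees with the (i − 1) mod P rule.

```python
N, P = 16, 4

def cyclic_owner(i):
    """Return the processor that owns 1-based index i under cyclic partitioning."""
    return (i - 1) % P

# Processor p receives indices p+1, p+1+P, p+1+2P, ... (a stride of P).
assignments = {p: list(range(p + 1, N + 1, P)) for p in range(P)}

# The strided lists must agree with the owner rule stated in the text.
assert all(cyclic_owner(i) == p for p, idx in assignments.items() for i in idx)

loads = {p: sum(indices) for p, indices in assignments.items()}
print(loads)  # {0: 28, 1: 32, 2: 36, 3: 40}
```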

Imbalance metric:

• max_load = 40, average = 34
• ratio = 40 / 34, by long division:
  o 34 × 1 = 34, remainder 6, so the fractional part is 6/34 = 0.176470588...
• So ratio ≈ 1.17647 (≈ 117.65% of average).

Interpretation: the busiest processor does only about 17.65% more work than average, which is good balance, but data locality is poor (each processor's elements are scattered across the array).

C. Block-Cyclic partitioning (block size B = 2)

Rule: break array into blocks of 2 contiguous elements, then assign blocks cyclically among
processors.
Owner of index i is ((i − 1) // B) mod P.

Blocks (B=2):
Block0 = indices (1,2), Block1 = (3,4), Block2 = (5,6), Block3 = (7,8), Block4 = (9,10),
Block5 = (11,12), Block6 = (13,14), Block7 = (15,16).

Assign blocks cyclically to P0..P3:

• Block0 → P0 (1,2)
• Block1 → P1 (3,4)
• Block2 → P2 (5,6)
• Block3 → P3 (7,8)
• Block4 → P0 (9,10)
• Block5 → P1 (11,12)
• Block6 → P2 (13,14)
• Block7 → P3 (15,16)

So processor elements:

• P0: 1, 2, 9, 10
• P1: 3, 4, 11, 12
• P2: 5, 6, 13, 14
• P3: 7, 8, 15, 16

Compute totals:

• P0 total = 1 + 2 + 9 + 10 = (1 + 2) + (9 + 10) = 3 + 19 = 22
• P1 total = 3 + 4 + 11 + 12 = (3 + 4) + (11 + 12) = 7 + 23 = 30
• P2 total = 5 + 6 + 13 + 14 = (5 + 6) + (13 + 14) = 11 + 27 = 38
• P3 total = 7 + 8 + 15 + 16 = (7 + 8) + (15 + 16) = 15 + 31 = 46

Check: 22 + 30 + 38 + 46 = (22 + 30) + (38 + 46) = 52 + 84 = 136.
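
The owner formula ((i − 1) // B) mod P translates directly into Python, where // is integer division. The sketch below is illustrative only; block_cyclic_owner is a hypothetical helper name.

```python
N, P, B = 16, 4, 2

def block_cyclic_owner(i, block_size=B, procs=P):
    """Owner of 1-based index i: find its block number, then deal blocks round-robin."""
    block_number = (i - 1) // block_size
    return block_number % procs

assignments = {p: [] for p in range(P)}
for i in range(1, N + 1):
    assignments[block_cyclic_owner(i)].append(i)

loads = {p: sum(indices) for p, indices in assignments.items()}
for p in range(P):
    print(f"P{p}: {assignments[p]} -> {loads[p]}")
```

The printed loads should match the hand computation: 22, 30, 38 and 46.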

Imbalance metric:

• max_load = 46, average = 34
• ratio = 46 / 34, by long division:
  o 34 × 1 = 34, remainder 12, so the fractional part is 12/34 ≈ 0.352941176...
• So ratio ≈ 1.35294 (≈ 135.29% of average).

Interpretation: better balance than pure block but worse than cyclic for this particular skewed-cost pattern, in exchange for better locality than cyclic.

Summary table (quick view)

Scheme                P0   P1   P2   P3   max_load   avg   max/avg
Block                 10   26   42   58   58         34    1.7059
Cyclic                28   32   36   40   40         34    1.1765
Block-Cyclic (B=2)    22   30   38   46   46         34    1.3529

Conclusion & recommendation

• For this specific workload, where cost increases with index (later elements are heavier), Cyclic gives the best load balance (lowest max/avg) because it spreads heavy and light elements evenly across processors.
• Block performs worst here because it groups heavy elements together (P3 gets the heaviest chunk).
• Block-Cyclic (B=2) is a compromise: better locality than Cyclic and better balance than Block. It is useful when you want both locality and reasonable balance, and its effectiveness depends on the chosen block size B.
• In real HPC applications you choose B based on a tradeoff between communication/locality and load balance; libraries like ScaLAPACK use a block-cyclic distribution for that reason.
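
To see how the block size affects balance, the same owner formula can be evaluated for several values of B. In this example B = 1 reproduces the Cyclic scheme and B = N/P = 4 reproduces the Block scheme, so one function covers all three rows of the summary table. This is a sketch with a hypothetical helper name, not a tuning recipe.

```python
N, P = 16, 4

def loads_and_ratio(block_size):
    """Per-processor loads and max/avg for block-cyclic with the given block size."""
    loads = [0] * P
    for i in range(1, N + 1):
        loads[((i - 1) // block_size) % P] += i   # cost(i) = i
    return loads, max(loads) / (sum(loads) / P)

for B in (1, 2, 4):
    loads, ratio = loads_and_ratio(B)
    print(f"B={B}: loads={loads}, max/avg={ratio:.4f}")
# B=1: loads=[28, 32, 36, 40], max/avg=1.1765
# B=2: loads=[22, 30, 38, 46], max/avg=1.3529
# B=4: loads=[10, 26, 42, 58], max/avg=1.7059
```

For this linearly increasing cost, smaller blocks give better balance, at the price of each processor's data being more scattered.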
