0% found this document useful (0 votes)
40 views12 pages

2.2 Programming

The document discusses several topics related to parallel programming performance including identifying and managing concurrency, task granularity, synchronization costs, and architectural support for parallel programs. It notes that managing concurrency dynamically allows programs to adapt but increases overhead costs, while task granularity is a tradeoff between workload balance and communication overhead. Synchronization that is too coarse can hurt load balancing. Overall architectural support can help with communication, naming data for task stealing, and efficient synchronization primitives.

Uploaded by

narendra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
40 views12 pages

2.2 Programming

The document discusses several topics related to parallel programming performance including identifying and managing concurrency, task granularity, synchronization costs, and architectural support for parallel programs. It notes that managing concurrency dynamically allows programs to adapt but increases overhead costs, while task granularity is a tradeoff between workload balance and communication overhead. Synchronization that is too coarse can hurt load balancing. Overall architectural support can help with communication, naming data for task stealing, and efficient synchronization primitives.

Uploaded by

narendra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Outline

Applications
Creating Parallel Programs
Programming for Performance
Scaling
Synchronization Basics

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

32

Programming for Performance


Partitioning, Granularity, Communication, etc.
Caches and Their Effects

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

33

Aside on Cost-Effective Computing


Isnt Speedup(P) < P inefficient?
If only throughput matters, use P computers instead?
But much of a computers cost is NOT in the
processor [Wood & Hill, IEEE Computer, Feb 95]
Let Costup(P) = Cost(P)/Cost(1)
Parallel computing is cost-effective if
Speedup(P) > Costup(P)
E.g., for SGI PowerChallenge w/ 500MB:
Costup(32) = 8.6

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

34

Where Do Programs Spend Time?


Sequential
Busy computing
Memory system stalls

Parallel

Busy computing
Stalled for local memory
Stalled for remote memory (communication)
Synchronizing (load imbalance and operations)
Overhead

Speedup (p) = time(1)/time(p)


Amdahls Law
Could even be superlinear  how??

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

35

Partitioning for Performance


Balance workload
Reduce time spent at synchronization

Reduce communication
Reduce extra work
Determining and managing good assignment

These goals are at odds with each other

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

36

Programming for Performance


Identifying concurrency
Managing concurrency
Static
Dynamic

Granularity of concurrency
Serialization and synchronization costs

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

37

Identifying Concurrency
Data parallelism
Same ops on different data items

Functional (control, task) parallelism


Pipeline

Impact on load balancing?


Functional is more difficult
Longer running tasks

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

38

Managing Concurrency
Static
Cannot adapt to changes

Dynamic

Can adapt
Cost of management increases
Self-scheduling (guided self-scheduling)
Centralized task queue
Contention
Distributed task queue
Can steal from other queues
Architecture: Name data associated with stolen task

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

39

Granularity of Concurrency
Granularity = Amount of work associated with task
Large tasks

+
+
+

Worse load balancing


Lower overhead
Less contention
Less communication

Small tasks
+

Better load balancing


More synchronization
More management overhead
Might have too much communication (affinity scheduling)

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

40

Impact of Synchronization and Serialization


Too coarse synchronization
Barriers instead of point-to-point synch
Poor load balancing

Too many synchronization operations


Lock each element of array
Costly operations

Coarse grain locking


Lock entire array
Serialize access to array

Architectural aspects
Cost of synchronization operation
Synchronization name space

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

41

Architectural Support for Dynamic Task Stealing


How can architecture help?
Communication
Support for transfer of small amount of data and mutual exclusion
Can make tasks smaller
Better load balance

Naming
Make it easy to name data associated with stolen task

Synchronization
Support point-to-point synchronization
Better load balancing

(C) 2010 Daniel J. Sorin from Adve,


Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

42

Reducing Inherent Communication


Communication required for parallel program
Communication to Computation Ratio
(bytes / time) or (bytes / instruction)

Affected by assignment (task  process)


Domain decomposition
Interact with neighbors in space
Good for simulation of physical

Communicated Values

P0 P1 P2 P3
P4 P5 P6 P7
P8 P9 P10 P11
P12 P13 P14 P15

Data Space
(C) 2010 Daniel J. Sorin from Adve,
Falsafi, Hill, Lebeck, Reinhardt, Singh

ECE 259 / CPS 221

43

You might also like