
Function-Level Parallelism in OpenMP
Instructor: Dr. Fahed Jubair
Computer Engineering Department
University of Jordan
Function-Level Parallelism

• Programs with function-level (aka task-level) parallelism focus on distributing parallel functions (i.e., tasks) across threads
• OpenMP provides two directives that are commonly used for writing task-parallel programs:
  1. The SECTIONS directive (we will focus on this directive)
  2. The TASK directive (beyond the scope of this class)



Quick Review
The SECTIONS Directive
• A SECTIONS directive specifies that the enclosed section(s) of code are to be divided among the threads in the team, such that each section is executed exactly once by one thread
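
For example, a minimal sketch of the directive in use (the printed messages are illustrative):

#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section      // executed once, by one thread of the team
        printf("section 1 executed by thread %d\n", omp_get_thread_num());
        #pragma omp section      // executed once, possibly by a different thread
        printf("section 2 executed by thread %d\n", omp_get_thread_num());
    }
    return 0;
}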



Challenges

• There are two primary challenges that programmers need to tackle when converting a sequential algorithm into a task-parallel algorithm:
  § How to maximize the number of tasks that can run in parallel?
  § How to assign these parallel tasks to threads such that parallelism overhead is minimized?



Algorithms With
Function-Level Parallelism

We will study two sorting algorithms that take advantage of function-level parallelism:
• Merge Sort
• Quick Sort



Merge Sort
• Split the unsorted list into n sublists, each containing one element
• Repeatedly merge sublists to produce new sorted sublists until
there is only 1 sublist remaining, which will be the sorted list

// A is the input unsorted list and B is the output sorted list;
// left and right are temporary buffers that hold the sorted halves
void merge_sort (int A[ ], int begin, int end, int B[ ])
{
    if ( begin == end ) {                // if A[begin:end] has only one integer
        B[begin] = A[begin] ;            // a one-element list is already sorted
        return ;
    }
    int middle = (begin + end) / 2 ;
    merge_sort ( A, begin, middle, left ) ;     // sort left half [begin : middle]
    merge_sort ( A, middle + 1, end, right ) ;  // sort right half [middle + 1 : end]
    merge ( left, right, B ) ;                  // merge left and right sublists into B
}
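
The slides leave merge undefined. Below is a minimal sketch of one possible implementation, assuming a variant that also receives the two sublist lengths (the extra length parameters are an assumption, not from the slides):

// Merge sorted sublists left[0 : nl-1] and right[0 : nr-1] into B[0 : nl+nr-1]
void merge (int left[ ], int nl, int right[ ], int nr, int B[ ])
{
    int i = 0, j = 0, k = 0 ;
    while ( i < nl && j < nr )               // pick the smaller head element
        B[k++] = ( left[i] <= right[j] ) ? left[i++] : right[j++] ;
    while ( i < nl ) B[k++] = left[i++] ;    // copy any leftover left elements
    while ( j < nr ) B[k++] = right[j++] ;   // copy any leftover right elements
}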
Merge Sort Example: Split Phase

[Figure: the list 99 6 86 15 58 35 86 4 0 is repeatedly split in half until every sublist contains a single element]
Merge Sort Example: Merge Phase

[Figure: the one-element sublists are pairwise merged back up the tree, producing sorted sublists such as 6 99, 15 86, 35 58, and 0 4, then 6 15 86 99 and 0 4 35 58 86, and finally the fully sorted list 0 4 6 15 35 58 86 86 99]
Parallelism In Merge Sort
• Upon careful examination of the merge sort code, we can observe that the two recursive calls of merge_sort can run in parallel

void merge_sort (int A[ ], int begin, int end, int B[ ])
{
    if ( begin == end ) {
        B[begin] = A[begin] ;
        return ;
    }
    int middle = (begin + end) / 2 ;
    #pragma omp parallel sections num_threads(2)
    { /* start of parallel region */
        #pragma omp section
        merge_sort ( A, begin, middle, left ) ;    // the two sections run in parallel
        #pragma omp section
        merge_sort ( A, middle + 1, end, right ) ;
    } /* end of parallel region */
    merge ( left, right, B ) ;
}
Hold On!

• The recursive call of merge_sort in the previous slide will cause parallel regions to be created within each other!
• Does OpenMP even allow this? If so, then how does the parallel execution occur?



Nested Parallelism

• Some OpenMP implementations support nested parallelism
  – OpenMP parallel regions can be nested inside each other
  – A thread within a team of threads may fork, spawning a child team of threads
• The encountering thread creates a team composed of itself and some additional (possibly zero) number of threads (the encountering thread becomes the master thread)

Image Source: John Mellor-Crummey's Lecture Notes
More about Nested Parallelism

• It is possible to enable/disable nested parallelism in OpenMP using the environment variable OMP_NESTED or the library routine omp_set_nested( )
• If nested parallelism is enabled, when a thread in a team within a parallel region encounters a new parallel construct, an additional team of threads is forked off of it, and it becomes their master
• If nested parallelism is disabled, when a thread in a team within a parallel region encounters a new parallel construct, the new region is executed by a team consisting of that single thread only
Example on Nested Parallel Regions

[Figure: source code of the nested.c example]
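
The code itself is not recoverable from the slide image; the following is a minimal sketch of a program with nested parallel regions, consistent with the compile-and-run commands on the next slide (thread counts and output format are illustrative assumptions):

#include <omp.h>
#include <stdio.h>

int main() {
    // nested parallelism is enabled via OMP_NESTED (see the next slide)
    #pragma omp parallel num_threads(2)       // outer region: a team of 2 threads
    {
        int outer = omp_get_thread_num();
        #pragma omp parallel num_threads(2)   // inner region: each outer thread forks 2 more
        {
            printf("outer thread %d, inner thread %d\n",
                   outer, omp_get_thread_num());
        }
    }
    return 0;
}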


Example on Nested Parallel Regions (cont.)

$ gcc -fopenmp nested.c -o a.out
$ export OMP_NESTED=TRUE
$ ./a.out


Remarks on Nested Parallelism

• Nesting parallel regions provides an immediate way to allow more threads to participate in the computation
• However, nesting parallel regions can easily create too many threads and oversubscribe the system
• In addition, creating nested parallel regions adds overhead
• To avoid the above problems, programmers usually try to impose some discipline on the execution using appropriate OpenMP environment variables
  § For example, the OMP_MAX_ACTIVE_LEVELS environment variable controls the maximum allowed number of active nested parallel regions (see the sketch below)
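
For instance (the value 2 is illustrative):

$ export OMP_MAX_ACTIVE_LEVELS=2   # allow at most two levels of active parallel regions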
Quick Sort
Sorts an array of integers as follows:
• Pick an element, called a pivot, from the array
• Reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way)
• Recursively apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values

Source: Wikipedia
Quick Sort (cont.)
void quick_sort (int A[ ], int begin, int end) {
    if ( begin < end ) {
        int p = partition ( A, begin, end ) ;
        quick_sort ( A, begin, p - 1 ) ;
        quick_sort ( A, p + 1, end ) ;
    }
}

int partition (int A[ ], int begin, int end) {
    int pivot = A[end] ;         // let us pick the last element as our pivot
    int i = begin ;
    for ( int j = begin; j < end; j++ ) {
        if ( A[ j ] <= pivot ) {
            swap ( A[ j ] , A[ i ] ) ;   // swap the contents of A[ j ] and A[ i ]
            i++ ;
        }
    }
    swap ( A[ end ] , A[ i ] ) ;         // put the pivot in its final position
    return i ;
}
Quick Sort Example

[Figure: starting from 99 6 86 15 58 0 86 4 35, the pivot 35 partitions the array into 6 15 0 4 | 35 | 86 86 99 58; the sub-arrays are then recursively partitioned until the array is fully sorted: 0 4 6 15 35 58 86 86 99]


Parallelism In Quick Sort
• Upon careful examination of the quick sort code, we can observe that the two recursive calls of quick_sort can run in parallel

void quick_sort (int A[ ], int begin, int end) {
    if ( begin < end ) {
        int p = partition ( A, begin, end ) ;
        #pragma omp parallel sections num_threads(2)
        {
            #pragma omp section
            quick_sort ( A, begin, p - 1 ) ;   // the two sections run in parallel
            #pragma omp section
            quick_sort ( A, p + 1, end ) ;
        }
    }
}
Exercise
• Write a C function that returns the number of occurrences of an integer 𝑥 inside A, an array of randomly generated integers in the range [0 : 99]. The algorithm should work as follows:
  1. Split the array A into two halves: left and right
  2. Count the occurrences of 𝑥 in the left sub-array
  3. Count the occurrences of 𝑥 in the right sub-array
  4. The total count is the sum of the left and right counts
  5. Apply the above steps recursively

• Now, parallelize your code using OpenMP (one possible sketch follows)
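
One possible solution sketch (the function name count_occurrences is an illustrative choice, not part of the exercise statement):

#include <omp.h>

// Count occurrences of x in A[begin : end], recursively splitting the range in half
int count_occurrences (int A[ ], int begin, int end, int x)
{
    if ( begin == end )                      // base case: a one-element sub-array
        return ( A[begin] == x ) ? 1 : 0 ;
    int middle = (begin + end) / 2 ;
    int left_count = 0, right_count = 0 ;
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section                  // count in the left half
        left_count = count_occurrences ( A, begin, middle, x ) ;
        #pragma omp section                  // count in the right half, in parallel
        right_count = count_occurrences ( A, middle + 1, end, x ) ;
    }
    return left_count + right_count ;        // total = left count + right count
}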
