The document provides an overview of the OpenMP programming and execution model, detailing its execution and memory models, race conditions, and parallel constructs. It explains how threads are created, managed, and how data is shared or kept private among them, along with various clauses and runtime functions. Additionally, it highlights the importance of correctly defining variable scopes to ensure thread safety in parallel programming.

OpenMP programming and

execution model
SoHPC, 2021
Outline

▪ Execution Model
▪ Memory Model
▪ Race Condition
▪ Parallel Construct
▪ Hello World
▪ If Clause
▪ Dynamic and Nested Regions
▪ Data Clauses
▪ Number of Threads
▪ Practical
Execution Model
• Thread-based Parallelism
o Initially there is only the master thread; at a designated point multiple threads are created, forming a parallel region
• Compiler Directive Based
o Directives tell the compiler where these parallel regions are
o This means only minimal, incremental changes to the sequential code are needed
• Explicit Parallelism
• Fork-Join Model
Execution Model

• Dynamic Threads
o More than one parallel region
o Different number of threads

• Nested Parallelism
o Parallel region inside another parallel region.

Memory model
• All threads have access to the shared memory.
• Rule of thumb: One thread per core (or processor)
• Cache is private to each core/thread.
• Maintaining a consistent view of main memory within the caches is called cache coherency.

[Diagram: four CPUs, each with its own private cache, all connected to a shared main memory and the I/O system]
Memory model
• Threads can share data with other threads, but also have private data.

[Diagram: three threads, each on its own CPU with its own private data, all accessing a common pool of shared data]
Four different parts of the memory:
• Code area
• Globals area
• Heap
• Stack

Heap: a large pool of memory; requires explicit deallocation; shared by all threads.

Stack: each thread has its own; stores private data; LIFO principle; fast; no need for explicit deallocation as with the heap.
Race Condition
• Threads communicate through shared variables. Uncoordinated access to these variables can lead to undesired effects.
• If two threads update (write) a shared variable in the same step of execution, the result depends on the order in which the variable is accessed. This is called a race condition.
• Suppose one processor has an updated result in its private cache, and a second processor wants to access that memory location: a plain read from memory will return the old value, since the original data has not yet been written back.
• Synchronization can be time consuming; it is often better to first change how the data is accessed.
Parallel Constructs
• The fundamental construct in OpenMP.
• Creates a team of threads.
• Every thread executes the same statements inside the parallel region; at the end of the parallel region there is an implicit barrier.

C/C++:
#pragma omp parallel [clauses]
{
  ...
}

Fortran:
!$omp parallel [clauses]
...
!$omp end parallel
Create a 4-thread parallel region:

double A[1000];
omp_set_num_threads(4);
#pragma omp parallel
{
    int tid = omp_get_thread_num();   // tid: from 0 to 3
    foo(tid, A);                      // each thread calls foo(tid, A)
}
// foo(0,A); foo(1,A); foo(2,A); foo(3,A); execute concurrently;
// threads wait for all threads to finish before proceeding
printf("All Done\n");
Parallel Construct
• Clauses:
num_threads (integer-expression)
if (scalar_expression)
private (list)
shared (list)
default (shared | none)
firstprivate (list)
reduction (operator: list)
copyin (list)

Hello World

C - Serial:
#include <stdio.h>
int main(int argc, char **argv){
    printf("Hello world!\n");
}

C:
#include <stdio.h>
#include <omp.h>
int main(int argc, char **argv){
    #pragma omp parallel
    printf("Hello from thread %d out of %d\n",
           omp_get_thread_num(),
           omp_get_num_threads());
}
Hello World

Fortran - Serial:
program hello
implicit none
print *, 'Hello world!'
end program hello

Fortran:
program hello
use omp_lib
implicit none
!$omp parallel
print *, 'Hello from thread', &
    omp_get_thread_num(), &
    'out of', omp_get_num_threads()
!$omp end parallel
end program hello
If Clause
• Used to make the parallel region directive itself conditional.
• The region only executes in parallel if the expression is true (typically used to check the size of the data).

C/C++:
#pragma omp parallel if(n > 100)
{
  ...
}

Fortran:
!$omp parallel if(n > 100)
...
!$omp end parallel
Dynamic Threads
Dynamic threads:
• Used to create parallel regions with a variable number of threads
• OpenMP runtime will decide the number of threads
• omp_set_dynamic(), OMP_DYNAMIC, omp_get_dynamic()
omp_set_dynamic(0);
omp_set_num_threads(10);
#pragma omp parallel
printf("Num threads in non-dynamic region is = %d\n",
omp_get_num_threads());

omp_set_dynamic(1);
omp_set_num_threads(10);
#pragma omp parallel
printf("Num threads in dynamic region is = %d\n", omp_get_num_threads());

Nested Regions
Nested parallel regions:
• If a parallel directive is encountered within another parallel directive, a new team of threads is created.
• omp_set_nested(), OMP_NESTED, omp_get_nested()
• The number-of-threads setting affects the new regions.
• New regions are created with only one thread each unless nested parallelism is enabled.
• Use the num_threads(n) clause or dynamic threading to get a different number of threads per region.
Data Clauses
• Used in conjunction with several directives to control the
scoping of enclosed variables.
– default(shared|none): The default scope for all of the variables;
Fortran has more options.
– shared(list): Variable is shared by all threads in the team. All threads
can read or write to that variable.
C/C++: #pragma omp parallel default(none) shared(n)
Fortran: !$omp parallel default(none) shared(n)

– private(list): Each thread has a private copy of the variable. It can only be read or written by its own thread.
C/C++: #pragma omp parallel default(shared) private(tid)
Fortran: !$omp parallel default(shared) private(tid)
Example

C:
#include <stdio.h>
#include <omp.h>
int main(){
    int tid, nthreads;
    #pragma omp parallel private(tid), shared(nthreads)
    {
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
        printf("Hello from thread %d out of %d\n", tid, nthreads);
    }
}

Fortran:
program hello
use omp_lib
implicit none
integer tid, nthreads
!$omp parallel private(tid), shared(nthreads)
tid = omp_get_thread_num()
nthreads = omp_get_num_threads()
print *, 'Hello from thread', tid, 'out of', nthreads
!$omp end parallel
end program hello
• How do we decide which variables should be shared and which private?
– Loop indices - private
– Loop temporaries - private
– Read-only variables - shared
– Main arrays - shared
• Most variables are shared by default
– C/C++: file-scope and static variables
– Fortran: COMMON blocks, SAVE and MODULE variables
– Both: dynamically allocated variables
• Variables declared inside the parallel region are always private.
Additional Data Clauses
– firstprivate(list): pre-initializes private variables with the value of the variable with the same name before the parallel construct.

j = jstart;
#pragma omp parallel shared(arr), firstprivate(j)
{
    int tid = omp_get_thread_num();
    arr[tid] = tid + j;
}
for (int i = 0; i < nthreads; i++) printf("%d, %d\n", i, arr[i]);

– lastprivate(list): on exiting the parallel region, gives the private data the value of the last iteration (as if the loop were sequential).

– threadprivate(list): used to make global file-scope variables (C/C++) or common blocks (Fortran) private to a thread.

– copyin(list): copies the threadprivate variables from the master thread to the team threads.

#pragma omp parallel copyin(jstart)
{
    int tid = omp_get_thread_num();
    jstart = jstart + tid + 1;
    printf("%d, %d\n", tid, jstart);
}
printf("%d\n", jstart);
Runtime Functions
• Runtime Functions: for managing the parallel program dynamically.
– omp_set_num_threads(n) - sets the desired number of threads
– omp_get_num_threads() - returns the current number of threads
– omp_get_thread_num() - returns the id of this thread
– omp_in_parallel() - returns .true. if inside a parallel region

C/C++: add #include <omp.h>
Fortran: add use omp_lib
Shell Variables

• Environment Variables: for controlling the execution of the parallel program at run-time.
– csh/tcsh: setenv OMP_NUM_THREADS n
– ksh/sh/bash: export OMP_NUM_THREADS=n
echo $OMP_NUM_THREADS
How many threads?
The number of threads in a parallel region is determined by:
▪ Setting of the OMP_NUM_THREADS environment variable.
▪ Use of the omp_set_num_threads(n) library function.
▪ Use of the num_threads(n) clause.
▪ The implementation default - usually the number of CPUs/cores on a node.

Threads are numbered from 0 (master thread) to n-1, where n = the total number of threads.
Summary
• The parallel construct forks threads.
• There are several ways to determine the number of threads per region.
• We can have dynamic and nested parallel regions.
• Variables must be defined as private or shared.
• A common problem is not declaring them properly, which leads to different results for different numbers of threads.
• A program is said to be thread safe if its results are the same for any number of threads.
