openmp.md

Pyccel OpenMP usage

Using the Runtime Library Routines

OpenMP runtime library routines are made available in Pyccel by importing the routines you need from pyccel.stdlib.internal.openmp:

from pyccel.stdlib.internal.openmp import omp_set_num_threads

Please note that files using the OpenMP runtime library routines only work when compiled with Pyccel (i.e. they won't work in pure Python mode).

OpenMP pragmas are recognised by comments beginning with #$omp (additional spaces are permitted as Python formatting tools tend to enforce their presence).
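
Because the pragmas are ordinary Python comments, a file that uses only pragmas (and no runtime library routines) can still be run by the Python interpreter, where it simply executes serially. The sketch below illustrates this; the function name scaled_sum and its arguments are hypothetical and chosen only for illustration.

```python
def scaled_sum(n: int, factor: int) -> int:
    total = 0
    # Python ignores the next line; Pyccel (with --openmp) reads it as a pragma.
    #$ omp parallel for reduction(+:total)
    for i in range(n):
        total += factor * i
    return total

print(scaled_sum(10, 2))
```

Run with the Python interpreter, this prints 90. Compiled with Pyccel and --openmp, the same value is expected, with the loop iterations shared among the threads.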

Example

The following example shows how omp_set_num_threads is used to set the number of threads to 4, and how omp_get_num_threads is used inside a parallel region to get the number of threads in the current team; here omp_get_num_threads returns 4.

def get_num_threads(n : int):
    from pyccel.stdlib.internal.openmp import omp_set_num_threads, omp_get_num_threads, omp_get_thread_num
    omp_set_num_threads(n)
    #$ omp parallel
    print("hello from thread number:", omp_get_thread_num())
    result = omp_get_num_threads()
    #$ omp end parallel
    return result
x = get_num_threads(4)
print(x)

Please note that the variable result is a shared variable; Pyccel considers all variables as shared unless you specify them as private using the private() clause.

The output of this program is (the order of the lines may differ between runs because the threads run concurrently):

❯ pyccel compile omp_test.py --openmp
❯ ./prog_omp_test
hello from thread number: 0
hello from thread number: 2
hello from thread number: 1
hello from thread number: 3
4
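
To make the shared/private distinction mentioned in the note on the variable result more concrete, here is a minimal sketch in which a scratch variable is declared private so that each thread works on its own copy. The function squares_sum and the variable tmp are hypothetical names. Only pragmas are used, so the code also runs (serially) in pure Python.

```python
def squares_sum(n: int) -> int:
    total = 0  # shared: combined across threads by the reduction below
    tmp = 0    # private: every thread gets its own scratch copy
    #$ omp parallel private(i, tmp) shared(total)
    #$ omp for reduction(+:total)
    for i in range(n):
        tmp = i * i
        total += tmp
    #$ omp end parallel
    return total

print(squares_sum(5))
```

If tmp were left shared, two threads could overwrite each other's value between the assignment and the addition.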

Supported Routines

From the many routines defined in the OpenMP 5.1 Standard, Pyccel currently supports:

All thread team routines except omp_get_supported_active_levels
All thread affinity routines except omp_set_affinity_format, omp_get_affinity_format, omp_display_affinity, omp_capture_affinity
All tasking routines
All device information routines except omp_get_device_num
omp_get_num_teams
omp_get_team_num

Directives Usage on Pyccel

Pyccel uses the same clauses as OpenMP. You can refer to the references below for more information on how to use them:

OpenMP 5.1 API Specification (pdf)
OpenMP 5.1 API Specification (html)
OpenMP 5.1 Syntax Reference Guide

Other references:

OpenMP Clauses

`parallel` Construct

Syntax of `parallel`

#$ omp parallel [clause[ [,] clause] ... ]
structured-block
#$ omp end parallel

Example

The following example shows how to use the #$ omp parallel pragma to create a team of 2 threads, each thread with its own private copy of the variable n.

from pyccel.stdlib.internal.openmp import omp_get_thread_num

#$ omp parallel private (n) num_threads(2)
n = omp_get_thread_num()
print("hello from thread:", n)
#$ omp end parallel

The output of this program is (the order of the lines may differ between runs because the threads run concurrently):

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
hello from thread: 0
hello from thread: 1

`loop` Construct

Syntax of `loop`

#$ omp for [nowait] [clause[ [,] clause] ... ]
for-loops

Example

This example shows how to use the #$ omp for pragma to mark the loop that should be executed in parallel; each iteration of the loop is executed by one of the threads in the team.
The reduction clause is used to avoid a data race on result: each thread works on its own local copy of the reduction variable, and when the threads join at the end of the loop, the local copies are combined into the global shared variable.

result = 0
#$ omp parallel private(i) shared(result) num_threads(4)
#$ omp for reduction (+:result)
for i in range(0, 1337):
  result += i
#$ omp end parallel
print(result)

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
893116
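
The way a reduction combines per-thread copies can be emulated in plain Python. The following is only a serial sketch of the idea, not a description of how an OpenMP runtime is implemented; the function emulate_reduction and the round-robin split of the iterations are assumptions made for illustration.

```python
def emulate_reduction(n: int, num_threads: int) -> int:
    # One private copy of the reduction variable per emulated thread,
    # initialised to 0, the identity of "+".
    local = [0] * num_threads
    for tid in range(num_threads):
        # Assumed round-robin split: thread tid handles iterations
        # tid, tid + num_threads, tid + 2 * num_threads, ...
        for i in range(tid, n, num_threads):
            local[tid] += i
    # When the threads join, the private copies are combined
    # into the single shared variable.
    result = 0
    for value in local:
        result += value
    return result

print(emulate_reduction(1337, 4))
```

Whatever the split of iterations, the combined value equals the serial sum, which is why the example above always prints 893116.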

`single` Construct

Syntax of `single`

#$ omp single [nowait] [clause[ [,] clause] ... ]
structured-block
#$ omp end single [end_clause[ [,] end_clause] ... ]

Example

This example shows how we can use the #$ omp single pragma to specify a section of code that must be run by a single available thread.

from pyccel.stdlib.internal.openmp import omp_set_num_threads, omp_get_num_threads, omp_get_thread_num
omp_set_num_threads(4)
#$ omp parallel
print("hello from thread number:", omp_get_thread_num())
#$ omp single
print("The best thread is number : ", omp_get_thread_num())
#$ omp end single
#$ omp end parallel

The output of this program is (the order of the lines, and the thread that executes the single block, may differ between runs because the threads run concurrently):

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
hello from thread number:            1
The best thread is number :             1
hello from thread number:            2
hello from thread number:            3
hello from thread number:            0

`critical` Construct

Syntax of `critical`

#$ omp critical [(name) [ [,] hint (hint-expression)]]
structured-block
#$ omp end critical

Example

This example shows how #$ omp critical is used to specify the code which must be executed by one thread at a time.

sum = 0
#$ omp parallel num_threads(4) private(i) shared(sum)
#$ omp for
for i in range(0, 1337):
  #$ omp critical
  sum += i
  #$ omp end critical
#$ omp end parallel
print(sum)

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
893116
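
A critical section can also protect an update that only happens for some iterations. The sketch below is an assumption-laden illustration (count_multiples is a hypothetical function) that reuses only the pragmas shown above; since it contains no runtime library calls, it also runs serially in pure Python.

```python
def count_multiples(n: int, k: int) -> int:
    count = 0
    #$ omp parallel num_threads(4) private(i) shared(count)
    #$ omp for
    for i in range(0, n):
        if i % k == 0:
            # Only one thread at a time may update the shared counter.
            #$ omp critical
            count += 1
            #$ omp end critical
    #$ omp end parallel
    return count

print(count_multiples(100, 7))
```

For a plain sum or count like this one, a reduction clause usually avoids the serialisation that critical introduces; critical is better suited to updates that cannot be expressed as a reduction.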

`barrier` Construct

Syntax of `barrier`

#$ omp barrier

Example

This example shows how #$ omp barrier is used to specify a point in the code where each thread must wait until all threads in the team arrive.

from numpy import zeros

n = 1337
arr = zeros((n))
arr_2 = zeros((n))
#$ omp parallel num_threads(4) private(i, j) shared(arr)

#$ omp for
for i in range(0, n):
  arr[i] = i
#$ omp barrier
#$ omp for
for j in range(0, n):
  arr_2[j] = arr[j] * 2

#$ omp end parallel
print(sum(arr_2))

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
1786232

`masked` Construct

Syntax of `masked`

#$ omp masked [ filter(integer-expression) ]
structured-block
#$ omp end masked

Example

The #$ omp masked pragma is used here to specify a structured block that is executed by a subset of the threads of the current team. Without a filter clause, only the primary thread (thread number 0) executes the block.

result = 0
#$ omp parallel num_threads(4)
#$ omp masked
result = result + 1
#$ omp end masked
#$ omp end parallel
print("result :", result)

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
result : 1
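
The filter clause from the syntax above selects which thread executes the block. The following sketch assumes that Pyccel passes the clause through to OpenMP unchanged; the function name masked_increment is hypothetical.

```python
def masked_increment() -> int:
    hits = 0
    #$ omp parallel num_threads(4)
    # Only the thread whose number is 1 executes the block below.
    #$ omp masked filter(1)
    hits = hits + 1
    #$ omp end masked
    #$ omp end parallel
    return hits

print(masked_increment())
```

The expected result is 1, provided the team actually contains a thread number 1; in pure Python the pragmas are ignored and the block runs exactly once.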

`taskloop`/`atomic` Construct

Syntax of `taskloop`

#$ omp taskloop [clause[ [,]clause] ... ]
for-loops

Syntax of `atomic`

#$ omp atomic [clause[ [,]clause] ... ]
structured-block
#$ omp end atomic

Example

The #$ omp taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using explicit tasks. The #$ omp atomic pragma ensures that a specific storage location is accessed atomically, which prevents several threads from reading and writing it at the same time.

from pyccel.stdlib.internal.openmp import omp_get_thread_num

x1 = 0
x2 = 0
#$ omp parallel shared(x1,x2) num_threads(2)

#$ omp taskloop
for i in range(0, 100):
  #$ omp atomic
  x1 = x1 + 1 #Will be executed (100 x 2) times.

#$ omp single
#$ omp taskloop
for i in range(0, 100):
  #$ omp atomic
  x2 = x2 + 1 #Will be executed (100) times.
#$ omp end single

#$ omp end parallel
print("x1 : ", x1)
print("x2 : ", x2)

The output of this program is (the way iterations are distributed among the threads may differ between runs, but the final values are always the same):

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
x1 : 200
x2 : 100

`simd` Construct

Syntax of `simd`

#$ omp simd [clause[ [,]clause] ... ]
loop-nest

Example

The #$ omp simd pragma is used to transform the loop into a loop that will be executed concurrently using Single Instruction Multiple Data (SIMD) instructions.

from numpy import zeros
result = 0
n = 1337
arr = zeros(n, dtype=int)
#$ omp parallel num_threads(4)
#$ omp simd
for i in range(0, n):
  arr[i] = i
#$ omp end parallel
for i in range(0, n):
  result = result + arr[i]
print("Result:", result)

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
Result: 893116

`task` / `taskwait` Construct

Syntax of `task`

#$ omp task [clause[ [,]clause] ... ]
structured-block
#$ omp end task

Syntax of `taskwait`

#$ omp taskwait

Example

The #$ omp task pragma is used here to define an explicit task.
The #$ omp taskwait pragma is used here to specify that the current task region remains suspended until all child tasks that it generated before the taskwait construct complete execution.

def fib(n : int) -> int:
  if n < 2:
    return n
  #$ omp task shared(i) firstprivate(n)
  i = fib(n-1)
  #$ omp end task
  #$ omp task shared(j) firstprivate(n)
  j = fib(n-2)
  #$ omp end task
  #$ omp taskwait
  return i+j

#$ omp parallel
#$ omp single
print(fib(10))
#$ omp end single
#$ omp end parallel

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
55
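
Creating a task for every recursive call becomes expensive for small n. A common remedy is a cutoff below which the function recurses serially. The sketch below is one possible way to do this under the same pragmas as above; fib_serial, fib_tasks and the cutoff parameter are hypothetical names, and a suitable cutoff value depends on the machine.

```python
def fib_serial(n: int) -> int:
    if n < 2:
        return n
    return fib_serial(n - 1) + fib_serial(n - 2)

def fib_tasks(n: int, cutoff: int) -> int:
    # Small problems are solved without creating any task.
    if n < 2 or n < cutoff:
        return fib_serial(n)
    #$ omp task shared(i) firstprivate(n, cutoff)
    i = fib_tasks(n - 1, cutoff)
    #$ omp end task
    #$ omp task shared(j) firstprivate(n, cutoff)
    j = fib_tasks(n - 2, cutoff)
    #$ omp end task
    # Wait for both child tasks before combining their results.
    #$ omp taskwait
    return i + j

#$ omp parallel
#$ omp single
print(fib_tasks(15, 8))
#$ omp end single
#$ omp end parallel
```

The result does not depend on the cutoff; only the number of tasks created changes.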

`taskyield` Construct

Syntax of `taskyield`

#$ omp taskyield

Example

The #$ omp taskyield pragma specifies that the current task can be suspended at this point, in favour of execution of a different task.

#$ omp task
long_function()
#$ omp taskyield
long_function_2()
#$ omp end task

`flush` Construct

Syntax of `flush`

#$ omp flush [memory-order-clause] [(list)]

Example

The #$ omp flush pragma is used to ensure that all threads in a team have a consistent view of certain objects in memory.

from pyccel.stdlib.internal.openmp import omp_get_thread_num
flag = 0
#$ omp parallel num_threads(2)
if omp_get_thread_num() == 0:
  #$ omp atomic update
  flag = flag + 1
elif omp_get_thread_num() == 1:
  #$ omp flush(flag)
  while flag < 1:
    #$ omp flush(flag)
    pass
  print("Thread 1 released")
  #$ omp atomic update
  flag = flag + 1
#$ omp end parallel
print("flag:", flag)

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
Thread 1 released
flag: 2

`cancel` Construct

Syntax of `cancel`

#$ omp cancel construct-type-clause[ [ , ] if-clause]

Example

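The #$ omp cancel pragma requests cancellation of the innermost enclosing region of the specified type (here the for loop). Note that OpenMP only activates cancellation when the OMP_CANCELLATION environment variable is set to true; otherwise the cancel pragma is ignored.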

import numpy as np
v = np.array([1, -5, 3, 4, 5])
result = 0
#$ omp parallel
#$ omp for private(i) reduction (+:result)
for i in range(len(v)):
  result = result + v[i]
  if result < 0:
    #$ omp cancel for
    pass
#$ omp end parallel

`teams`/`target`/`distribute` Constructs

Syntax of `target`

#$ omp target [clause[ [,]clause] ... ]
structured-block
#$ omp end target

Syntax of `teams`

#$ omp teams [clause[ [,]clause] ... ]
structured-block
#$ omp end teams

Syntax of `distribute`

#$ omp distribute [clause[ [,]clause] ... ]
for-loops

Example

In this example, the #$ omp target pragma defines a target region: a block of code that runs in its own data environment and is intended to be offloaded to a parallel computation device during execution.
The #$ omp teams directive creates a collection of thread teams (a league). The initial thread of each team executes the teams region.
The #$ omp distribute directive specifies that the iterations of one or more loops will be executed by the thread teams in the context of their implicit tasks.

from numpy import zeros
from pyccel.stdlib.internal.openmp import omp_get_team_num
n = 8
threadlimit = 4
a = zeros(n, dtype=int)
#$ omp target
#$ omp teams num_teams(2) thread_limit(threadlimit)
#$ omp distribute
for i in range(0, n):
  a[i] = omp_get_team_num()
#$ omp end teams
#$ omp end target

for i in range(0, n):
  print("Team num :", a[i])

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
Team num : 0
Team num : 0
Team num : 0
Team num : 0
Team num : 1
Team num : 1
Team num : 1
Team num : 1

`sections` Construct

Syntax of `sections`

#$ omp sections [nowait] [clause[ [,]clause] ... ]

#$ omp section
structured-block-sequence
#$ omp end section
#$ omp section
structured-block-sequence
#$ omp end section

#$ omp end sections

Example

The #$ omp sections directive is used here to distribute three independent blocks of work among the 2 threads of the team.

from pyccel.stdlib.internal.openmp import omp_get_thread_num

n = 8
sum1 = 0
sum2 = 0
sum3 = 0
#$ omp parallel num_threads(2)
#$ omp sections

#$ omp section
for i in range(0, int(n/3)):
  sum1 = sum1 + i
print("sum1 :", sum1, ", thread :", omp_get_thread_num())
#$ omp end section

#$ omp section
for i in range(0, int(n/2)):
  sum2 = sum2 + i
print("sum2 :", sum2, ", thread :", omp_get_thread_num())
#$ omp end section

#$ omp section
for i in range(0, n):
  sum3 = sum3 + i
print("sum3 :", sum3, ", thread :", omp_get_thread_num())
#$ omp end section

#$ omp end sections
#$ omp end parallel

The output of this program is (the thread numbers and the order of the lines may differ between runs):

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
sum1 : 1, thread : 0
sum2 : 6, thread : 0
sum3 : 28, thread : 1

Combined Constructs Usage on Pyccel

`parallel for`

Syntax of `parallel for`

#$ omp parallel for [clause[ [,]clause] ... ]
loop-nest

Example

The #$ omp parallel for construct specifies a parallel construct containing a work sharing loop construct with a canonical loop nest.

import numpy as np
x = np.array([2,5,4,3,2,5,7])
result = 0
#$ omp parallel for reduction (+:result)
for i in range(0, len(x)):
    result += x[i]
print("result:", result)

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
result: 28

`parallel for simd`

Syntax of `parallel for simd`

#$ omp parallel for simd [clause[ [,]clause] ... ]
loop-nest

Example

The #$ omp parallel for simd construct specifies a parallel construct containing only one work sharing loop SIMD construct.

import numpy as np
x = np.array([1,2,1,2,1,2,1,2])
y = np.array([2,1,2,1,2,1,2,1])
z = np.zeros(8, dtype = int)
result = 0
#$ omp parallel for simd
for i in range(0, 8):
    z[i] = x[i] + y[i]

for i in range(0, 8):
    print("z[",i,"] :", z[i])

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
z[ 0 ] : 3
z[ 1 ] : 3
z[ 2 ] : 3
z[ 3 ] : 3
z[ 4 ] : 3
z[ 5 ] : 3
z[ 6 ] : 3
z[ 7 ] : 3

`for simd`

Syntax of `for simd`

#$ omp for simd [clause[ [,]clause] ... ]
for-loops

Example

import numpy as np
x = np.array([1,2,1,2,1,2,1,2])
y = np.array([2,1,2,1,2,1,2,1])
z = np.zeros(8, dtype = int)
result = 0
#$ omp parallel
#$ omp for simd
for i in range(0, 8):
    z[i] = x[i] + y[i]

#$ omp end parallel
for i in range(0, 8):
    print("z[",i,"] :", z[i])

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
z[ 0 ] : 3
z[ 1 ] : 3
z[ 2 ] : 3
z[ 3 ] : 3
z[ 4 ] : 3
z[ 5 ] : 3
z[ 6 ] : 3
z[ 7 ] : 3

`teams distribute`

Syntax of `teams distribute`

#$ omp teams distribute [clause[ [,]clause] ... ]
loop-nest

`teams distribute simd`

Syntax of `teams distribute simd`

#$ omp teams distribute simd [clause[ [,]clause] ... ]
loop-nest

`teams distribute parallel for`

Syntax of `teams distribute parallel for`

#$ omp teams distribute parallel for [clause[ [,]clause] ... ]
loop-nest

`target parallel`

Syntax of `target parallel`

#$ omp target parallel [clause[ [,]clause] ... ]
structured-block
#$ omp end target parallel

`target parallel for`

Syntax of `target parallel for`

#$ omp target parallel for [clause[ [,]clause] ... ]
loop-nest
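
Example

As an illustration of the syntax above, the following sketch offloads a reduction loop with target parallel for. It assumes a compiler with device offloading support (without it, the region typically runs on the host); sum_of_squares is a hypothetical name, and the code has only been checked as serial Python, where the pragma is ignored.

```python
def sum_of_squares(n: int) -> int:
    acc = 0
    # The loop is offloaded and its iterations are shared among the
    # threads of the parallel region; acc is combined by the reduction.
    #$ omp target parallel for reduction(+:acc)
    for i in range(0, n):
        acc = acc + i * i
    return acc

print(sum_of_squares(10))
```

In pure Python this prints 285, the sum of the squares of 0 to 9.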

`target parallel for simd`

Syntax of `target parallel for simd`

#$ omp target parallel for simd [clause[ [,]clause] ... ]
loop-nest

`target teams`

Syntax of `target teams`

#$ omp target teams [clause[ [,]clause] ... ]
structured-block
#$ omp end target teams

`target teams distribute`

Syntax of `target teams distribute`

#$ omp target teams distribute [clause[ [,]clause] ... ]
loop-nest

`target teams distribute simd`

Syntax of `target teams distribute simd`

#$ omp target teams distribute simd [clause[ [,]clause] ... ]
loop-nest

`target teams distribute parallel for`

Syntax of `target teams distribute parallel for`

#$ omp target teams distribute parallel for [clause[ [,]clause] ... ]
loop-nest

`target teams distribute parallel for simd`

Syntax of `target teams distribute parallel for simd`

#$ omp target teams distribute parallel for simd [clause[ [,]clause] ... ]
loop-nest

Example

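The #$ omp target teams distribute parallel for construct is a shortcut for a target construct containing a teams distribute parallel for construct. It is combined here with a reduction clause to sum the integers from 0 to 9999.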

r = 0
#$ omp target teams distribute parallel for reduction(+:r)
for i in range(0, 10000):
    r = r + i

print("result:",r)

The output of this program is:

❯ pyccel compile omp_test.py --openmp
❯ ./omp_test
result: 49995000

Supported Constructs

All constructs in the OpenMP 5.1 standard are supported except:

scope
workshare
scan
interop
