Parallel Distributed Computing using Python
Lisandro Dalcin
dalcinl@[Link]
Joint work with
Pablo Kler, Rodrigo Paz,
Mario Storti, Jorge D'Elía
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
[Link]
HPCLatAm 2011
Córdoba, Argentina
September 1, 2011
Outline
Overview
MPI for Python
PETSc for Python
Applications
Overview
MPI for Python
PETSc for Python
Applications
Motivation
◮ Apply numerical methods to complex, medium- to large-scale
problems in science and engineering
◮ multiphysics nature
◮ strong/loose coupling
◮ multiple interacting scales
◮ Ease the access to computing resources in distributed memory
architectures (clusters, supercomputers)
◮ beginners
◮ scientists and engineers
◮ experienced software developers
Objectives
◮ Develop extension modules for the Python programming
language providing access to MPI and PETSc libraries
◮ message passing
◮ parallel linear algebra
◮ linear and nonlinear solvers
◮ Perform computer-based simulations
◮ problems modeled by partial differential equations (PDEs)
◮ problems related to computational fluid mechanics (CFD)
Why Python?
◮ very clear, readable syntax
◮ natural expression of procedural code
◮ very high level dynamic data types
◮ intuitive object orientation
◮ exception-based error handling
◮ full modularity, hierarchical packages
◮ comprehensive standard library
◮ extensible with C and C++
◮ embeddable within applications
Python for Scientific Computing
◮ Scientific computing (and particularly HPC) has traditionally
been dominated by C, C++, and Fortran
◮ High-level, general-purpose computing environments
(Maple, Mathematica, MATLAB) became popular in the 1990s
◮ Python has become increasingly popular in the scientific
community since the 2000s
◮ Key feature: easy to extend with C, C++, Fortran
◮ NumPy
◮ SciPy
◮ SymPy
◮ Cython
◮ SWIG
◮ F2Py
What is MPI?
Message Passing Interface
[Link]
◮ Standardized message passing system
◮ platform-independent (POSIX, Windows)
◮ many implementations and vendors
◮ MPICH2, Open MPI
◮ HP, Intel, Oracle, Microsoft
◮ Specifies semantics of a set of library routines
◮ No special compiler support
◮ Language-neutral (C, C++, Fortran 90)
◮ Standard versions (backward-compatible)
◮ MPI-1 (1994, 2008)
◮ MPI-2 (1996, 2009)
◮ MPI-3 (under development)
What is PETSc?
Portable, Extensible Toolkit for Scientific Computation
[Link]
◮ PETSc is a suite of algorithms and data structures
for the numerical solution of
◮ problems in science and engineering
◮ based on partial differential equations models
◮ discretized with finite differences/volumes/elements
◮ leading to large scale applications
◮ PETSc employs the MPI standard for parallelism
◮ PETSc has an OO design; it is implemented in C,
can be used from C++, and provides a Fortran 90 interface
Overview
MPI for Python
PETSc for Python
Applications
MPI for Python (mpi4py)
◮ Python bindings for MPI
◮ API based on the standard MPI-2 C++ bindings
◮ Supports all MPI features
◮ targeted to MPI-2 implementations
◮ also works with MPI-1 implementations
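As a first taste, a minimal "hello world" sketch using standard mpi4py calls (the script name hello.py is illustrative):

# hello.py -- run with: mpiexec -n 4 python hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD              # default communicator
rank = comm.Get_rank()             # rank of this process
size = comm.Get_size()             # total number of processes
name = MPI.Get_processor_name()
print("Hello, World! process %d of %d on %s" % (rank, size, name))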
[mpi4py] Implementation
Implemented with Cython
◮ Code base far easier to write, maintain, and extend
◮ Faster than other solutions (mixed Python and C codes)
◮ A pythonic API that runs at C speed!
[mpi4py] Implementation – Cython [1]
cdef extern from "mpi.h":
    ctypedef void* MPI_Comm
    MPI_Comm MPI_COMM_NULL
    MPI_Comm MPI_COMM_SELF
    MPI_Comm MPI_COMM_WORLD
    int MPI_Comm_size(MPI_Comm, int*)
    int MPI_Comm_rank(MPI_Comm, int*)

cdef inline int CHKERR(int ierr) except -1:
    if ierr != 0:
        raise RuntimeError("MPI error code %d" % ierr)
    return 0
[mpi4py] Implementation – Cython [2]
cdef class Comm:
    cdef MPI_Comm ob_mpi
    ...
    def Get_size(self):
        cdef int size
        CHKERR( MPI_Comm_size(self.ob_mpi, &size) )
        return size
    def Get_rank(self):
        cdef int rank
        CHKERR( MPI_Comm_rank(self.ob_mpi, &rank) )
        return rank
    ...

cdef inline Comm NewComm(MPI_Comm comm_c):
    cdef Comm comm_py = Comm()
    comm_py.ob_mpi = comm_c
    return comm_py

COMM_NULL  = NewComm(MPI_COMM_NULL)
COMM_SELF  = NewComm(MPI_COMM_SELF)
COMM_WORLD = NewComm(MPI_COMM_WORLD)
[mpi4py] Features – MPI-1
◮ Process groups and communication domains
◮ intracommunicators
◮ intercommunicators
◮ Point to point communication
◮ blocking (send/recv)
◮ nonblocking (isend/irecv + test/wait)
◮ Collective operations
◮ Synchronization (barrier)
◮ Communication (broadcast, scatter/gather)
◮ Global reductions (reduce, scan)
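A minimal sketch of the collective operations listed above, using mpi4py's pickle-based (lowercase) API; the values are illustrative:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

comm.Barrier()                                    # synchronization
cfg = comm.bcast({"n": 10}, root=0)               # broadcast from rank 0
parts = [i**2 for i in range(size)] if rank == 0 else None
local = comm.scatter(parts, root=0)               # one item per process
total = comm.reduce(local, op=MPI.SUM, root=0)    # global reduction
items = comm.gather(local, root=0)                # gather back at rank 0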
[mpi4py] Features – MPI-2
◮ Extended collective operations
◮ Dynamic process management (spawn, connect/accept)
◮ Parallel I/O (read/write)
◮ One sided operations, aka RMA (put/get/accumulate)
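A sketch of dynamic process management with Spawn; the worker script name is a hypothetical placeholder:

# parent side (sketch): spawn 4 Python workers and broadcast a value
import sys
from mpi4py import MPI

child = MPI.COMM_SELF.Spawn(sys.executable,
                            args=["worker.py"], maxprocs=4)
child.bcast(100, root=MPI.ROOT)   # parent is the root of the intercommunicator
child.Disconnect()

# worker side (sketch, in worker.py):
#   from mpi4py import MPI
#   parent = MPI.Comm.Get_parent()
#   n = parent.bcast(None, root=0)
#   parent.Disconnect()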
[mpi4py] Features – Communicating Python objects
◮ High level and very convenient, based on pickle serialization
◮ Can be slow for large data (CPU and memory consuming)
At the sending side . . .
comm.send(object) −→ pickle.dump() −→ MPI_Send()
At the receiving side . . .
object = comm.recv() ←− pickle.load() ←− MPI_Recv()
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if rank == 0:
    msg1 = [77, 3.14, 2+3j, "abc", (1,2,3,4)]
elif rank == 1:
    msg1 = {"A": [2,"x",3], "B": (2.17, 1+3j)}
wt = MPI.Wtime()
if rank == 0:
    comm.send(msg1, 1, tag=0)
    msg2 = comm.recv(None, 1, tag=7)
elif rank == 1:
    msg2 = comm.recv(None, 0, tag=0)
    comm.send(msg1, 0, tag=7)
wt = MPI.Wtime() - wt
[mpi4py] Features – Communicating array data
◮ Lower level, slightly more verbose
◮ Very fast, almost C speed (for messages above 5-10 KB)
At the sending side . . .
message = [object, (count, displ), datatype]
comm.Send(message) −→ MPI_Send()
At the receiving side . . .
message = [object, (count, displ), datatype]
comm.Recv(message) ←− MPI_Recv()
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
array1 = np.ones(2**16, dtype=np.float64)
array2 = np.zeros(2**16, dtype=np.float64)
wt = MPI.Wtime()
if rank == 0:
    comm.Send([array1, MPI.DOUBLE], 1, tag=0)
    comm.Recv([array2, MPI.DOUBLE], 1, tag=7)
elif rank == 1:
    comm.Recv([array2, MPI.DOUBLE], 0, tag=0)
    comm.Send([array1, MPI.DOUBLE], 0, tag=7)
wt = MPI.Wtime() - wt
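The MPI-1 feature list also mentions nonblocking communication; a minimal sketch with Isend/Irecv and Waitall, assuming exactly two processes:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
other = 1 - rank                              # peer rank (assumes 2 processes)
sbuf = np.ones(2**16, dtype=np.float64) * rank
rbuf = np.empty(2**16, dtype=np.float64)
# post nonblocking send and receive, then wait for both to complete
reqs = [comm.Isend([sbuf, MPI.DOUBLE], dest=other, tag=0),
        comm.Irecv([rbuf, MPI.DOUBLE], source=other, tag=0)]
MPI.Request.Waitall(reqs)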
Point to Point Throughput – Gigabit Ethernet
[Figure: PingPong throughput (MiB/s) vs. array size (10^0 to 10^7 bytes) for Pickle, Buffer, and C]
Point to Point Throughput – Shared Memory
[Figure: PingPong throughput (MiB/s) vs. array size (10^0 to 10^7 bytes) for Pickle, Buffer, and C]
Overview
MPI for Python
PETSc for Python
Applications
PETSc for Python (petsc4py)
◮ Python bindings for PETSc
◮ Implemented with Cython
◮ Supports most important PETSc features
◮ Pythonic API that better matches PETSc's OO design
◮ class hierarchies, methods, properties
◮ automatic object lifetime management
◮ exception-based error handling
[petsc4py] Features – PETSc components
◮ Index Sets: permutations, indexing, renumbering
◮ Vectors: sequential and distributed
◮ Matrices: sequential and distributed, sparse and dense
◮ Linear Solvers: Krylov subspace methods
◮ Preconditioners: sparse direct solvers, multigrid
◮ Nonlinear Solvers: line search, trust region, matrix-free
◮ Timesteppers: time-dependent linear and nonlinear PDEs
[Figure: PETSc flow of control – the Main Routine drives Timestepping Solvers (TS), Nonlinear Solvers (SNES), Linear Solvers (KSP), and Preconditioners (PC); the application supplies Initialization, Function Evaluation, Jacobian Evaluation, and Postprocessing]
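A small sketch exercising the Vec and IS components listed above (standard petsc4py calls; sizes are arbitrary):

from petsc4py import PETSc

# a vector distributed over all processes
v = PETSc.Vec().createMPI(100, comm=PETSc.COMM_WORLD)
v.set(1.0)                 # fill with ones
print(v.norm())            # 2-norm by default

# an index set describing a strided selection of indices
iset = PETSc.IS().createStride(10, first=0, step=2, comm=PETSc.COMM_SELF)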
[petsc4py] Vectors (Vec) – CG Method
cg(A, x, b, imax, ǫ):
    i ⇐ 0
    r ⇐ b − A x
    d ⇐ r
    δ0 ⇐ rᵀr
    δ ⇐ δ0
    while i < imax and δ > δ0 ǫ² do:
        q ⇐ A d
        α ⇐ δ / (dᵀq)
        x ⇐ x + α d
        r ⇐ r − α q
        δold ⇐ δ
        δ ⇐ rᵀr
        β ⇐ δ / δold
        d ⇐ r + β d
        i ⇐ i + 1

def cg(A, b, x, imax=50, eps=1e-6):
    """
    A, b, x : matrix, rhs, solution
    imax    : maximum iterations
    eps     : relative tolerance
    """
    # allocate work vectors
    r = b.duplicate()
    d = b.duplicate()
    q = b.duplicate()
    # initialization
    i = 0
    A.mult(x, r)        # r <- A x
    r.aypx(-1, b)       # r <- b - A x
    r.copy(d)           # d <- r
    delta_0 = r.dot(r)
    delta = delta_0
    # enter iteration loop
    while (i < imax and
           delta > delta_0 * eps**2):
        A.mult(d, q)
        alpha = delta / d.dot(q)
        x.axpy(+alpha, d)
        r.axpy(-alpha, q)
        delta_old = delta
        delta = r.dot(r)
        beta = delta / delta_old
        d.aypx(beta, r)
        i = i + 1
    return i, delta**0.5
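The routine above can be driven, for instance, with the matrix assembled on the next slides; a hypothetical usage sketch:

x, b = A.getVecs()     # vectors compatible with A
b.set(1.0)             # right-hand side
x.set(0.0)             # initial guess
its, res = cg(A, b, x, imax=100, eps=1e-6)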
[petsc4py] Matrices (Mat) [1]
from petsc4py import PETSc
# grid size and spacing
m, n = 32, 32
hx = 1.0/(m-1)
hy = 1.0/(n-1)
# create sparse matrix
A = PETSc.Mat()
A.create(PETSc.COMM_WORLD)
A.setSizes([m*n, m*n])
A.setType('aij') # sparse
# precompute values for setting
# diagonal and non-diagonal entries
diagv = 2.0/hx**2 + 2.0/hy**2
offdx = -1.0/hx**2
offdy = -1.0/hy**2
[petsc4py] Matrices (Mat) [2]
# loop over owned block of rows on this
# processor and insert entry values
Istart, Iend = A.getOwnershipRange()
for I in range(Istart, Iend):
    A[I, I] = diagv
    i = I//n     # map row number to
    j = I - i*n  # grid coordinates
    if i > 0  : J = I-n; A[I, J] = offdx
    if i < m-1: J = I+n; A[I, J] = offdx
    if j > 0  : J = I-1; A[I, J] = offdy
    if j < n-1: J = I+1; A[I, J] = offdy
# communicate off-processor values
# and setup internal data structures
# for performing parallel operations
A.assemblyBegin()
A.assemblyEnd()
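Once assembled, the matrix can be applied; a quick sanity-check sketch using standard petsc4py calls:

# matrix-vector product as a sanity check
x, y = A.getVecs()
x.set(1.0)
A.mult(x, y)           # y <- A * x
print(y.norm())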
[petsc4py] Linear Solvers (KSP+PC)
# create linear solver
ksp = PETSc.KSP()
ksp.create(PETSc.COMM_WORLD)
# use conjugate gradients method
ksp.setType('cg')
# and incomplete Cholesky
ksp.getPC().setType('icc')
# obtain sol & rhs vectors
x, b = A.getVecs()
x.set(0)
b.set(1)
# and then solve
ksp.setOperators(A)
ksp.setFromOptions()
ksp.solve(b, x)
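After the solve, convergence information can be queried, and setFromOptions() lets the solver be reconfigured from the command line; a sketch (the script name solve.py is illustrative):

# query convergence information after the solve
print("iterations: %d" % ksp.getIterationNumber())
print("residual norm: %g" % ksp.getResidualNorm())
# thanks to setFromOptions(), the same script can be reconfigured, e.g.
#   python solve.py -ksp_type gmres -pc_type jacobi -ksp_monitor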
[petsc4py] Interoperability
Support for wrapping other PETSc-based C/C++/F90 codes
◮ using Cython (cimport statement)
◮ using SWIG (typemaps provided)
◮ using F2Py (fortran attribute)
[petsc4py] Interoperability – SWIG
%module MyApp
%include petsc4py/petsc4py.i
%{
#include "MyApp.h"
%}
class Nonlinear {
Nonlinear(MPI_Comm comm,
const char datafile[]);
Vec createVec();
Mat createMat();
void evalFunction(SNES snes, Vec X, Vec F);
bool evalJacobian(SNES snes, Vec X, Mat J, Mat P);
};
[petsc4py] Interoperability – SWIG
from petsc4py import PETSc
import MyApp
comm = PETSc.COMM_WORLD
app = MyApp.Nonlinear(comm, "...")
X = app.createVec()
F = app.createVec()
J = app.createMat()
snes = PETSc.SNES().create(comm)
snes.setFunction(app.evalFunction, F)
snes.setJacobian(app.evalJacobian, J)
snes.setFromOptions()
snes.solve(None, X)
Overview
MPI for Python
PETSc for Python
Applications
Microfluidics (µ-TAS)
mpi4py
◮ Development: [Link]
◮ Mailing List: mpi4py@[Link]
◮ Chat: dalcinl@[Link]
petsc4py
◮ Development: [Link]
◮ Mailing List: petsc-users@[Link]
◮ Chat: dalcinl@[Link]
Thanks!