NumPy User Guide

Release 1.22.0

Written by the NumPy community

December 31, 2021


CONTENTS

1 What is NumPy?
2 NumPy quickstart
3 NumPy: the absolute basics for beginners
4 NumPy fundamentals
5 Miscellaneous
6 NumPy for MATLAB users
7 Building from source
8 Using NumPy C-API
9 NumPy How Tos
10 For downstream package authors
11 F2PY user guide and reference manual
12 Glossary
13 Under-the-hood Documentation for developers
14 Reporting bugs
15 Release notes
16 NumPy license
Python Module Index
Index


This guide is an overview and explains the important features; details are found in Command Reference.

CHAPTER

ONE

WHAT IS NUMPY?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a
multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for
fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier
transforms, basic linear algebra, basic statistical operations, random simulation and much more.
At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data
types, with many operations being performed in compiled code for performance. There are several important differences
between NumPy arrays and the standard Python sequences:
• NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size
of an ndarray will create a new array and delete the original.
• The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in
memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays
of different sized elements.
• NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically,
such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
• A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these
typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they
often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s
scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient;
one also needs to know how to use NumPy arrays.
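The first two differences listed above can be checked directly. The following is a minimal sketch; the variable names and sample values are illustrative only and do not come from the guide.

```python
import numpy as np

# A Python list grows in place; an ndarray has a fixed size.
values = [1, 2, 3]
values.append(4)                  # same list object, now with 4 items

arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)        # returns a NEW array; arr itself is unchanged

# All elements share one dtype, so mixed input is converted to a common type.
mixed = np.array([1, 2.5, 3])     # the integers are stored as float64

# dtype=object is the exception: elements may differ in type and size.
objs = np.array([1, "two", 3.0], dtype=object)

print(arr.size, bigger.size)      # 3 4
print(mixed.dtype)                # float64
print(objs.dtype)                 # object
```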
The points about sequence size and speed are particularly important in scientific computing. As a simple example, consider
the case of multiplying each element in a 1-D sequence with the corresponding element in another sequence of the same
length. If the data are stored in two Python lists, a and b, we could iterate over each element:

c = []
for i in range(len(a)):
    c.append(a[i]*b[i])

This produces the correct answer, but if a and b each contain millions of numbers, we will pay the price for the
inefficiencies of looping in Python. We could accomplish the same task much more quickly in C by writing (for clarity we
neglect variable declarations and initializations, memory allocation, etc.)

for (i = 0; i < rows; i++) {
    c[i] = a[i]*b[i];
}

This saves all the overhead involved in interpreting the Python code and manipulating Python objects, but at the expense
of the benefits gained from coding in Python. Furthermore, the coding work required increases with the dimensionality
of our data. In the case of a 2-D array, for example, the C code (abridged as before) expands to

for (i = 0; i < rows; i++) {
    for (j = 0; j < columns; j++) {
        c[i][j] = a[i][j]*b[i][j];
    }
}

NumPy gives us the best of both worlds: element-by-element operations are the “default mode” when an ndarray is
involved, but the element-by-element operation is speedily executed by pre-compiled C code. In NumPy

c = a * b

does what the earlier examples do, at near-C speeds, but with the code simplicity we expect from something based on
Python. Indeed, the NumPy idiom is even simpler! This last example illustrates two of NumPy’s features which are the
basis of much of its power: vectorization and broadcasting.
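To make the comparison concrete, the Python loop and the NumPy expression can be run side by side. This is only a sketch with made-up sample values; it shows that both produce the same products and does not attempt to measure speed.

```python
import numpy as np

a_list = [1.0, 2.0, 3.0, 4.0]
b_list = [10.0, 20.0, 30.0, 40.0]

# Pure-Python version: an explicit loop over the indices.
c_list = []
for i in range(len(a_list)):
    c_list.append(a_list[i] * b_list[i])

# NumPy version: one expression; the loop runs in compiled code.
a = np.array(a_list)
b = np.array(b_list)
c = a * b

print(c_list)                      # [10.0, 40.0, 90.0, 160.0]
print(np.array_equal(c, c_list))   # True
```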

1.1 Why is NumPy Fast?

Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of
course, just “behind the scenes” in optimized, pre-compiled C code. Vectorized code has many advantages, among which
are:
• vectorized code is more concise and easier to read
• fewer lines of code generally means fewer bugs
• the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code
mathematical constructs)
• vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient
and difficult to read for loops.
Broadcasting is the term used to describe the implicit element-by-element behavior of operations; generally speaking,
in NumPy all operations, not just arithmetic operations, but logical, bit-wise, functional, etc., behave in this implicit
element-by-element fashion, i.e., they broadcast. Moreover, in the example above, a and b could be multidimensional
arrays of the same shape, or a scalar and an array, or even two arrays with different shapes, provided that the smaller
array is “expandable” to the shape of the larger in such a way that the resulting broadcast is unambiguous. For detailed
“rules” of broadcasting see Broadcasting.
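The cases named in the paragraph above (a scalar with an array, and two arrays of different shapes) might look like the following sketch. The array contents are arbitrary illustrative values.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])                    # shape (2, 3)

scaled = a * 10                              # scalar and array
shifted = a + np.array([100, 200, 300])      # shape (3,) is stretched over both rows
mask = a > 3                                 # comparisons are elementwise as well

print(scaled)
print(shifted)
print(mask)

# Shapes (2, 3) and (2,) cannot be matched unambiguously, so NumPy refuses.
try:
    a + np.array([1, 2])
    failed = False
except ValueError:
    failed = True
print(failed)                                # True
```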

1.2 Who Else Uses NumPy?

NumPy fully supports an object-oriented approach, starting, once again, with ndarray. For example, ndarray is a class,
possessing numerous methods and attributes. Many of its methods are mirrored by functions in the outer-most NumPy
namespace, allowing the programmer to code in whichever paradigm they prefer. This flexibility has allowed the NumPy
array dialect and NumPy ndarray class to become the de-facto language of multi-dimensional data interchange used in
Python.
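As a small illustration of methods being mirrored by functions in the top-level namespace, the sketch below calls a few operations in both styles. The sample array is an assumption made for the example.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [6, 5, 4]])

# Method style (object-oriented) and function style give the same results.
total_method = a.sum()
total_function = np.sum(a)

rowmax_method = a.max(axis=1)
rowmax_function = np.max(a, axis=1)

reshaped_method = a.reshape(3, 2)
reshaped_function = np.reshape(a, (3, 2))

print(total_method, total_function)          # 21 21
print(rowmax_method, rowmax_function)        # [3 6] [3 6]
```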

CHAPTER

TWO

NUMPY QUICKSTART

2.1 Prerequisites

You’ll need to know a bit of Python. For a refresher, see the Python tutorial.
To work the examples, you’ll need matplotlib installed in addition to NumPy.
Learner profile
This is a quick overview of arrays in NumPy. It demonstrates how n-dimensional (n >= 2) arrays are represented and
can be manipulated. In particular, if you don’t know how to apply common functions to n-dimensional arrays (without
using for-loops), or if you want to understand axis and shape properties for n-dimensional arrays, this article might be of
help.
Learning Objectives
After reading, you should be able to:
• Understand the difference between one-, two- and n-dimensional arrays in NumPy;
• Understand how to apply some linear algebra operations to n-dimensional arrays without using for-loops;
• Understand axis and shape properties for n-dimensional arrays.

2.2 The Basics

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the
same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
For example, the array for the coordinates of a point in 3D space, [1, 2, 1], has one axis. That axis has 3 elements
in it, so we say it has a length of 3. In the example pictured below, the array has 2 axes. The first axis has a length of 2,
the second axis has a length of 3.

[[1., 0., 0.],


[0., 1., 2.]]

NumPy’s array class is called ndarray. It is also known by the alias array. Note that numpy.array is not the
same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less
functionality. The more important attributes of an ndarray object are:
ndarray.ndim
the number of axes (dimensions) of the array.
ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a
matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number
of axes, ndim.
ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.
ndarray.dtype
an object describing the type of the elements in the array. One can create or specify dtypes using standard Python
types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some
examples.
ndarray.itemsize
the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize
8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
ndarray.data
the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we
will access the elements in an array using indexing facilities.

2.2.1 An example

>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>


2.2.2 Array Creation

There are several ways to create arrays.


For example, you can create an array from a regular Python list or tuple using the array function. The type of the
resulting array is deduced from the type of the elements in the sequences.
>>> import numpy as np
>>> a = np.array([2, 3, 4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')

A frequent error consists in calling array with multiple arguments, rather than providing a single sequence as an
argument.
>>> a = np.array(1, 2, 3, 4)    # WRONG
Traceback (most recent call last):
  ...
TypeError: array() takes from 1 to 2 positional arguments but 4 were given
>>> a = np.array([1, 2, 3, 4])  # RIGHT

array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into
three-dimensional arrays, and so on.
>>> b = np.array([(1.5, 2, 3), (4, 5, 6)])
>>> b
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

The type of the array can also be explicitly specified at creation time:
>>> c = np.array([[1, 2], [3, 4]], dtype=complex)
>>> c
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to
create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.
The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function
empty creates an array whose initial content is uninitialized: it holds whatever values happen to be in memory. By
default, the dtype of the created array is float64, but it can be specified via the keyword argument dtype.
>>> np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> np.ones((2, 3, 4), dtype=np.int16)
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)
>>> np.empty((2, 3))
array([[3.73603959e-262, 6.02658058e-154, 6.55490914e-260],  # may vary
       [5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])

To create sequences of numbers, NumPy provides the arange function which is analogous to the Python built-in range,
but returns an array.

>>> np.arange(10, 30, 5)
array([10, 15, 20, 25])
>>> np.arange(0, 2, 0.3)  # it accepts float arguments
array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

When arange is used with floating point arguments, it is generally not possible to predict the number of elements
obtained, due to the finite floating point precision. For this reason, it is usually better to use the function linspace that
receives as an argument the number of elements that we want, instead of the step:

>>> from numpy import pi
>>> np.linspace(0, 2, 9)                   # 9 numbers from 0 to 2
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
>>> x = np.linspace(0, 2 * pi, 100)        # useful to evaluate function at lots of points
>>> f = np.sin(x)

See also:
array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace,
[Link], [Link], fromfunction, fromfile

2.2.3 Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:
• the last axis is printed from left to right,
• the second-to-last is printed from top to bottom,
• the rest are also printed from top to bottom, with each slice separated from the next by an empty line.
One-dimensional arrays are then printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices.

>>> a = np.arange(6)                    # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4, 3)     # 2d array
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2, 3, 4)  # 3d array
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

See below to get more details on reshape.


If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners:
>>> print(np.arange(10000))
[   0    1    2 ... 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100, 100))
[[   0    1    2 ...   97   98   99]
 [ 100  101  102 ...  197  198  199]
 [ 200  201  202 ...  297  298  299]
 ...
 [9700 9701 9702 ... 9797 9798 9799]
 [9800 9801 9802 ... 9897 9898 9899]
 [9900 9901 9902 ... 9997 9998 9999]]

To disable this behaviour and force NumPy to print the entire array, you can change the printing options using
set_printoptions.
>>> np.set_printoptions(threshold=sys.maxsize)  # sys module should be imported

2.2.4 Basic Operations

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
>>> a = np.array([20, 30, 40, 50])
>>> b = np.arange(4)
>>> b
array([0, 1, 2, 3])
>>> c = a - b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10 * np.sin(a)
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
>>> a < 35
array([ True,  True, False, False])

Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product
can be performed using the @ operator (in Python >=3.5) or the dot function or method:
>>> A = np.array([[1, 1],
...               [0, 1]])
>>> B = np.array([[2, 0],
...               [3, 4]])
>>> A * B     # elementwise product
array([[2, 0],
       [0, 4]])
>>> A @ B     # matrix product
array([[5, 4],
       [3, 4]])
>>> A.dot(B)  # another matrix product
array([[5, 4],
       [3, 4]])

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

>>> rg = np.random.default_rng(1)  # create instance of default random number generator
>>> a = np.ones((2, 3), dtype=int)
>>> b = rg.random((2, 3))
>>> a *= 3
>>> a
array([[3, 3, 3],
       [3, 3, 3]])
>>> b += a
>>> b
array([[3.51182162, 3.9504637 , 3.14415961],
       [3.94864945, 3.31183145, 3.42332645]])
>>> a += b  # b is not automatically converted to integer type
Traceback (most recent call last):
    ...
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise
one (a behavior known as upcasting).

>>> a = np.ones(3, dtype=np.int32)
>>> b = np.linspace(0, pi, 3)
>>> b.dtype.name
'float64'
>>> c = a + b
>>> c
array([1.        , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'
>>> d = np.exp(c * 1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
       -0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the
ndarray class.

>>> a = rg.random((2, 3))
>>> a
array([[0.82770259, 0.40919914, 0.54959369],
       [0.02755911, 0.75351311, 0.53814331]])
>>> a.sum()
3.1057109529998157
>>> a.min()
0.027559113243068367
>>> a.max()
0.8277025938204418

By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by
specifying the axis parameter you can apply an operation along the specified axis of an array:

>>> b = np.arange(12).reshape(3, 4)
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> b.sum(axis=0)     # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1)     # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1)  # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])

2.2.5 Universal Functions

NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal
functions” (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.

>>> B = np.arange(3)
>>> B
array([0, 1, 2])
>>> np.exp(B)
array([1.        , 2.71828183, 7.3890561 ])
>>> np.sqrt(B)
array([0.        , 1.        , 1.41421356])
>>> C = np.array([2., -1., 4.])
>>> np.add(B, C)
array([2., 0., 6.])

See also:
all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj,
corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, invert, lexsort, max, maximum,
mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose,
var, vdot, vectorize, where


2.2.6 Indexing, Slicing and Iterating

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

>>> a = np.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> # equivalent to a[0:6:2] = 1000;
>>> # from start to position 6, exclusive, set every 2nd element to 1000
>>> a[:6:2] = 1000
>>> a
array([1000,    1, 1000,   27, 1000,  125,  216,  343,  512,  729])
>>> a[::-1]  # reversed a
array([ 729,  512,  343,  216,  125, 1000,   27, 1000,    1, 1000])
>>> for i in a:
...     print(i**(1 / 3.))
...
9.999999999999998
1.0
9.999999999999998
3.0
9.999999999999998
4.999999999999999
5.999999999999999
6.999999999999999
7.999999999999999
8.999999999999998

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

>>> def f(x, y):
...     return 10 * x + y
...
>>> b = np.fromfunction(f, (5, 4), dtype=int)
>>> b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
>>> b[2, 3]
23
>>> b[0:5, 1]  # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[:, 1]    # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, :]  # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:

>>> b[-1] # the last row. Equivalent to b[-1, :]


array([40, 41, 42, 43])


The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the
remaining axes. NumPy also allows you to write this using dots as b[i, ...].
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array
with 5 axes, then
• x[1, 2, ...] is equivalent to x[1, 2, :, :, :],
• x[..., 3] to x[:, :, :, :, 3] and
• x[4, ..., 5, :] to x[4, :, :, 5, :].

>>> c = np.array([[[  0,  1,  2],  # a 3D array (two stacked 2D arrays)
...                [ 10, 12, 13]],
...               [[100, 101, 102],
...                [110, 112, 113]]])
>>> c.shape
(2, 2, 3)
>>> c[1, ...]  # same as c[1, :, :] or c[1]
array([[100, 101, 102],
       [110, 112, 113]])
>>> c[..., 2]  # same as c[:, :, 2]
array([[  2,  13],
       [102, 113]])

Iterating over multidimensional arrays is done with respect to the first axis:

>>> for row in b:
...     print(row)
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]

However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is an
iterator over all the elements of the array:

>>> for element in b.flat:
...     print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43

See also:
Indexing on ndarrays, [Link] (reference), newaxis, ndenumerate, indices

2.3 Shape Manipulation

2.3.1 Changing the shape of an array

An array has a shape given by the number of elements along each axis:

>>> a = np.floor(10 * rg.random((3, 4)))
>>> a
array([[3., 7., 3., 4.],
       [1., 4., 2., 2.],
       [7., 2., 4., 9.]])
>>> a.shape
(3, 4)

The shape of an array can be changed with various commands. Note that the following three commands all return a
modified array, but do not change the original array:

>>> a.ravel()  # returns the array, flattened
array([3., 7., 3., 4., 1., 4., 2., 2., 7., 2., 4., 9.])
>>> a.reshape(6, 2)  # returns the array with a modified shape
array([[3., 7.],
       [3., 4.],
       [1., 4.],
       [2., 2.],
       [7., 2.],
       [4., 9.]])
>>> a.T  # returns the array, transposed
array([[3., 1., 7.],
       [7., 4., 2.],
       [3., 2., 4.],
       [4., 2., 9.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)

The order of the elements in the array resulting from ravel is normally “C-style”, that is, the rightmost index “changes
the fastest”, so the element after a[0, 0] is a[0, 1]. If the array is reshaped to some other shape, again the array
is treated as “C-style”. NumPy normally creates arrays stored in this order, so ravel will usually not need to copy its
argument, but if the array was made by taking slices of another array or created with unusual options, it may need to be
copied. The functions ravel and reshape can also be instructed, using an optional argument, to use FORTRAN-style
arrays, in which the leftmost index changes the fastest.
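The difference between the two orderings, and the point about when ravel has to copy, can be seen in a short sketch. It assumes the optional argument in question is the order keyword accepted by ravel and reshape; the small sample array is illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
# a is [[0 1 2]
#       [3 4 5]]

c_order = a.ravel()               # default order="C": rows are read one after another
f_order = a.ravel(order="F")      # Fortran style: columns are read one after another

# reshape accepts the same keyword and then fills the result column by column
f_reshaped = np.arange(6).reshape(2, 3, order="F")

print(c_order)                    # [0 1 2 3 4 5]
print(f_order)                    # [0 3 1 4 2 5]
print(f_reshaped)

# ravel avoids a copy when the memory layout allows it, and copies otherwise
shares_plain = np.shares_memory(a, a.ravel())        # True: no copy needed
shares_transposed = np.shares_memory(a, a.T.ravel()) # False: a.T must be copied
```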
The reshape function returns its argument with a modified shape, whereas the ndarray.resize method modifies
the array itself:

>>> a
array([[3., 7., 3., 4.],
       [1., 4., 2., 2.],
       [7., 2., 4., 9.]])
>>> a.resize((2, 6))
>>> a
array([[3., 7., 3., 4., 1., 4.],
       [2., 2., 7., 2., 4., 9.]])

If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:

>>> a.reshape(3, -1)
array([[3., 7., 3., 4.],
       [1., 4., 2., 2.],
       [7., 2., 4., 9.]])

See also:
ndarray.shape, reshape, resize, ravel

2.3.2 Stacking together different arrays

Several arrays can be stacked together along different axes:

>>> a = np.floor(10 * rg.random((2, 2)))
>>> a
array([[9., 7.],
       [5., 2.]])
>>> b = np.floor(10 * rg.random((2, 2)))
>>> b
array([[1., 9.],
       [5., 1.]])
>>> np.vstack((a, b))
array([[9., 7.],
       [5., 2.],
       [1., 9.],
       [5., 1.]])
>>> np.hstack((a, b))
array([[9., 7., 1., 9.],
       [5., 2., 5., 1.]])

The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D
arrays:

>>> from numpy import newaxis
>>> np.column_stack((a, b))  # with 2D arrays
array([[9., 7., 1., 9.],
       [5., 2., 5., 1.]])
>>> a = np.array([4., 2.])
>>> b = np.array([3., 8.])
>>> np.column_stack((a, b))  # returns a 2D array
array([[4., 3.],
       [2., 8.]])
>>> np.hstack((a, b))        # the result is different
array([4., 2., 3., 8.])
>>> a[:, newaxis]  # view `a` as a 2D column vector
array([[4.],
       [2.]])
>>> np.column_stack((a[:, newaxis], b[:, newaxis]))
array([[4., 3.],
       [2., 8.]])
>>> np.hstack((a[:, newaxis], b[:, newaxis]))  # the result is the same
array([[4., 3.],
       [2., 8.]])

On the other hand, the function row_stack is equivalent to vstack for any input arrays. In fact, row_stack is an
alias for vstack:

>>> np.column_stack is np.hstack
False
>>> np.row_stack is np.vstack
True

In general, for arrays with more than two dimensions, hstack stacks along their second axes, vstack stacks along
their first axes, and concatenate allows for an optional argument giving the number of the axis along which the
concatenation should happen.
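For arrays with three axes, the statement above can be verified by looking at the resulting shapes. This is a minimal sketch; the shapes (2, 3, 4) are chosen only for illustration.

```python
import numpy as np

a = np.zeros((2, 3, 4))
b = np.ones((2, 3, 4))

v_shape = np.vstack((a, b)).shape                 # first axis grows:  (4, 3, 4)
h_shape = np.hstack((a, b)).shape                 # second axis grows: (2, 6, 4)
last_shape = np.concatenate((a, b), axis=2).shape # chosen axis grows: (2, 3, 8)
default_shape = np.concatenate((a, b)).shape      # axis defaults to 0: (4, 3, 4)

print(v_shape, h_shape, last_shape, default_shape)
```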
Note
In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of
range literals :.

>>> np.r_[1:4, 0, 4]
array([1, 2, 3, 0, 4])

When used with arrays as arguments, r_ and c_ are similar to vstack and hstack in their default behavior, but allow
for an optional argument giving the number of the axis along which to concatenate.
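A brief sketch of r_ and c_ applied to two small 2D arrays follows (the values are arbitrary). The axis is selected here by passing a string as the first item, which is how the r_ documentation spells the optional argument.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.r_[a, b]               # for 2D input: same result as vstack, shape (4, 2)
cols = np.c_[a, b]               # for 2D input: same result as hstack, shape (2, 4)

# A leading string selects the axis; "-1" concatenates along the last axis.
last_axis = np.r_["-1", a, b]

print(rows)
print(cols)
print(np.array_equal(last_axis, cols))   # True
```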
See also:
hstack, vstack, column_stack, concatenate, c_, r_

2.3.3 Splitting one array into several smaller ones

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays
to return, or by specifying the columns after which the division should occur:

>>> a = np.floor(10 * rg.random((2, 12)))
>>> a
array([[6., 7., 6., 9., 0., 5., 4., 0., 6., 8., 5., 2.],
       [8., 5., 5., 7., 1., 8., 6., 7., 1., 8., 1., 0.]])
>>> # Split `a` into 3
>>> np.hsplit(a, 3)
[array([[6., 7., 6., 9.],
       [8., 5., 5., 7.]]), array([[0., 5., 4., 0.],
       [1., 8., 6., 7.]]), array([[6., 8., 5., 2.],
       [1., 8., 1., 0.]])]
>>> # Split `a` after the third and the fourth column
>>> np.hsplit(a, (3, 4))
[array([[6., 7., 6.],
       [8., 5., 5.]]), array([[9.],
       [7.]]), array([[0., 5., 4., 0., 6., 8., 5., 2.],
       [1., 8., 6., 7., 1., 8., 1., 0.]])]

vsplit splits along the vertical axis, and array_split allows one to specify along which axis to split.
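The two functions just mentioned could be used as in the sketch below. The 4 x 6 sample array is an assumption made for the example.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# vsplit cuts along the first (vertical) axis: two blocks of shape (2, 6)
top, bottom = np.vsplit(a, 2)

# array_split takes the axis explicitly: three blocks of shape (4, 2)
left, middle, right = np.array_split(a, 3, axis=1)

# array_split also tolerates a count that does not divide the axis evenly
uneven = np.array_split(a, 4, axis=1)
uneven_widths = [part.shape[1] for part in uneven]

print(top.shape, bottom.shape)    # (2, 6) (2, 6)
print(left.shape)                 # (4, 2)
print(uneven_widths)              # [2, 2, 1, 1]
```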

2.4 Copies and Views

When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is
often a source of confusion for beginners. There are three cases:

2.4.1 No Copy at All

Simple assignments make no copy of objects or their data.

>>> a = np.array([[ 0,  1,  2,  3],
...               [ 4,  5,  6,  7],
...               [ 8,  9, 10, 11]])
>>> b = a            # no new object is created
>>> b is a           # a and b are two names for the same ndarray object
True

Python passes mutable objects as references, so function calls make no copy.

>>> def f(x):
...     print(id(x))
...
>>> id(a) # id is a unique identifier of an object
148293216 # may vary
>>> f(a)
148293216 # may vary

2.4.2 View or Shallow Copy

Different array objects can share the same data. The view method creates a new array object that looks at the same data.

>>> c = a.view()
>>> c is a
False
>>> c.base is a            # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c = c.reshape((2, 6))  # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0, 4] = 1234         # a's data changes
>>> a
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])


Slicing an array returns a view of it:

>>> s = a[:, 1:3]


>>> s[:] = 10 # s[:] is a view of s. Note the difference between s = 10 and s[:] = 10
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])

2.4.3 Deep Copy

The copy method makes a complete copy of the array and its data.

>>> d = a.copy()  # a new array object with new data is created
>>> d is a
False
>>> d.base is a  # d doesn't share anything with a
False
>>> d[0, 0] = 9999
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])

Sometimes copy should be called after slicing if the original array is not required anymore. For example, suppose a is
a huge intermediate result and the final result b only contains a small fraction of a. In that case, a deep copy should be
made when constructing b with slicing:

>>> a = np.arange(int(1e8))
>>> b = a[:100].copy()
>>> del a  # the memory of ``a`` can be released.

If b = a[:100] is used instead, a is referenced by b and will persist in memory even if del a is executed.
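The reference that keeps a alive can be inspected through the base attribute. The sketch below uses a much smaller array than the text and the illustrative names view and copied.

```python
import numpy as np

a = np.arange(1000)

view = a[:10]             # a slice is a view: it keeps a reference to a
copied = a[:10].copy()    # a copy owns its own, much smaller, buffer

print(view.base is a)                 # True: a stays alive as long as view does
print(copied.base is None)            # True: nothing ties copied to a
print(np.shares_memory(a, view))      # True
print(np.shares_memory(a, copied))    # False
```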

2.4.4 Functions and Methods Overview

Here is a list of some useful NumPy function and method names ordered in categories. See routines for the full list.
Array Creation
arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity,
linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like
Conversions
ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat
Manipulations
array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack,
ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take,
transpose, vsplit, vstack
Questions
all, any, nonzero, where
Ordering
argmax, argmin, argsort, max, min, ptp, searchsorted, sort
Operations
choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask, real,
sum
Basic Statistics
cov, mean, std, var
Basic Linear Algebra
cross, dot, outer, linalg.svd, vdot

2.5 Less Basic

2.5.1 Broadcasting rules

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape.
The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly
prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the
size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same
along that dimension for the “broadcast” array.
After application of the broadcasting rules, the sizes of all arrays must match. More details can be found in Broadcasting.
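The two rules can be traced on a small case. In this sketch (shapes and values chosen only for illustration), an array of shape (3, 1) is added to one of shape (4,).

```python
import numpy as np

a = np.arange(3).reshape(3, 1)    # shape (3, 1)
b = np.arange(4)                  # shape (4,)

# Rule 1: b has fewer dimensions, so its shape is treated as (1, 4).
# Rule 2: axes of size 1 act as if stretched: (3, 1) and (1, 4) -> (3, 4).
result = a + b                    # result[i, j] == a[i, 0] + b[j]

print(result.shape)               # (3, 4)
print(result)

# If the sizes still disagree after both rules, the operation fails.
try:
    np.ones((3, 2)) + np.ones(4)  # trailing sizes 2 and 4 cannot be matched
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)                   # True
```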

2.6 Advanced indexing and index tricks

NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as
we saw before, arrays can be indexed by arrays of integers and arrays of booleans.

2.6.1 Indexing with Arrays of Indices

>>> a = np.arange(12)**2  # the first 12 square numbers
>>> i = np.array([1, 1, 3, 8, 5])  # an array of indices
>>> a[i]  # the elements of `a` at the positions `i`
array([ 1,  1,  9, 64, 25])
>>>
>>> j = np.array([[3, 4], [9, 7]])  # a bidimensional array of indices
>>> a[j]  # the same shape as `j`
array([[ 9, 16],
       [81, 49]])

When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following
example shows this behavior by converting an image of labels into a color image using a palette.

>>> palette = np.array([[0, 0, 0],         # black
...                     [255, 0, 0],       # red
...                     [0, 255, 0],       # green
...                     [0, 0, 255],       # blue
...                     [255, 255, 255]])  # white
>>> image = np.array([[0, 1, 2, 0],  # each value corresponds to a color in the palette
...                   [0, 3, 4, 0]])
>>> palette[image]  # the (2, 4, 3) color image
array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])

We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same
shape.

>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> i = np.array([[0, 1],  # indices for the first dim of `a`
...               [1, 2]])
>>> j = np.array([[2, 1],  # indices for the second dim
...               [3, 3]])
>>>
>>> a[i, j] # i and j must have equal shape
array([[ 2, 5],
[ 7, 11]])
>>>
>>> a[i, 2]
array([[ 2, 6],
[ 6, 10]])
>>>
>>> a[:, j]
array([[[ 2, 1],
[ 3, 3]],

[[ 6, 5],
[ 7, 7]],

[[10, 9],
[11, 11]]])

In Python, arr[i, j] is exactly the same as arr[(i, j)]—so we can put i and j in a tuple and then do the
indexing with that.

>>> l = (i, j)
>>> # equivalent to a[i, j]
>>> a[l]
array([[ 2, 5],
[ 7, 11]])

However, we cannot do this by putting i and j into an array, because this array will be interpreted as indexing the first
dimension of a.

>>> s = np.array([i, j])
>>> # not what we want
>>> a[s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 3
>>> # same as `a[i, j]`
>>> a[tuple(s)]
array([[ 2, 5],
[ 7, 11]])

Another common use of indexing with arrays is the search of the maximum value of time-dependent series:

>>> time = np.linspace(20, 145, 5)  # time scale
>>> data = np.sin(np.arange(20)).reshape(5, 4)  # 4 time-dependent series
>>> time
array([ 20. , 51.25, 82.5 , 113.75, 145. ])
>>> data
array([[ 0. , 0.84147098, 0.90929743, 0.14112001],
[-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ],
[ 0.98935825, 0.41211849, -0.54402111, -0.99999021],
[-0.53657292, 0.42016704, 0.99060736, 0.65028784],
[-0.28790332, -0.96139749, -0.75098725, 0.14987721]])
>>> # index of the maxima for each series
>>> ind = data.argmax(axis=0)
>>> ind
array([2, 0, 3, 1])
>>> # times corresponding to the maxima
>>> time_max = time[ind]
>>>
>>> data_max = data[ind, range(data.shape[1])]  # => data[ind[0], 0], data[ind[1], 1]...
>>> time_max
array([ 82.5 , 20. , 113.75, 51.25])
>>> data_max
array([0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
>>> np.all(data_max == data.max(axis=0))
True

You can also use indexing with arrays as a target to assign to:

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1, 3, 4]] = 0
>>> a
array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:


>>> a = np.arange(5)
>>> a[[0, 0, 2]] = [1, 2, 3]
>>> a
array([2, 1, 3, 3, 4])

This is reasonable enough, but watch out if you want to use Python’s += construct, as it may not do what you expect:

>>> a = np.arange(5)
>>> a[[0, 0, 2]] += 1
>>> a
array([1, 1, 3, 3, 4])

Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python
requires a += 1 to be equivalent to a = a + 1.
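If every occurrence of a repeated index should count, one option is the unbuffered ufunc method np.add.at. The sketch below is not part of the original example; it reuses the same array and index list to show the difference.

```python
import numpy as np

a = np.arange(5)

# Unbuffered in-place addition: index 0 is listed twice,
# so the element at position 0 is incremented twice.
np.add.at(a, [0, 0, 2], 1)

print(a)  # [2 1 3 3 4]
```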

2.6.2 Indexing with Boolean Arrays

When we index arrays with arrays of (integer) indices we are providing the list of indices to pick. With boolean indices
the approach is different; we explicitly choose which items in the array we want and which ones we don’t.
The most natural way one can think of for boolean indexing is to use boolean arrays that have the same shape as the
original array:

>>> a = np.arange(12).reshape(3, 4)
>>> b = a > 4
>>> b # `b` is a boolean with `a`'s shape
array([[False, False, False, False],
[False, True, True, True],
[ True, True, True, True]])
>>> a[b] # 1d array with the selected elements
array([ 5, 6, 7, 8, 9, 10, 11])

This property can be very useful in assignments:


>>> a[b] = 0 # All elements of `a` higher than 4 become 0
>>> a
array([[0, 1, 2, 3],
[4, 0, 0, 0],
[0, 0, 0, 0]])

You can look at the following example to see how to use boolean indexing to generate an image of the Mandelbrot set:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> def mandelbrot(h, w, maxit=20, r=2):
...     """Returns an image of the Mandelbrot fractal of size (h,w)."""
...     x = np.linspace(-2.5, 1.5, 4*h+1)
...     y = np.linspace(-1.5, 1.5, 3*w+1)
...     A, B = np.meshgrid(x, y)
...     C = A + B*1j
...     z = np.zeros_like(C)
...     divtime = maxit + np.zeros(z.shape, dtype=int)
...
...     for i in range(maxit):
...         z = z**2 + C
...         diverge = abs(z) > r                    # who is diverging
...         div_now = diverge & (divtime == maxit)  # who is diverging now
...         divtime[div_now] = i                    # note when
...         z[diverge] = r                          # avoid diverging too much
...
...     return divtime
>>> plt.imshow(mandelbrot(400, 400))

[Figure: image of the Mandelbrot set returned by mandelbrot(400, 400)]

The second way of indexing with booleans is more similar to integer indexing; for each dimension of the array we give a
1D boolean array selecting the slices we want:

>>> a = np.arange(12).reshape(3, 4)
>>> b1 = np.array([False, True, True])         # first dim selection
>>> b2 = np.array([True, False, True, False])  # second dim selection
>>>
>>> a[b1, :] # selecting rows
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[b1] # same thing
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[:, b2] # selecting columns
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
>>>
>>> a[b1, b2] # a weird thing to do
array([ 4, 10])

Note that the length of the 1D boolean array must coincide with the length of the dimension (or axis) you want to slice.
In the previous example, b1 has length 3 (the number of rows in a), and b2 (of length 4) is suitable to index the 2nd axis
(columns) of a.
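The “weird” result of a[b1, b2] follows from how boolean masks are converted to integer indices. The sketch below, which reuses the arrays from the example above, compares it with np.ix_, which selects every combination of the chosen rows and columns instead of pairing them up.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Each mask is equivalent to the integer positions of its True entries.
rows = np.nonzero(b1)[0]   # positions 1 and 2
cols = np.nonzero(b2)[0]   # positions 0 and 2

# The positions are paired up element by element: (1, 0) and (2, 2).
paired = a[rows, cols]

# np.ix_ builds an open mesh, giving the full 2x2 block of rows x columns.
block = a[np.ix_(b1, b2)]

print(paired)
print(block)
```

Pairing only works here because both masks happen to contain the same number of True values; the np.ix_ form has no such requirement.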


2.6.3 The ix_() function

The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple. For example, if
you want to compute a+b*c for every triplet taken from the vectors a, b and c:

>>> a = np.array([2, 3, 4, 5])
>>> b = np.array([8, 5, 4])
>>> c = np.array([5, 4, 6, 8, 3])
>>> ax, bx, cx = np.ix_(a, b, c)
>>> ax
array([[[2]],

[[3]],

[[4]],

[[5]]])
>>> bx
array([[[8],
[5],
[4]]])
>>> cx
array([[[5, 4, 6, 8, 3]]])
>>> ax.shape, bx.shape, cx.shape
((4, 1, 1), (1, 3, 1), (1, 1, 5))
>>> result = ax + bx * cx
>>> result
array([[[42, 34, 50, 66, 26],
        [27, 22, 32, 42, 17],
        [22, 18, 26, 34, 14]],

       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],

       [[44, 36, 52, 68, 28],
        [29, 24, 34, 44, 19],
        [24, 20, 28, 36, 16]],

       [[45, 37, 53, 69, 29],
        [30, 25, 35, 45, 20],
        [25, 21, 29, 37, 17]]])
>>> result[3, 2, 4]
17
>>> a[3] + b[2] * c[4]
17

You could also implement the reduce as follows:

>>> def ufunc_reduce(ufct, *vectors):
...     vs = np.ix_(*vectors)
...     r = ufct.identity
...     for v in vs:
...         r = ufct(r, v)
...     return r

and then use it as:


>>> ufunc_reduce(np.add, a, b, c)
array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],

       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],

       [[17, 16, 18, 20, 15],
        [14, 13, 15, 17, 12],
        [13, 12, 14, 16, 11]],

       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])

The advantage of this version of reduce compared to the normal ufunc.reduce is that it makes use of the broadcasting
rules in order to avoid creating an argument array the size of the output times the number of vectors.

2.6.4 Indexing with strings

See Structured arrays.

2.7 Tricks and Tips

Here we give a list of short and useful tips.

2.7.1 “Automatic” Reshaping

To change the dimensions of an array, you can omit one of the sizes which will then be deduced automatically:

>>> a = np.arange(30)
>>> b = a.reshape((2, -1, 3))  # -1 means "whatever is needed"
>>> b.shape
(2, 5, 3)
>>> b
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],

       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])


2.7.2 Vector Stacking

How do we construct a 2D array from a list of equally-sized row vectors? In MATLAB this is quite easy: if x and y are
two vectors of the same length you only need do m=[x;y]. In NumPy this works via the functions column_stack,
dstack, hstack and vstack, depending on the dimension in which the stacking is to be done. For example:

>>> x = np.arange(0, 10, 2)
>>> y = np.arange(5)
>>> m = np.vstack([x, y])
>>> m
array([[0, 2, 4, 6, 8],
       [0, 1, 2, 3, 4]])
>>> xy = np.hstack([x, y])
>>> xy
array([0, 2, 4, 6, 8, 0, 1, 2, 3, 4])

The logic behind those functions in more than two dimensions can be strange.
See also:
NumPy for MATLAB users
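To see how the four stacking functions named above differ, it helps to compare the shapes they produce for the same pair of 1D vectors. This is only an illustrative sketch; the variable names are not from the original text.

```python
import numpy as np

x = np.arange(0, 10, 2)   # shape (5,)
y = np.arange(5)          # shape (5,)

rows = np.vstack([x, y])          # each vector becomes a row
cols = np.column_stack([x, y])    # each vector becomes a column
flat = np.hstack([x, y])          # 1D inputs are joined end to end
deep = np.dstack([x, y])          # vectors are paired along a third axis

for name, arr in [("vstack", rows), ("column_stack", cols),
                  ("hstack", flat), ("dstack", deep)]:
    print(name, arr.shape)
```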

2.7.3 Histograms

The NumPy histogram function applied to an array returns a pair of vectors: the histogram of the array and a vector
of the bin edges. Beware: matplotlib also has a function to build histograms (called hist, as in MATLAB) that differs
from the one in NumPy. The main difference is that pylab.hist plots the histogram automatically, while
numpy.histogram only generates the data.

>>> import numpy as np
>>> rg = np.random.default_rng(1)
>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
>>> mu, sigma = 2, 0.5
>>> v = rg.normal(mu, sigma, 10000)
>>> # Plot a normalized histogram with 50 bins
>>> plt.hist(v, bins=50, density=True)  # matplotlib version (plot)
>>> # Compute the histogram with numpy and then plot it
>>> (n, bins) = np.histogram(v, bins=50, density=True)  # NumPy version (no plot)
>>> plt.plot(.5 * (bins[1:] + bins[:-1]), n)

With Matplotlib >=3.4 you can also use plt.stairs(n, bins).
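The pair returned by the NumPy histogram function can be inspected without any plotting. The sketch below uses the same sample as above and shows why the plotting code averages neighbouring edges: there is one more edge than there are bins.

```python
import numpy as np

rg = np.random.default_rng(1)
v = rg.normal(2, 0.5, 10000)

n, bins = np.histogram(v, bins=50, density=True)

# 50 bins are delimited by 51 edges.
print(n.shape, bins.shape)

# With density=True, the bar areas (height * width) sum to 1.
area = np.sum(n * np.diff(bins))
print(area)
```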

2.8 Further reading

• The Python tutorial
• Command Reference
• SciPy Tutorial
• SciPy Lecture Notes
• A matlab, R, IDL, NumPy/SciPy dictionary
• tutorial-svd

CHAPTER

THREE

NUMPY: THE ABSOLUTE BASICS FOR BEGINNERS

Welcome to the absolute beginner’s guide to NumPy! If you have comments or suggestions, please don’t hesitate to reach
out!

3.1 Welcome to NumPy!

NumPy (Numerical Python) is an open source Python library that’s used in almost every field of science and engineering.
It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and
PyData ecosystems. NumPy users include everyone from beginning coders to experienced researchers doing state-of-the-
art scientific and industrial research and development. The NumPy API is used extensively in Pandas, SciPy, Matplotlib,
scikit-learn, scikit-image and most other data science and scientific Python packages.
The NumPy library contains multidimensional array and matrix data structures (you’ll find more information about this
in later sections). It provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on
it. NumPy can be used to perform a wide variety of mathematical operations on arrays. It adds powerful data structures
to Python that guarantee efficient calculations with arrays and matrices and it supplies an enormous library of high-level
mathematical functions that operate on these arrays and matrices.
Learn more about NumPy here!

3.2 Installing NumPy

To install NumPy, we strongly recommend using a scientific Python distribution. If you’re looking for the full instructions
for installing NumPy on your operating system, see Installing NumPy.
If you already have Python, you can install NumPy with:

conda install numpy

or

pip install numpy

If you don’t have Python yet, you might want to consider using Anaconda. It’s the easiest way to get started. The good
thing about getting this distribution is the fact that you don’t need to worry too much about separately installing NumPy
or any of the major packages that you’ll be using for your data analyses, like pandas, Scikit-Learn, etc.


3.3 How to import NumPy

To access NumPy and its functions import it in your Python code like this:

import numpy as np

We shorten the imported name to np for better readability of code using NumPy. This is a widely adopted convention
that you should follow so that anyone working with your code can easily understand it.

3.4 Reading the example code

If you aren’t already comfortable with reading tutorials that contain a lot of code, you might not know how to interpret a
code block that looks like this:

>>> a = np.arange(6)
>>> a2 = a[np.newaxis, :]
>>> a2.shape
(1, 6)

If you aren’t familiar with this style, it’s very easy to understand. If you see >>>, you’re looking at input, or the code
that you would enter. Everything that doesn’t have >>> in front of it is output, or the results of running your code. This
is the style you see when you run python on the command line, but if you’re using IPython, you might see a different
style. Note that the >>> prompt is not part of the code and will cause an error if typed or pasted into the Python shell. It can
be safely typed or pasted into the IPython shell, where the >>> is ignored.

3.5 What’s the difference between a Python list and a NumPy array?

NumPy gives you an enormous range of fast and efficient ways of creating arrays and manipulating numerical data inside
them. While a Python list can contain different data types within a single list, all of the elements in a NumPy array should
be homogeneous. The mathematical operations that are meant to be performed on arrays would be extremely inefficient
if the arrays weren’t homogeneous.
Why use NumPy?
NumPy arrays are faster and more compact than Python lists. An array stores its elements in a single block of memory
with one fixed data type, so it uses much less memory than a list holding the same values. NumPy also provides a
mechanism for specifying the data types, which allows the code to be optimized even further.
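The memory claim can be checked roughly with the standard library. The following sketch assumes CPython, where sys.getsizeof reports the size of the list's pointer table and of each integer object separately; the exact numbers vary by platform and Python version.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# The array stores n fixed-size 8-byte integers in one buffer.
array_bytes = arr.nbytes

# The list stores n references plus one Python int object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# A smaller dtype reduces the footprint further, provided the values fit.
small_bytes = arr.astype(np.int16).nbytes

print(array_bytes, list_bytes, small_bytes)
```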

3.6 What is an array?

An array is a central data structure of the NumPy library. An array is a grid of values and it contains information about
the raw data, how to locate an element, and how to interpret an element. It has a grid of elements that can be indexed in
various ways. The elements are all of the same type, referred to as the array dtype.
An array can be indexed by a tuple of nonnegative integers, by booleans, by another array, or by integers. The rank of
the array is the number of dimensions. The shape of the array is a tuple of integers giving the size of the array along
each dimension.
One way we can initialize NumPy arrays is from Python lists, using nested lists for two- or higher-dimensional data.
For example:


>>> a = np.array([1, 2, 3, 4, 5, 6])

or:

>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

We can access the elements in the array using square brackets. When you’re accessing elements, remember that indexing
in NumPy starts at 0. That means that if you want to access the first element in your array, you’ll be accessing element
“0”.

>>> print(a[0])
[1 2 3 4]

3.7 More information about arrays

This section covers 1D array, 2D array, ndarray, vector, matrix

You might occasionally hear an array referred to as an “ndarray,” which is shorthand for “N-dimensional array.” An
N-dimensional array is simply an array with any number of dimensions. You might also hear 1-D, or one-dimensional array,
2-D, or two-dimensional array, and so on. The NumPy ndarray class is used to represent both matrices and vectors. A
vector is an array with a single dimension (there’s no difference between row and column vectors), while a matrix refers
to an array with two dimensions. For 3-D or higher dimensional arrays, the term tensor is also commonly used.
What are the attributes of an array?
An array is usually a fixed-size container of items of the same type and size. The number of dimensions and items in
an array is defined by its shape. The shape of an array is a tuple of non-negative integers that specify the sizes of each
dimension.
In NumPy, dimensions are called axes. This means that if you have a 2D array that looks like this:

[[0., 0., 0.],
 [1., 1., 1.]]

Your array has 2 axes. The first axis has a length of 2 and the second axis has a length of 3.
Just like in other Python container objects, the contents of an array can be accessed and modified by indexing or slicing
the array. Unlike the typical container objects, different arrays can share the same data, so changes made on one array
might be visible in another.
Array attributes reflect information intrinsic to the array itself. If you need to get, or even set, properties of an array
without creating a new array, you can often access an array through its attributes.
Read more about array attributes here and learn about array objects here.


3.8 How to create a basic array

This section covers np.array(), np.zeros(), np.ones(), np.empty(), np.arange(), np.linspace(), dtype

To create a NumPy array, you can use the function np.array().


All you need to do to create a simple array is pass a list to it. If you choose to, you can also specify the type of data in
your list. You can find more information about data types here.

>>> import numpy as np
>>> a = np.array([1, 2, 3])

You can visualize your array this way:

Be aware that these visualizations are meant to simplify ideas and give you a basic understanding of NumPy concepts and
mechanics. Arrays and array operations are much more complicated than are captured here!
Besides creating an array from a sequence of elements, you can easily create an array filled with 0’s:

>>> np.zeros(2)
array([0., 0.])

Or an array filled with 1’s:

>>> np.ones(2)
array([1., 1.])

Or even an empty array! The function empty creates an array whose initial content is arbitrary and depends on the state
of the memory. The reason to use empty over zeros (or something similar) is speed - just make sure to fill every
element afterwards!

>>> # Create an empty array with 2 elements
>>> np.empty(2)
array([ 3.14, 42.  ])  # may vary
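
A typical pattern, sketched here with illustrative names, is to allocate the array with empty and then assign every element before reading any of them:

```python
import numpy as np

n = 5
out = np.empty(n)          # contents are arbitrary at this point

# Every element is assigned before the array is read.
for k in range(n):
    out[k] = k ** 2

print(out)
```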

You can create an array with a range of elements:

>>> np.arange(4)
array([0, 1, 2, 3])

And even an array that contains a range of evenly spaced values. To do this, you specify the first number, the last
number (which is not included in the result), and the step size.

>>> np.arange(2, 9, 2)
array([2, 4, 6, 8])


You can also use np.linspace() to create an array with values that are spaced linearly in a specified interval:

>>> np.linspace(0, 10, num=5)
array([ 0. ,  2.5,  5. ,  7.5, 10. ])
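
The two functions differ in what you fix and in whether the last value is included. The following sketch puts them side by side; the variable names are only illustrative.

```python
import numpy as np

# arange: fixed step, stop value excluded.
stepped = np.arange(2, 9, 2)

# linspace: fixed number of points, stop value included by default.
spaced = np.linspace(0, 10, num=5)

# endpoint=False leaves the stop value out, as arange does.
open_ended = np.linspace(0, 10, num=5, endpoint=False)

print(stepped)
print(spaced)
print(open_ended)
```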

Specifying your data type


While the default data type is floating point (np.float64), you can explicitly specify which data type you want using
the dtype keyword.

>>> x = np.ones(2, dtype=np.int64)
>>> x
array([1, 1])

Learn more about creating arrays here

3.9 Adding, removing, and sorting elements

This section covers np.sort(), np.concatenate()

Sorting an array is simple with np.sort(). You can specify the axis, kind, and order when you call the function.
If you start with this array:

>>> arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

You can quickly sort the numbers in ascending order with:

>>> np.sort(arr)
array([1, 2, 3, 4, 5, 6, 7, 8])

In addition to sort, which returns a sorted copy of an array, you can use:
• argsort, which is an indirect sort along a specified axis,
• lexsort, which is an indirect stable sort on multiple keys,
• searchsorted, which will find elements in a sorted array, and
• partition, which is a partial sort.
To read more about sorting an array, see: sort.
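The four related functions listed above can be tried on small inputs. This is a minimal sketch; the name arrays used for lexsort are made up for illustration.

```python
import numpy as np

arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

# argsort returns the indices that would sort the array.
order = np.argsort(arr)
sorted_arr = arr[order]

# searchsorted finds where a value would be inserted into a sorted array.
pos = np.searchsorted(sorted_arr, 5)

# partition places the k-th smallest element at index k, with smaller
# elements before it and larger ones after it (in no particular order).
k = 3
part = np.partition(arr, k)

# lexsort uses the last key as the primary key: here surname, then first name.
first = np.array(["Ann", "Bob", "Cal"])
last = np.array(["Zed", "Abe", "Abe"])
by_name = np.lexsort((first, last))

print(order, pos, part[k], by_name)
```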
If you start with these arrays:

>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([5, 6, 7, 8])

You can concatenate them with np.concatenate().

>>> np.concatenate((a, b))
array([1, 2, 3, 4, 5, 6, 7, 8])

Or, if you start with these arrays:

>>> x = np.array([[1, 2], [3, 4]])
>>> y = np.array([[5, 6]])

You can concatenate them with:

>>> np.concatenate((x, y), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])

In order to remove elements from an array, it’s simple to use indexing to select the elements that you want to keep.
To read more about concatenate, see: concatenate.
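Removing elements by keeping the rest can be sketched in two ways: with a boolean mask that selects by value, or with np.delete, which drops given positions. Both return a new array and leave the original unchanged. The values chosen here are only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Keep everything except the values 3 and 6 with a boolean mask.
keep = (arr != 3) & (arr != 6)
filtered = arr[keep]

# np.delete returns a new array without the elements at the given positions.
trimmed = np.delete(arr, [0, 1])

print(filtered)
print(trimmed)
```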

3.10 How do you know the shape and size of an array?

This section covers ndarray.ndim, ndarray.size, ndarray.shape

ndarray.ndim will tell you the number of axes, or dimensions, of the array.
ndarray.size will tell you the total number of elements of the array. This is the product of the elements of the array’s
shape.
ndarray.shape will display a tuple of integers that indicate the number of elements stored along each dimension of
the array. If, for example, you have a 2-D array with 2 rows and 3 columns, the shape of your array is (2, 3).
For example, if you create this array:

>>> array_example = np.array([[[0, 1, 2, 3],
...                            [4, 5, 6, 7]],
...
...                           [[0, 1, 2, 3],
...                            [4, 5, 6, 7]],
...
...                           [[0, 1, 2, 3],
...                            [4, 5, 6, 7]]])

To find the number of dimensions of the array, run:

>>> array_example.ndim
3

To find the total number of elements in the array, run:

>>> array_example.size
24

And to find the shape of your array, run:

>>> array_example.shape
(3, 2, 4)


3.11 Can you reshape an array?

This section covers arr.reshape()

Yes!
Using arr.reshape() will give a new shape to an array without changing the data. Just remember that when you use
the reshape method, the array you want to produce needs to have the same number of elements as the original array. If
you start with an array with 12 elements, you’ll need to make sure that your new array also has a total of 12 elements.
If you start with this array:

>>> a = np.arange(6)
>>> print(a)
[0 1 2 3 4 5]

You can use reshape() to reshape your array. For example, you can reshape this array to an array with three rows and
two columns:

>>> b = a.reshape(3, 2)
>>> print(b)
[[0 1]
 [2 3]
 [4 5]]

With np.reshape, you can specify a few optional parameters:

>>> np.reshape(a, newshape=(1, 6), order='C')
array([[0, 1, 2, 3, 4, 5]])

a is the array to be reshaped.


newshape is the new shape you want. You can specify an integer or a tuple of integers. If you specify an integer, the
result will be an array of that length. The shape should be compatible with the original shape.
order: C means to read/write the elements using C-like index order, F means to read/write the elements using Fortran-
like index order, A means to read/write the elements in Fortran-like index order if a is Fortran contiguous in memory,
C-like order otherwise. (This is an optional parameter and doesn’t need to be specified.)
If you want to learn more about C and Fortran order, you can read more about the internal organization of NumPy arrays
here. Essentially, C and Fortran orders have to do with how indices correspond to the order the array is stored in memory.
In Fortran, when moving through the elements of a two-dimensional array as it is stored in memory, the first index is the
most rapidly varying index. Because the first index moves to the next row as it changes, the matrix is stored one column at a
time. This is why Fortran is thought of as a Column-major language. In C on the other hand, the last index changes the
most rapidly. The matrix is stored by rows, making it a Row-major language. Whether you choose C or Fortran order depends
on whether it’s more important to preserve the indexing convention or to avoid reordering the data.
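The difference between the two orders is easiest to see by reshaping the same six values both ways. In this sketch the shape is passed positionally, and the variable names are only illustrative.

```python
import numpy as np

a = np.arange(6)

c_order = np.reshape(a, (2, 3), order='C')   # last index varies fastest
f_order = np.reshape(a, (2, 3), order='F')   # first index varies fastest

print(c_order)
print(f_order)
```

With C order the values fill the array one row at a time; with Fortran order they fill it one column at a time.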
Learn more about shape manipulation here.


3.12 How to convert a 1D array into a 2D array (how to add a new axis
to an array)

This section covers np.newaxis, np.expand_dims

You can use np.newaxis and np.expand_dims to increase the dimensions of your existing array.
Using np.newaxis will increase the dimensions of your array by one dimension when used once. This means that a
1D array will become a 2D array, a 2D array will become a 3D array, and so on.
For example, if you start with this array:

>>> a = np.array([1, 2, 3, 4, 5, 6])
>>> a.shape
(6,)

You can use np.newaxis to add a new axis:

>>> a2 = a[np.newaxis, :]
>>> a2.shape
(1, 6)

You can explicitly convert a 1D array to either a row vector or a column vector using np.newaxis. For example,
you can convert a 1D array to a row vector by inserting an axis along the first dimension:

>>> row_vector = a[np.newaxis, :]
>>> row_vector.shape
(1, 6)

Or, for a column vector, you can insert an axis along the second dimension:

>>> col_vector = a[:, np.newaxis]
>>> col_vector.shape
(6, 1)

You can also expand an array by inserting a new axis at a specified position with np.expand_dims.
For example, if you start with this array:

>>> a = np.array([1, 2, 3, 4, 5, 6])
>>> a.shape
(6,)

You can use np.expand_dims to add an axis at index position 1 with:

>>> b = np.expand_dims(a, axis=1)
>>> b.shape
(6, 1)

You can add an axis at index position 0 with:

>>> c = np.expand_dims(a, axis=0)
>>> c.shape
(1, 6)

Find more information about newaxis here and expand_dims at expand_dims.
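One common reason to add an axis is to combine two 1D arrays into a 2D table. The sketch below, with illustrative values, turns one vector into a column so that adding the other vector produces every pairwise sum.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20])

# a[:, np.newaxis] has shape (3, 1); b has shape (2,); the sum has shape (3, 2).
table = a[:, np.newaxis] + b

# np.expand_dims with axis=1 produces the same column shape as np.newaxis.
same = np.array_equal(a[:, np.newaxis], np.expand_dims(a, axis=1))

print(table)
print(same)
```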


3.13 Indexing and slicing

You can index and slice NumPy arrays in the same ways you can slice Python lists.

>>> data = np.array([1, 2, 3])
>>> data[1]
2
>>> data[0:2]
array([1, 2])
>>> data[1:]
array([2, 3])
>>> data[-2:]
array([2, 3])

You can visualize it this way:

You may want to take a section of your array or specific array elements to use in further analysis or additional operations.
To do that, you’ll need to subset, slice, and/or index your arrays.
If you want to select values from your array that fulfill certain conditions, it’s straightforward with NumPy.
For example, if you start with this array:

>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

You can easily print all of the values in the array that are less than 5.

>>> print(a[a < 5])
[1 2 3 4]

You can also select, for example, numbers that are equal to or greater than 5, and use that condition to index an array.

>>> five_up = (a >= 5)
>>> print(a[five_up])
[ 5  6  7  8  9 10 11 12]

You can select elements that are divisible by 2:

>>> divisible_by_2 = a[a % 2 == 0]
>>> print(divisible_by_2)
[ 2  4  6  8 10 12]

Or you can select elements that satisfy two conditions using the & and | operators:


>>> c = a[(a > 2) & (a < 11)]
>>> print(c)
[ 3  4  5  6  7  8  9 10]

You can also make use of the logical operators & and | in order to return boolean values that specify whether or not the
values in an array fulfill a certain condition. This can be useful with arrays that contain names or other categorical values.

>>> five_up = (a > 5) | (a == 5)
>>> print(five_up)
[[False False False False]
 [ True  True  True  True]
 [ True  True  True  True]]
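
The same operators work on arrays of names or other categorical values. The sketch below uses made-up names and scores to select entries in one array based on labels in another.

```python
import numpy as np

names = np.array(["ana", "bo", "cy", "ana", "dee"])
scores = np.array([7, 3, 9, 5, 8])

# | is element-wise "or"; ~ negates a boolean mask.
is_ana_or_cy = (names == "ana") | (names == "cy")
selected = scores[is_ana_or_cy]
others = names[~is_ana_or_cy]

print(selected)
print(others)
```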

You can also use np.nonzero() to select elements or indices from an array.
Starting with this array:

>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

You can use np.nonzero() to print the indices of elements that are, for example, less than 5:

>>> b = np.nonzero(a < 5)
>>> print(b)
(array([0, 0, 0, 0]), array([0, 1, 2, 3]))

In this example, a tuple of arrays was returned: one for each dimension. The first array represents the row indices where
these values are found, and the second array represents the column indices where the values are found.
If you want to generate a list of coordinates where the elements exist, you can zip the arrays, iterate over the list of
coordinates, and print them. For example:

>>> list_of_coordinates = list(zip(b[0], b[1]))
>>> for coord in list_of_coordinates:
...     print(coord)
(0, 0)
(0, 1)
(0, 2)
(0, 3)

You can also use np.nonzero() to print the elements in an array that are less than 5 with:

>>> print(a[b])
[1 2 3 4]

If the element you’re looking for doesn’t exist in the array, then the returned array of indices will be empty. For example:

>>> not_there = np.nonzero(a == 42)
>>> print(not_there)
(array([], dtype=int64), array([], dtype=int64))

Learn more about indexing and slicing here and here.


Read more about using the nonzero function at: nonzero.


3.14 How to create an array from existing data

This section covers slicing and indexing, np.vstack(), np.hstack(), np.hsplit(), .view(), copy()

You can easily create a new array from a section of an existing array.
Let’s say you have this array:

>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

You can create a new array from a section of your array any time by specifying where you want to slice your array.

>>> arr1 = a[3:8]
>>> arr1
array([4, 5, 6, 7, 8])

Here, you grabbed a section of your array from index position 3 up to, but not including, index position 8.
You can also stack two existing arrays, both vertically and horizontally. Let’s say you have two arrays, a1 and a2:

>>> a1 = np.array([[1, 1],
...                [2, 2]])

>>> a2 = np.array([[3, 3],
...                [4, 4]])

You can stack them vertically with vstack:

>>> np.vstack((a1, a2))
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])

Or stack them horizontally with hstack:

>>> np.hstack((a1, a2))
array([[1, 1, 3, 3],
       [2, 2, 4, 4]])

You can split an array into several smaller arrays using hsplit. You can specify either the number of equally shaped
arrays to return or the columns after which the division should occur.
Let’s say you have this array:

>>> x = np.arange(1, 25).reshape(2, 12)
>>> x
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]])

If you wanted to split this array into three equally shaped arrays, you would run:

>>> np.hsplit(x, 3)
[array([[ 1,  2,  3,  4],
       [13, 14, 15, 16]]), array([[ 5,  6,  7,  8],
       [17, 18, 19, 20]]), array([[ 9, 10, 11, 12],
       [21, 22, 23, 24]])]

If you wanted to split your array after the third and fourth column, you’d run:

>>> np.hsplit(x, (3, 4))
[array([[ 1,  2,  3],
       [13, 14, 15]]), array([[ 4],
       [16]]), array([[ 5,  6,  7,  8,  9, 10, 11, 12],
       [17, 18, 19, 20, 21, 22, 23, 24]])]

Learn more about stacking and splitting arrays here.


You can use the view method to create a new array object that looks at the same data as the original array (a shallow
copy).
Views are an important NumPy concept! NumPy functions, as well as operations like indexing and slicing, will return
views whenever possible. This saves memory and is faster (no copy of the data has to be made). However it’s important
to be aware of this - modifying data in a view also modifies the original array!
Let’s say you create this array:

>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

Now we create an array b1 by slicing a and modify the first element of b1. This will modify the corresponding element
in a as well!

>>> b1 = a[0, :]
>>> b1
array([1, 2, 3, 4])
>>> b1[0] = 99
>>> b1
array([99, 2, 3, 4])
>>> a
array([[99, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])

Using the copy method will make a complete copy of the array and its data (a deep copy). To use this on your array, you
could run:

>>> b2 = a.copy()
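
If you are unsure whether an array is a view or a copy, you can check it directly. The sketch below uses np.shares_memory and the base attribute; the names view and dup are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[0, :]     # basic slicing returns a view
dup = a.copy()     # a copy owns its own data

print(np.shares_memory(a, view))
print(np.shares_memory(a, dup))
print(view.base is a)

dup[0, 0] = -1     # changing the copy leaves a untouched
print(a[0, 0])
```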

Learn more about copies and views here.

3.15 Basic array operations

This section covers addition, subtraction, multiplication, division, and more

Once you’ve created your arrays, you can start to work with them. Let’s say, for example, that you’ve created two arrays,
one called “data” and one called “ones”


You can add the arrays together with the plus sign.

>>> data = np.array([1, 2])
>>> ones = np.ones(2, dtype=int)
>>> data + ones
array([2, 3])

You can, of course, do more than just addition!

>>> data - ones
array([0, 1])
>>> data * data
array([1, 4])
>>> data / data
array([1., 1.])

Basic operations are simple with NumPy. If you want to find the sum of the elements in an array, you’d use sum(). This
works for 1D arrays, 2D arrays, and arrays in higher dimensions.

>>> a = np.array([1, 2, 3, 4])
>>> a.sum()
10

To add the rows or the columns in a 2D array, you would specify the axis.
If you start with this array:

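>>> b = np.array([[1, 1], [2, 2]])

You can sum over the axis of rows (adding the rows together, which gives one total per column) with:

>>> b.sum(axis=0)
array([3, 3])

You can sum over the axis of columns (adding the columns together, which gives one total per row) with:

>>> b.sum(axis=1)
array([2, 4])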

Learn more about basic operations here.

3.16 Broadcasting

There are times when you might want to carry out an operation between an array and a single number (also called an
operation between a vector and a scalar) or between arrays of two different sizes. For example, your array (we’ll call it
“data”) might contain information about distance in miles but you want to convert the information to kilometers. You can
perform this operation with:

>>> data = np.array([1.0, 2.0])
>>> data * 1.6
array([1.6, 3.2])

NumPy understands that the multiplication should happen with each cell. That concept is called broadcasting.
Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes. The dimensions of
your array must be compatible, for example, when the dimensions of both arrays are equal or when one of them is 1. If
the dimensions are not compatible, you will get a ValueError.
Learn more about broadcasting here.
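
To make the compatibility rule concrete, here is a small sketch with made-up arrays (row, col, and bad are names chosen for this example only). It shows two shapes that broadcast against a (3, 2) array and one that does not.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
row = np.array([10, 20])                    # shape (2,)
col = np.array([[100], [200], [300]])       # shape (3, 1)

# Trailing dimensions match (2 and 2), so `row` is applied to every row.
print(data + row)
# A dimension of length 1 is stretched, so `col` is applied to every column.
print(data + col)

bad = np.array([1, 2, 3])                   # shape (3,): 3 does not match 2
try:
    data + bad
    shapes_ok = True
except ValueError as exc:
    shapes_ok = False
    print("incompatible shapes:", exc)
```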

3.17 More useful array operations

This section covers maximum, minimum, sum, mean, product, standard deviation, and more

NumPy also performs aggregation functions. In addition to min, max, and sum, you can easily run mean to get the
average, prod to get the result of multiplying the elements together, std to get the standard deviation, and more.

>>> data.max()
2.0
>>> data.min()
1.0
>>> data.sum()
3.0

Let’s start with this array, called “a”:

>>> a = np.array([[0.45053314, 0.17296777, 0.34376245, 0.5510652],
...               [0.54627315, 0.05093587, 0.40067661, 0.55645993],
...               [0.12697628, 0.82485143, 0.26590556, 0.56917101]])

It’s very common to want to aggregate along a row or column. By default, every NumPy aggregation function will return
the aggregate of the entire array. To find the sum or the minimum of the elements in your array, run:

>>> a.sum()
4.8595784

Or:

>>> a.min()
0.05093587

You can specify on which axis you want the aggregation function to be computed. For example, you can find the minimum
value within each column by specifying axis=0.

>>> a.min(axis=0)
array([0.12697628, 0.05093587, 0.26590556, 0.5510652 ])

The four values listed above correspond to the number of columns in your array. With a four-column array, you will get
four values as your result.
Read more about array methods here.
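
The same axis argument works for the other aggregation methods. The following sketch uses a small made-up integer array m (not the array above) so the results are easy to check by hand.

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

col_min = m.min(axis=0)     # one value per column, shape (4,)
row_min = m.min(axis=1)     # one value per row, shape (2,)
row_mean = m.mean(axis=1)   # average of each row
total_prod = m.prod()       # product of every element

print(col_min)      # [1 2 3 4]
print(row_min)      # [1 5]
print(row_mean)     # [2.5 6.5]
print(total_prod)   # 40320
```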

3.18 Creating matrices

You can pass Python lists of lists to create a 2-D array (or “matrix”) to represent them in NumPy.

>>> data = np.array([[1, 2], [3, 4], [5, 6]])
>>> data
array([[1, 2],
[3, 4],
[5, 6]])

Indexing and slicing operations are useful when you’re manipulating matrices:

>>> data[0, 1]
2
>>> data[1:3]
array([[3, 4],
[5, 6]])
>>> data[0:2, 0]
array([1, 3])

You can aggregate matrices the same way you aggregated vectors:

>>> data.max()
6
>>> data.min()
1
>>> data.sum()
21

You can aggregate all the values in a matrix and you can aggregate them across columns or rows using the axis parameter.
To illustrate this point, let’s look at a slightly modified dataset:

>>> data = np.array([[1, 2], [5, 3], [4, 6]])
>>> data
array([[1, 2],
       [5, 3],
       [4, 6]])
>>> data.max(axis=0)
array([5, 6])
>>> data.max(axis=1)
array([2, 5, 6])

Once you’ve created your matrices, you can add and multiply them using arithmetic operators if you have two matrices
that are the same size.

>>> data = np.array([[1, 2], [3, 4]])
>>> ones = np.array([[1, 1], [1, 1]])
>>> data + ones
array([[2, 3],
       [4, 5]])

You can do these arithmetic operations on matrices of different sizes, but only if one matrix has only one column or one
row. In this case, NumPy will use its broadcast rules for the operation.

>>> data = np.array([[1, 2], [3, 4], [5, 6]])
>>> ones_row = np.array([[1, 1]])
>>> data + ones_row
array([[2, 3],
       [4, 5],
       [6, 7]])

Be aware that when NumPy prints N-dimensional arrays, the last axis is looped over the fastest while the first axis is the
slowest. For instance:

>>> np.ones((4, 3, 2))
array([[[1., 1.],
[1., 1.],
[1., 1.]],

[[1., 1.],
[1., 1.],
[1., 1.]],

[[1., 1.],
[1., 1.],
[1., 1.]],

[[1., 1.],
[1., 1.],
[1., 1.]]])

There are often instances where we want NumPy to initialize the values of an array. NumPy offers functions like ones()
and zeros(), and the random.Generator class for random number generation for that. All you need to do is pass
in the number of elements you want it to generate:

>>> np.ones(3)
array([1., 1., 1.])
>>> np.zeros(3)
array([0., 0., 0.])
>>> # the simplest way to generate random numbers
>>> rng = np.random.default_rng(0)
>>> rng.random(3)
array([0.63696169, 0.26978671, 0.04097352])

You can also use ones(), zeros(), and random() to create a 2D array if you give them a tuple describing the
dimensions of the matrix:

>>> np.ones((3, 2))
array([[1., 1.],
       [1., 1.],
       [1., 1.]])
>>> np.zeros((3, 2))
array([[0., 0.],
       [0., 0.],
       [0., 0.]])
>>> rng.random((3, 2))
array([[0.01652764, 0.81327024],
       [0.91275558, 0.60663578],
       [0.72949656, 0.54362499]])  # may vary

Read more about creating arrays, filled with 0’s, 1’s, other values or uninitialized, at array creation routines.

3.19 Generating random numbers

The use of random number generation is an important part of the configuration and evaluation of many numerical and
machine learning algorithms. Whether you need to randomly initialize weights in an artificial neural network, split data
into random sets, or randomly shuffle your dataset, being able to generate random numbers (actually, repeatable pseudo-
random numbers) is essential.
With Generator.integers, you can generate random integers from low (remember that this is inclusive with
NumPy) to high (exclusive). You can set endpoint=True to make the high number inclusive.
You can generate a 2 x 4 array of random integers between 0 and 4 with:

>>> rng.integers(5, size=(2, 4))
array([[2, 1, 1, 0],
       [0, 0, 0, 4]])  # may vary

Read more about random number generation here.
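
The effect of endpoint and of seeding can be checked with a short experiment. This is a sketch only: the seed values and the sample size of 1000 are arbitrary choices for the example, and no particular random values are assumed.

```python
import numpy as np

rng = np.random.default_rng(12345)  # the seed value is an arbitrary choice

half_open = rng.integers(0, 5, size=1000)                # high is exclusive
closed = rng.integers(0, 5, size=1000, endpoint=True)    # high is inclusive

print(half_open.min(), half_open.max())   # never exceeds 4
print(closed.min(), closed.max())         # may reach 5

# Two generators built from the same seed produce the same sequence,
# which is what makes the pseudo-random numbers repeatable.
first = np.random.default_rng(7).integers(5, size=(2, 4))
second = np.random.default_rng(7).integers(5, size=(2, 4))
print(np.array_equal(first, second))      # True
```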

3.20 How to get unique items and counts

This section covers np.unique()

You can find the unique elements in an array easily with np.unique.
For example, if you start with this array:

>>> a = np.array([11, 11, 12, 13, 14, 15, 16, 17, 12, 13, 11, 14, 18, 19, 20])

you can use np.unique to print the unique values in your array:

>>> unique_values = np.unique(a)
>>> print(unique_values)
[11 12 13 14 15 16 17 18 19 20]

To get the indices of unique values in a NumPy array (an array of first index positions of unique values in the array), just
pass the return_index argument in np.unique() as well as your array.

>>> unique_values, indices_list = np.unique(a, return_index=True)
>>> print(indices_list)
[ 0  2  3  4  5  6  7 12 13 14]

You can pass the return_counts argument in np.unique() along with your array to get the frequency count of
unique values in a NumPy array.

>>> unique_values, occurrence_count = np.unique(a, return_counts=True)
>>> print(occurrence_count)
[3 2 2 2 1 1 1 1 1 1]

This also works with 2D arrays! If you start with this array:

>>> a_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [1, 2, 3, 4]])

You can find unique values with:

>>> unique_values = np.unique(a_2d)
>>> print(unique_values)
[ 1  2  3  4  5  6  7  8  9 10 11 12]

If the axis argument isn’t passed, your 2D array will be flattened.


If you want to get the unique rows or columns, make sure to pass the axis argument. To find the unique rows, specify
axis=0 and for columns, specify axis=1.

>>> unique_rows = np.unique(a_2d, axis=0)
>>> print(unique_rows)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

To get the unique rows, index position, and occurrence count, you can use:

>>> unique_rows, indices, occurrence_count = np.unique(
...     a_2d, axis=0, return_counts=True, return_index=True)
>>> print(unique_rows)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
>>> print(indices)
[0 1 2]
>>> print(occurrence_count)
[2 1 1]

To learn more about finding the unique elements in an array, see unique.
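
One way to put the returned arrays to work is to pair each unique value with its count. The sketch below assumes the 1D array from this section; the names first_index and frequency are illustrative only.

```python
import numpy as np

a = np.array([11, 11, 12, 13, 14, 15, 16, 17, 12, 13, 11, 14, 18, 19, 20])

# With both flags set, np.unique returns (values, first indices, counts).
values, first_index, counts = np.unique(a, return_index=True, return_counts=True)

# Map each unique value to how often it occurs.
frequency = {int(v): int(c) for v, c in zip(values, counts)}
print(frequency[11])   # 3

# Indexing the original array with the first-occurrence indices
# gives back the unique values.
print(np.array_equal(a[first_index], values))   # True
```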

3.21 Transposing and reshaping a matrix

This section covers arr.reshape(), arr.transpose(), arr.T

It’s common to need to transpose your matrices. NumPy arrays have the property T that allows you to transpose a matrix.

You may also need to switch the dimensions of a matrix. This can happen when, for example, you have a model that
expects a certain input shape that is different from your dataset. This is where the reshape method can be useful. You
simply need to pass in the new dimensions that you want for the matrix.

>>> data.reshape(2, 3)
array([[1, 2, 3],
       [4, 5, 6]])
>>> data.reshape(3, 2)
array([[1, 2],
       [3, 4],
       [5, 6]])

You can also use .transpose() to reverse or change the axes of an array according to the values you specify.
If you start with this array:

>>> arr = np.arange(6).reshape((2, 3))
>>> arr
array([[0, 1, 2],
       [3, 4, 5]])

You can transpose your array with arr.transpose().

>>> arr.transpose()
array([[0, 3],
[1, 4],
[2, 5]])

You can also use arr.T:

>>> arr.T
array([[0, 3],
[1, 4],
[2, 5]])

To learn more about transposing and reshaping arrays, see transpose and reshape.
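
Two details are easy to try out here: passing -1 to reshape so that NumPy infers one dimension, and passing an explicit axis order to transpose. The arrays below are made up for this sketch.

```python
import numpy as np

data = np.arange(1, 7)            # [1 2 3 4 5 6]
two_rows = data.reshape(2, -1)    # -1 lets NumPy infer the missing size (3)
print(two_rows.shape)             # (2, 3)

cube = np.arange(24).reshape(2, 3, 4)
# New axis order: old axis 2 first, then old axis 0, then old axis 1.
swapped = cube.transpose(2, 0, 1)
print(swapped.shape)              # (4, 2, 3)

# Element [k, i, j] of the result is element [i, j, k] of the input.
print(swapped[3, 1, 2] == cube[1, 2, 3])   # True
```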

3.22 How to reverse an array

This section covers np.flip()

NumPy’s np.flip() function allows you to flip, or reverse, the contents of an array along an axis. When using
np.flip(), specify the array you would like to reverse and the axis. If you don’t specify the axis, NumPy will reverse the
contents along all of the axes of your input array.
Reversing a 1D array
If you begin with a 1D array like this one:

>>> arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

You can reverse it with:

>>> reversed_arr = np.flip(arr)

If you want to print your reversed array, you can run:

>>> print('Reversed Array: ', reversed_arr)
Reversed Array:  [8 7 6 5 4 3 2 1]

Reversing a 2D array
A 2D array works much the same way.
If you start with this array:

>>> arr_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

You can reverse the content in all of the rows and all of the columns with:

>>> reversed_arr = np.flip(arr_2d)
>>> print(reversed_arr)
[[12 11 10  9]
 [ 8  7  6  5]
 [ 4  3  2  1]]

You can easily reverse only the rows with:

>>> reversed_arr_rows = np.flip(arr_2d, axis=0)
>>> print(reversed_arr_rows)
[[ 9 10 11 12]
 [ 5  6  7  8]
 [ 1  2  3  4]]

Or reverse only the columns with:

>>> reversed_arr_columns = np.flip(arr_2d, axis=1)
>>> print(reversed_arr_columns)
[[ 4  3  2  1]
 [ 8  7  6  5]
 [12 11 10  9]]

You can also reverse the contents of only one column or row. For example, you can reverse the contents of the row at
index position 1 (the second row):

>>> arr_2d[1] = np.flip(arr_2d[1])
>>> print(arr_2d)
[[ 1  2  3  4]
 [ 8  7  6  5]
 [ 9 10 11 12]]

You can also reverse the column at index position 1 (the second column):

>>> arr_2d[:,1] = np.flip(arr_2d[:,1])
>>> print(arr_2d)
[[ 1 10  3  4]
 [ 8  7  6  5]
 [ 9  2 11 12]]

Read more about reversing arrays at flip.

3.23 Reshaping and flattening multidimensional arrays

This section covers .flatten(), ravel()

There are two popular ways to flatten an array: .flatten() and .ravel(). The primary difference between the
two is that the new array created using ravel() is actually a reference to the parent array (i.e., a “view”). This means
that any changes to the new array will affect the parent array as well. Since ravel avoids making a copy whenever it can,
it’s memory efficient.
If you start with this array:

>>> x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

You can use flatten to flatten your array into a 1D array.

>>> x.flatten()
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

When you use flatten, changes to your new array won’t change the parent array.
For example:

>>> a1 = x.flatten()
>>> a1[0] = 99
>>> print(x) # Original array
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
>>> print(a1) # New array
[99 2 3 4 5 6 7 8 9 10 11 12]

But when you use ravel, the changes you make to the new array will affect the parent array.
For example:

>>> a2 = x.ravel()
>>> a2[0] = 98
>>> print(x) # Original array
[[98 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
>>> print(a2) # New array
[98 2 3 4 5 6 7 8 9 10 11 12]

Read more about flatten at ndarray.flatten and ravel at ravel.
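
You can confirm the difference with np.shares_memory. The sketch below also illustrates a caveat: ravel returns a view only when the memory layout allows it. For a transposed array it has to fall back to a copy. The variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

flat_copy = x.flatten()   # always a copy
flat_view = x.ravel()     # a view here, because x is contiguous in memory

print(np.shares_memory(x, flat_copy))   # False
print(np.shares_memory(x, flat_view))   # True

# x.T is not laid out in row-major order, so ravel() must copy the data.
t_flat = x.T.ravel()
print(np.shares_memory(x, t_flat))      # False
print(t_flat[:3])                       # [1 5 9]
```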

3.24 How to access the docstring for more information

This section covers help(), ?, ??

When it comes to the data science ecosystem, Python and NumPy are built with the user in mind. One of the best
examples of this is the built-in access to documentation. Every object contains the reference to a string, which is known
as the docstring. In most cases, this docstring contains a quick and concise summary of the object and how to use it.
Python has a built-in help() function that can help you access this information. This means that nearly any time you
need more information, you can use help() to quickly find the information that you need.
For example:

>>> help(max)
Help on built-in function max in module builtins:

max(...)
max(iterable, *[, default=obj, key=func]) -> value
max(arg1, arg2, *args, *[, key=func]) -> value

With a single iterable argument, return its biggest item. The
default keyword-only argument specifies an object to return if
the provided iterable is empty.
With two or more arguments, return the largest argument.

Because access to additional information is so useful, IPython uses the ? character as a shorthand for accessing this
documentation along with other relevant information. IPython is a command shell for interactive computing in multiple
languages. You can find more information about IPython here.
For example:

In [0]: max?
max(iterable, *[, default=obj, key=func]) -> value
max(arg1, arg2, *args, *[, key=func]) -> value

With a single iterable argument, return its biggest item. The
default keyword-only argument specifies an object to return if
the provided iterable is empty.
With two or more arguments, return the largest argument.
Type: builtin_function_or_method

You can even use this notation for object methods and objects themselves.
Let’s say you create this array:

>>> a = np.array([1, 2, 3, 4, 5, 6])

Then you can obtain a lot of useful information (first details about a itself, followed by the docstring of ndarray of
which a is an instance):

In [1]: a?
Type: ndarray
String form: [1 2 3 4 5 6]
Length: 6
File: ~/anaconda3/lib/python3.7/site-packages/numpy/__init__.py
Docstring: <no docstring>
Class docstring:
ndarray(shape, dtype=float, buffer=None, offset=0,
strides=None, order=None)

An array object represents a multidimensional, homogeneous array
of fixed-size items. An associated data-type object describes the
format of each element in the array (its byte-order, how many bytes it
occupies in memory, whether it is an integer, a floating point number,
or something else, etc.)

Arrays should be constructed using `array`, `zeros` or `empty` (refer
to the See Also section below). The parameters given here refer to
a low-level method (`ndarray(...)`) for instantiating an array.

For more information, refer to the `numpy` module and examine the
methods and attributes of an array.

Parameters
----------
(for the __new__ method; see Notes below)

shape : tuple of ints
    Shape of created array.
...

This also works for functions and other objects that you create. Just remember to include a docstring with your function
using a string literal (""" """ or ''' ''' around your documentation).
For example, if you create this function:

>>> def double(a):
...     '''Return a * 2'''
...     return a * 2

You can obtain information about the function:

In [2]: double?
Signature: double(a)
Docstring: Return a * 2
File: ~/Desktop/<ipython-input-23-b5adf20be596>
Type: function

You can reach another level of information by reading the source code of the object you’re interested in. Using a double
question mark (??) allows you to access the source code.
For example:

In [3]: double??
Signature: double(a)
Source:
def double(a):
    '''Return a * 2'''
    return a * 2
File: ~/Desktop/<ipython-input-23-b5adf20be596>
Type: function

If the object in question is compiled in a language other than Python, using ?? will return the same information as ?.
You’ll find this with a lot of built-in objects and types, for example:

In [4]: len?
Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type: builtin_function_or_method

and:

In [5]: len??
Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type: builtin_function_or_method

have the same output because they were compiled in a programming language other than Python.

3.25 Working with mathematical formulas

The ease of implementing mathematical formulas that work on arrays is one of the things that make NumPy so widely
used in the scientific Python community.
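For example, this is the mean square error formula (a central formula used in supervised machine learning models that
deal with regression):

MSE = \frac{1}{n} \sum_{i=1}^{n} (prediction_i - label_i)^2

Implementing this formula is simple and straightforward in NumPy: subtract labels from predictions, square the
differences with np.square, add them up with np.sum, and divide by n.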

What makes this work so well is that predictions and labels can contain one or a thousand values. They only
need to be the same size.

You can visualize it this way:

In this example, both the predictions and labels vectors contain three values, meaning n has a value of three. After we
carry out subtractions the values in the vector are squared. Then NumPy sums the values, and your result is the error
value for that prediction and a score for the quality of the model.
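
As a minimal sketch, the mean square error, (1/n) times the sum of the squared differences between predictions and labels, can be written in one line. The three-element arrays below are made-up values chosen so that the result is easy to verify by hand; they are not taken from the original figures.

```python
import numpy as np

predictions = np.array([1.0, 1.0, 1.0])   # made-up model outputs
labels = np.array([1.0, 2.0, 3.0])        # made-up true values
n = predictions.size

# Subtract, square element-wise, sum, and divide by the number of values.
error = (1 / n) * np.sum(np.square(predictions - labels))
print(error)   # (0 + 1 + 4) / 3, roughly 1.667

# The same quantity expressed with mean() instead of an explicit 1/n.
same_error = np.mean((predictions - labels) ** 2)
```

Because every step works element-wise, the same line runs unchanged whether the arrays hold three values or three million.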

3.26 How to save and load NumPy objects

This section covers np.save, np.savez, np.savetxt, np.load, np.loadtxt

You will, at some point, want to save your arrays to disk and load them back without having to re-run the code. Fortunately,
there are several ways to save and load objects with NumPy. The ndarray objects can be saved to and loaded from the
disk files with loadtxt and savetxt functions that handle normal text files, load and save functions that handle
NumPy binary files with a .npy file extension, and a savez function that handles NumPy files with a .npz file extension.
The .npy and .npz files store data, shape, dtype, and other information required to reconstruct the ndarray in a way that
allows the array to be correctly retrieved, even when the file is on another machine with different architecture.
If you want to store a single ndarray object, store it as a .npy file using np.save. If you want to store more than one
ndarray object in a single file, save it as a .npz file using np.savez. You can also save several arrays into a single file in
compressed npz format with savez_compressed.
It’s easy to save and load an array with np.save(). Just make sure to specify the array you want to save and a file
name. For example, if you create this array:

>>> a = np.array([1, 2, 3, 4, 5, 6])

You can save it as “filename.npy” with:

>>> np.save('filename', a)

You can use np.load() to reconstruct your array.

>>> b = np.load('filename.npy')

If you want to check your array, you can run:

>>> print(b)
[1 2 3 4 5 6]

You can save a NumPy array as a plain text file like a .csv or .txt file with np.savetxt.
For example, if you create this array:

>>> csv_arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

You can easily save it as a .csv file with the name “new_file.csv” like this:

>>> np.savetxt('new_file.csv', csv_arr)

You can quickly and easily load your saved text file using loadtxt():

>>> np.loadtxt('new_file.csv')
array([1., 2., 3., 4., 5., 6., 7., 8.])

The savetxt() and loadtxt() functions accept additional optional parameters such as header, footer, and delimiter.
While text files can be easier for sharing, .npy and .npz files are smaller and faster to read. If you need more sophisticated
handling of your text file (for example, if you need to work with lines that contain missing values), you will want to use
the genfromtxt function.
With savetxt, you can specify headers, footers, comments, and more.
Learn more about input and output routines here.
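
Storing several arrays in one .npz file follows the same pattern. In this sketch the file name arrays.npz and the keyword names first and second are hypothetical, and a temporary directory is used so that nothing is left behind on disk.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([[1.5, 2.5], [3.5, 4.5]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")   # hypothetical file name

    # Keyword names become the keys used to look the arrays up later.
    np.savez(path, first=a, second=b)

    # np.load returns a dictionary-like archive for .npz files.
    with np.load(path) as archive:
        names = sorted(archive.files)
        a_loaded = archive["first"]
        b_loaded = archive["second"]

print(names)                          # ['first', 'second']
print(np.array_equal(a, a_loaded))    # True
print(b_loaded.dtype, b_loaded.shape)
```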

3.27 Importing and exporting a CSV

It’s simple to read in a CSV that contains existing information. The best and easiest way to do this is to use Pandas.

>>> import pandas as pd

>>> # If all of your columns are the same type:
>>> x = pd.read_csv('music.csv', header=0).values
>>> print(x)
[['Billie Holiday' 'Jazz' 1300000 27000000]
 ['Jimmie Hendrix' 'Rock' 2700000 70000000]
 ['Miles Davis' 'Jazz' 1500000 48000000]
 ['SIA' 'Pop' 2000000 74000000]]

>>> # You can also simply select the columns you need:
>>> x = pd.read_csv('music.csv', usecols=['Artist', 'Plays']).values
>>> print(x)
[['Billie Holiday' 27000000]
 ['Jimmie Hendrix' 70000000]
 ['Miles Davis' 48000000]
 ['SIA' 74000000]]

It’s simple to use Pandas in order to export your array as well. If you are new to NumPy, you may want to create a Pandas
dataframe from the values in your array and then write the data frame to a CSV file with Pandas.
If you created this array “a”:

>>> a = np.array([[-2.58289208,  0.43014843, -1.24082018,  1.59572603],
...               [ 0.99027828,  1.17150989,  0.94125714, -0.14692469],
...               [ 0.76989341,  0.81299683, -0.95068423,  0.11769564],
...               [ 0.20484034,  0.34784527,  1.96979195,  0.51992837]])

You could create a Pandas dataframe:

>>> df = pd.DataFrame(a)
>>> print(df)
0 1 2 3
0 -2.582892 0.430148 -1.240820 1.595726
1 0.990278 1.171510 0.941257 -0.146925
2 0.769893 0.812997 -0.950684 0.117696
3 0.204840 0.347845 1.969792 0.519928

You can easily save your dataframe with:

>>> df.to_csv('pd.csv')

And read your CSV with:

>>> data = pd.read_csv('pd.csv')

You can also save your array with the NumPy savetxt method.

>>> np.savetxt('np.csv', a, fmt='%.2f', delimiter=',', header='1, 2, 3, 4')

If you’re using the command line, you can read your saved CSV any time with a command such as:

$ cat np.csv
# 1, 2, 3, 4
-2.58,0.43,-1.24,1.60
0.99,1.17,0.94,-0.15
0.77,0.81,-0.95,0.12
0.20,0.35,1.97,0.52

Or you can open the file any time with a text editor!
If you’re interested in learning more about Pandas, take a look at the official Pandas documentation. Learn how to install
Pandas with the official Pandas installation information.

3.28 Plotting arrays with Matplotlib

If you need to generate a plot for your values, it’s very simple with Matplotlib.
For example, you may have an array like this one:

>>> a = np.array([2, 1, 5, 7, 4, 6, 8, 14, 10, 9, 18, 20, 22])

If you already have Matplotlib installed, you can import it with:

>>> import matplotlib.pyplot as plt

# If you're using Jupyter Notebook, you may also want to run the following
# line of code to display your plots in the notebook:

%matplotlib inline

All you need to do to plot your values is run:

>>> plt.plot(a)

# If you are running from a command line, you may need to do this:
# >>> plt.show()

For example, you can plot a 1D array like this:

>>> x = np.linspace(0, 5, 20)
>>> y = np.linspace(0, 10, 20)
>>> plt.plot(x, y, 'purple')  # line
>>> plt.plot(x, y, 'o')       # dots

With Matplotlib, you have access to an enormous number of visualization options.

>>> fig = plt.figure()
>>> ax = fig.add_subplot(projection='3d')
>>> X = np.arange(-5, 5, 0.15)
>>> Y = np.arange(-5, 5, 0.15)
>>> X, Y = np.meshgrid(X, Y)
>>> R = np.sqrt(X**2 + Y**2)
>>> Z = np.sin(R)

>>> ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='viridis')

To read more about Matplotlib and what it can do, take a look at the official documentation. For directions regarding
installing Matplotlib, see the official installation section.

Image credits: Jay Alammar [Link]

CHAPTER

FOUR

NUMPY FUNDAMENTALS

These documents clarify concepts, design decisions, and technical constraints in NumPy. This is a great place to under-
stand the fundamental NumPy ideas and philosophy.

4.1 Array creation

See also:
Array creation routines

4.1.1 Introduction

There are 6 general mechanisms for creating arrays:


1) Conversion from other Python structures (i.e. lists and tuples)
2) Intrinsic NumPy array creation functions (e.g. arange, ones, zeros, etc.)
3) Replicating, joining, or mutating existing arrays
4) Reading arrays from disk, either from standard or custom formats
5) Creating arrays from raw bytes through the use of strings or buffers
6) Use of special library functions (e.g., random)
You can use these methods to create ndarrays or Structured arrays. This document will cover general methods for ndarray
creation.

4.1.2 1) Converting Python sequences to NumPy Arrays

NumPy arrays can be defined using Python sequences such as lists and tuples. Lists and tuples are defined using [...]
and (...), respectively. Lists and tuples can define ndarray creation:
• a list of numbers will create a 1D array,
• a list of lists will create a 2D array,
• further nested lists will create higher-dimensional arrays. In general, any array object is called an ndarray in
NumPy.

>>> a1D = np.array([1, 2, 3, 4])
>>> a2D = np.array([[1, 2], [3, 4]])
>>> a3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

When you use numpy.array to define a new array, you should consider the dtype of the elements in the array, which
can be specified explicitly. This feature gives you more control over the underlying data structures and how the elements
are handled in C/C++ functions. If you are not careful with dtype assignments, you can get unwanted overflow, as such:

>>> a = np.array([127, 128, 129], dtype=np.int8)
>>> a
array([ 127, -128, -127], dtype=int8)

An 8-bit signed integer represents integers from -128 to 127. Assigning the int8 array to integers outside of this range
results in overflow. This feature can often be misunderstood. If you perform calculations with mismatching dtypes,
you can get unwanted results, for example:

>>> a = np.array([2, 3, 4], dtype=np.uint32)
>>> b = np.array([5, 6, 7], dtype=np.uint32)
>>> c_unsigned32 = a - b
>>> print('unsigned c:', c_unsigned32, c_unsigned32.dtype)
unsigned c: [4294967293 4294967293 4294967293] uint32
>>> c_signed32 = a - b.astype(np.int32)
>>> print('signed c:', c_signed32, c_signed32.dtype)
signed c: [-3 -3 -3] int64

Notice when you perform operations with two arrays of the same dtype: uint32, the resulting array is the same type.
When you perform operations with different dtype, NumPy will assign a new type that satisfies all of the array elements
involved in the computation, here uint32 and int32 can both be represented as int64.
The default NumPy behavior is to create arrays in either 64-bit signed integers or double precision floating point numbers,
int64 and float, respectively. If you expect your arrays to be a certain type, then you need to specify the dtype
while you create the array.
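
A short sketch of specifying and inspecting dtypes follows. The variable names are illustrative. The default integer type is not printed with an expected value because it can depend on the platform and NumPy version.

```python
import numpy as np

default_ints = np.array([1, 2, 3])           # platform default integer type
default_floats = np.array([1.0, 2.0, 3.0])   # double precision
small = np.array([1, 2, 3], dtype=np.int16)  # dtype chosen explicitly
as_float32 = small.astype(np.float32)        # converted copy

print(default_ints.dtype)
print(default_floats.dtype)          # float64
print(small.dtype, small.itemsize)   # int16 2
print(as_float32.dtype)              # float32

# np.iinfo reports the range an integer dtype can hold, which helps
# you spot a possible overflow before it happens.
limits = np.iinfo(np.int8)
print(limits.min, limits.max)        # -128 127
```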

4.1.3 2) Intrinsic NumPy array creation functions

NumPy has over 40 built-in functions for creating arrays as laid out in the Array creation routines. These functions can
be split into roughly three categories, based on the dimension of the array they create:
1) 1D arrays
2) 2D arrays
3) ndarrays

1 - 1D array creation functions

The 1D array creation functions e.g. numpy.linspace and numpy.arange generally need at least two inputs,
start and stop.
numpy.arange creates arrays with regularly incrementing values. Check the documentation for complete information
and examples. A few examples are shown:

>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=float)
array([ 2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.arange(2, 3, 0.1)
array([ 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])

Note: best practice for numpy.arange is to use integer start, end, and step values. There are some subtleties regarding
dtype. In the second example, the dtype is defined. In the third example, the array is dtype=float to accommodate
the step size of 0.1. Due to roundoff error, the stop value is sometimes included.
numpy.linspace will create arrays with a specified number of elements, and spaced equally between the specified
beginning and end values. For example:

>>> np.linspace(1., 4., 6)
array([ 1. , 1.6, 2.2, 2.8, 3.4, 4. ])

The advantage of this creation function is that you guarantee the number of elements and the starting and end point. The
previous arange(start, stop, step) will not include the value stop.
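
The trade-off can be seen side by side. This sketch assumes you want values from 2 to 3 in steps of 0.1. The third variant, which scales an integer arange, is one common way to follow the integer-step advice above and is offered only as a suggestion.

```python
import numpy as np

stepped = np.arange(2, 3, 0.1)    # float step: length depends on rounding
spaced = np.linspace(2, 3, 11)    # 11 points, both endpoints guaranteed

print(len(stepped))
print(len(spaced))                # 11
print(spaced[0], spaced[-1])      # 2.0 3.0

# Integer arange scaled afterwards: the number of elements is exact.
scaled = 2 + np.arange(10) * 0.1
print(len(scaled))                # 10
```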

2 - 2D array creation functions

The 2D array creation functions e.g. numpy.eye, numpy.diag, and numpy.vander define properties of special
matrices represented as 2D arrays.
np.eye(n, m) defines a 2D identity matrix. The elements where i=j (row index and column index are equal) are 1
and the rest are 0, as such:

>>> np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> np.eye(3, 5)
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.]])

numpy.diag can define either a square 2D array with given values along the diagonal or if given a 2D array returns a
1D array that is only the diagonal elements. The two array creation functions can be helpful while doing linear algebra,
as such:

>>> np.diag([1, 2, 3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
>>> np.diag([1, 2, 3], 1)
array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])
>>> a = np.array([[1, 2], [3, 4]])
>>> np.diag(a)
array([1, 4])

vander(x, n) defines a Vandermonde matrix as a 2D NumPy array. Each column of the Vandermonde matrix is a
decreasing power of the input 1D array or list or tuple, x where the highest polynomial order is n-1. This array creation
routine is helpful in generating linear least squares models, as such:

>>> np.vander(np.linspace(0, 2, 5), 2)
array([[0. , 1. ],
       [0.5, 1. ],
       [1. , 1. ],
       [1.5, 1. ],
       [2. , 1. ]])
>>> np.vander([1, 2, 3, 4], 2)
array([[1, 1],
       [2, 1],
       [3, 1],
       [4, 1]])
>>> np.vander((1, 2, 3, 4), 4)
array([[ 1,  1,  1,  1],
       [ 8,  4,  2,  1],
       [27,  9,  3,  1],
       [64, 16,  4,  1]])
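
To show why a Vandermonde matrix helps with least squares, the sketch below fits a straight line to made-up points that lie exactly on y = 2x + 1. It uses np.linalg.lstsq, which is not discussed in this section, so treat it as one possible way to solve the system.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0             # made-up data on the line y = 2x + 1

# Two columns: x**1 and x**0, matching slope and intercept.
A = np.vander(x, 2)

# Solve A @ [slope, intercept] = y in the least-squares sense.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs
print(slope, intercept)       # close to 2.0 and 1.0
```

Using np.vander(x, 3) instead would fit a quadratic with the same call.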

3 - general ndarray creation functions

The ndarray creation functions e.g. numpy.ones, numpy.zeros, and random define arrays based upon the desired
shape. The ndarray creation functions can create arrays with any dimension by specifying how many dimensions and
length along that dimension in a tuple or list.
numpy.zeros will create an array filled with 0 values with the specified shape. The default dtype is float64:

>>> np.zeros((2, 3))
array([[0., 0., 0.],
[0., 0., 0.]])
>>> [Link]((2, 3, 2))
array([[[0., 0.],
[0., 0.],
[0., 0.]],

[[0., 0.],
[0., 0.],
[0., 0.]]])

numpy.ones will create an array filled with 1 values. It is identical to zeros in all other respects, as such:
>>> np.ones((2, 3))
array([[1., 1., 1.],
       [1., 1., 1.]])
>>> np.ones((2, 3, 2))
array([[[1., 1.],
        [1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.],
        [1., 1.]]])

The random method of the result of default_rng will create an array filled with random values between 0 and 1. It
is included with the numpy.random library. Below, two arrays are created with shapes (2,3) and (2,3,2), respectively.
The seed is set to 42 so you can reproduce these pseudorandom numbers:
>>> from numpy.random import default_rng
>>> default_rng(42).random((2,3))
array([[0.77395605, 0.43887844, 0.85859792],
       [0.69736803, 0.09417735, 0.97562235]])
>>> default_rng(42).random((2,3,2))
array([[[0.77395605, 0.43887844],
        [0.85859792, 0.69736803],
        [0.09417735, 0.97562235]],
       [[0.7611397 , 0.78606431],
        [0.12811363, 0.45038594],
        [0.37079802, 0.92676499]]])

numpy.indices will create a set of arrays (stacked as a one-higher dimensioned array), one per dimension with each
representing variation in that dimension:

>>> np.indices((3,3))
array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],
       [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])

This is particularly useful for evaluating functions of multiple dimensions on a regular grid.
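
A minimal sketch of such a grid evaluation follows. The function 10*i + j and the variable names are chosen only for illustration.

```python
import numpy as np

# One index array per dimension, each of shape (3, 4)
rows, cols = np.indices((3, 4))

# Evaluate f(i, j) = 10*i + j at every grid point without explicit loops
grid = 10 * rows + cols
print(grid)
```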

4.1.4 3) Replicating, joining, or mutating existing arrays

Once you have created arrays, you can replicate, join, or mutate those existing arrays to create new arrays. When you
assign an array or its elements to a new variable, you have to explicitly numpy.copy the array, otherwise the variable
is a view into the original array. Consider the following example:

>>> a = np.array([1, 2, 3, 4, 5, 6])
>>> b = a[:2]
>>> b += 1
>>> print('a =', a, '; b =', b)
a = [2 3 3 4 5 6] ; b = [2 3]

In this example, you did not create a new array. You created a variable, b, that viewed the first 2 elements of a. When
you added 1 to b you got the same result as adding 1 to a[:2]. If you want to create a new array, use the
numpy.copy array creation routine as such:

>>> a = np.array([1, 2, 3, 4])
>>> b = a[:2].copy()
>>> b += 1
>>> print('a = ', a, 'b = ', b)
a =  [1 2 3 4] b =  [2 3]

For more information and examples look at Copies and Views.
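
If it is unclear whether two variables refer to the same data, np.shares_memory can check. The sketch below repeats the two cases above; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
view = a[:2]            # basic slice: shares data with a
copied = a[:2].copy()   # independent data

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copied))  # False
```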


There are a number of routines to join existing arrays, e.g. numpy.vstack, numpy.hstack, and numpy.block.
Here is an example of joining four 2-by-2 arrays into a 4-by-4 array using block:

>>> A = np.ones((2, 2))
>>> B = np.eye(2, 2)
>>> C = np.zeros((2, 2))
>>> D = np.diag((-3, -4))
>>> np.block([[A, B], [C, D]])
array([[ 1.,  1.,  1.,  0.],
       [ 1.,  1.,  0.,  1.],
       [ 0.,  0., -3.,  0.],
       [ 0.,  0.,  0., -4.]])

Other routines use similar syntax to join ndarrays. Check the routine’s documentation for further examples and syntax.
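
As a brief sketch of that similar syntax, vstack and hstack each take a sequence of arrays. The arrays a and b are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked_rows = np.vstack((a, b))  # shape (4, 2): b placed below a
stacked_cols = np.hstack((a, b))  # shape (2, 4): b placed to the right of a
print(stacked_rows.shape, stacked_cols.shape)
```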

4.1.5 4) Reading arrays from disk, either from standard or custom formats

This is the most common case of large array creation. The details depend greatly on the format of data on disk. This
section gives general pointers on how to handle various formats. For more detailed examples of IO look at How to Read
and Write files.

Standard Binary Formats

Various fields have standard formats for array data. The following lists the ones with known Python libraries to read them
and return NumPy arrays (there may be others for which it is possible to read and convert to NumPy arrays, so check the
last section as well):

HDF5: h5py
FITS: Astropy

Examples of formats that cannot be read directly but for which it is not hard to convert are those formats supported by
libraries like PIL (able to read and write many image formats such as jpg, png, etc).

Common ASCII Formats

Delimited files such as comma separated value (csv) and tab separated value (tsv) files are used for programs like Excel and
LabView. Python functions can read and parse these files line-by-line. NumPy has two standard routines for importing a
file with delimited data: numpy.loadtxt and numpy.genfromtxt. These functions have more involved use cases
in Reading and writing files. A simple example given a simple.csv:

$ cat simple.csv
x, y
0, 0
1, 1
2, 4
3, 9

Importing simple.csv is accomplished using loadtxt:

>>> np.loadtxt('simple.csv', delimiter = ',', skiprows = 1)
array([[0., 0.],
       [1., 1.],
       [2., 4.],
       [3., 9.]])

More generic ASCII files can be read using scipy.io and Pandas.
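
For delimited data with missing entries, genfromtxt may be a better fit than loadtxt. The sketch below is illustrative only: it reads from an in-memory io.StringIO object instead of a file, and the data is made up.

```python
import io
import numpy as np

# In-memory stand-in for a CSV file; the second data row has an empty y field
text = io.StringIO("x,y\n0,0\n1,\n2,4\n")

# genfromtxt fills missing float fields with NaN by default
data = np.genfromtxt(text, delimiter=",", skip_header=1)
print(data)
```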


4.1.6 5) Creating arrays from raw bytes through the use of strings or buffers

There are a variety of approaches one can use. If the file has a relatively simple format then one can write a simple I/O
library and use the NumPy fromfile() function and .tofile() method to read and write NumPy arrays directly
(mind your byteorder though!). If a good C or C++ library exists that reads the data, one can wrap that library with a variety
of techniques, though that certainly is much more work and requires significantly more advanced knowledge to interface
with C or C++.
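
A minimal sketch of the raw-bytes round trip follows. The scratch file name raw.bin and the choice of a little-endian 32-bit integer dtype are assumptions made for illustration.

```python
import os
import tempfile
import numpy as np

original = np.arange(6, dtype="<i4")  # explicit little-endian 32-bit ints

# Hypothetical scratch file used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "raw.bin")
original.tofile(path)  # writes raw bytes only: no shape or dtype header

# The reader must supply the dtype (including byte order) and restore the shape
restored = np.fromfile(path, dtype="<i4").reshape(2, 3)

# The same bytes can also be interpreted from an in-memory buffer
from_bytes = np.frombuffer(original.tobytes(), dtype="<i4")
```

Because the file carries no metadata, reading it with a different dtype or byte order gives different values without raising an error.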

4.1.7 6) Use of special library functions (e.g., SciPy, Pandas, and OpenCV)

NumPy is the fundamental library for array containers in the Python Scientific Computing stack. Many Python libraries,
including SciPy, Pandas, and OpenCV, use NumPy ndarrays as the common format for data exchange. These libraries
can create, operate on, and work with NumPy arrays.

4.2 Indexing on ndarrays

See also:
Indexing routines
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There
are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
Most of the following examples show the use of indexing when referencing data in an array. The examples work just as
well when assigning to an array. See Assigning values to indexed arrays for specific examples and explanations on how
assignments work.
Note that in Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]; the
latter is just syntactic sugar for the former.

4.2.1 Basic indexing

Single element indexing

Single element indexing works exactly like that for other standard Python sequences. It is 0-based, and accepts negative
indices for indexing from the end of the array.

>>> x = np.arange(10)
>>> x[2]
2
>>> x[-2]
8

It is not necessary to separate each dimension’s index into its own set of square brackets.

>>> x.shape = (2, 5)  # now x is 2-dimensional
>>> x[1, 3]
8
>>> x[1, -1]
9

Note that if one indexes a multidimensional array with fewer indices than dimensions, one gets a subdimensional array.
For example:


>>> x[0]
array([0, 1, 2, 3, 4])

That is, each index specified selects the array corresponding to the rest of the dimensions selected. In the above example,
choosing 0 means that the remaining dimension of length 5 is being left unspecified, and that what is returned is an array
of that dimensionality and size. In this case, the 1-D array at the first position (0) is returned. It must be noted that
the returned array is a view, i.e., it is not a copy of the original, but points to the same values in memory as does the
original array. Using a single index on the returned array therefore results in a single element being returned. That is:

>>> x[0][2]
2

So note that x[0, 2] == x[0][2], though the second case is less efficient as a new temporary array is created
after the first index that is subsequently indexed by 2.

Note: NumPy uses C-order indexing. That means that the last index usually represents the most rapidly changing
memory location, unlike Fortran or IDL, where the first index represents the most rapidly changing location in memory.
This difference represents a great potential for confusion.

Slicing and striding

Basic slicing extends Python’s basic concept of slicing to N dimensions. Basic slicing occurs when obj is a slice object
(constructed by start:stop:step notation inside of brackets), an integer, or a tuple of slice objects and integers.
Ellipsis and newaxis objects can be interspersed with these as well.
Deprecated since version 1.15.0: In order to remain backward compatible with a common usage in Numeric, basic slicing
is also initiated if the selection object is any non-ndarray and non-tuple sequence (such as a list) containing slice
objects, the Ellipsis object, or the newaxis object, but not for integer arrays or other embedded sequences.
The simplest case of indexing with N integers returns an array scalar representing the corresponding item. As in Python,
all indices are zero-based: for the i-th index n_i, the valid range is 0 ≤ n_i < d_i where d_i is the i-th element of the shape
of the array. Negative indices are interpreted as counting from the end of the array (i.e., if n_i < 0, it means n_i + d_i).
All arrays generated by basic slicing are always views of the original array.

Note: NumPy slicing creates a view instead of a copy as in the case of built-in Python sequences such as string, tuple and
list. Care must be taken when extracting a small portion from a large array which becomes useless after the extraction,
because the small portion extracted contains a reference to the large original array whose memory will not be released
until all arrays derived from it are garbage-collected. In such cases an explicit copy() is recommended.
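
The following sketch shows how one might tell the two cases apart through the base attribute. The array size and names are arbitrary.

```python
import numpy as np

big = np.zeros(1_000_000)

small_view = big[:10]         # view: keeps the whole buffer of big alive
small_copy = big[:10].copy()  # owns only its 10 elements

print(small_view.base is big)   # True
print(small_copy.base is None)  # True
print(small_copy.nbytes)        # 80 bytes for 10 float64 values
```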

The standard rules of sequence slicing apply to basic slicing on a per-dimension basis (including using a step index). Some
useful concepts to remember include:
• The basic slice syntax is i:j:k where i is the starting index, j is the stopping index, and k is the step (k ≠ 0).
This selects the m elements (in the corresponding dimension) with index values i, i + k, …, i + (m - 1) k where
m = q + (r ≠ 0) and q and r are the quotient and remainder obtained by dividing j - i by k: j - i = q k + r, so that
i + (m - 1) k < j. For example:

>>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> x[1:7:2]
array([1, 3, 5])


• Negative i and j are interpreted as n + i and n + j where n is the number of elements in the corresponding dimension.
Negative k makes stepping go towards smaller indices. From the above example:

>>> x[-2:10]
array([8, 9])
>>> x[-3:3:-1]
array([7, 6, 5, 4])

• Assume n is the number of elements in the dimension being sliced. Then, if i is not given it defaults to 0 for k > 0
and n - 1 for k < 0 . If j is not given it defaults to n for k > 0 and -n-1 for k < 0 . If k is not given it defaults to 1.
Note that :: is the same as : and means select all indices along this axis. From the above example:

>>> x[5:]
array([5, 6, 7, 8, 9])

• If the number of objects in the selection tuple is less than N, then : is assumed for any subsequent dimensions. For
example:

>>> x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
>>> x.shape
(2, 3, 1)
>>> x[1:2]
array([[[4],
[5],
[6]]])

• An integer, i, returns the same values as i:i+1 except the dimensionality of the returned object is reduced by 1.
In particular, a selection tuple with the p-th element an integer (and all other entries :) returns the corresponding
sub-array with dimension N - 1. If N = 1 then the returned object is an array scalar. These objects are explained in
Scalars.
• If the selection tuple has all entries : except the p-th entry which is a slice object i:j:k, then the returned array
has dimension N formed by concatenating the sub-arrays returned by integer indexing of elements i, i+k, …, i +
(m - 1) k < j.
• Basic slicing with more than one non-: entry in the slicing tuple, acts like repeated application of slicing using a
single non-: entry, where the non-: entries are successively taken (with all other non-: entries replaced by :).
Thus, x[ind1, ..., ind2,:] acts like x[ind1][..., ind2, :] under basic slicing.

Warning: The above is not true for advanced indexing.

• You may use slicing to set values in the array, but (unlike lists) you can never grow the array. The size of the value
to be set in x[obj] = value must be (broadcastable) to the same shape as x[obj].
• A slicing tuple can always be constructed as obj and used in the x[obj] notation. Slice objects can be used in
the construction in place of the [start:stop:step] notation. For example, x[1:10:5, ::-1] can also
be implemented as obj = (slice(1, 10, 5), slice(None, None, -1)); x[obj]. This can
be useful for constructing generic code that works on arrays of arbitrary dimensions. See Dealing with variable
numbers of indices within programs for more information.
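
As a sketch of such generic code, the hypothetical helper take_first below builds a slicing tuple for an array of any dimensionality. The helper name and its arguments are not part of NumPy.

```python
import numpy as np

def take_first(arr, axis, count):
    """Return the first `count` entries of `arr` along `axis` (hypothetical helper)."""
    index = [slice(None)] * arr.ndim   # start with ':' for every dimension
    index[axis] = slice(0, count)      # restrict only the requested axis
    return arr[tuple(index)]           # index with a tuple, not a list

z = np.arange(24).reshape(2, 3, 4)
print(take_first(z, 1, 2).shape)  # (2, 2, 4)
```

Converting the list to a tuple before indexing matters, because a list used as an index would trigger advanced indexing instead.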


Dimensional indexing tools

There are some tools to facilitate the easy matching of array shapes with expressions and in assignments.
Ellipsis expands to the number of : objects needed for the selection tuple to index all dimensions. In most cases,
this means that the length of the expanded selection tuple is x.ndim. There may only be a single ellipsis present. From
the above example:

>>> x[..., 0]
array([[1, 2, 3],
[4, 5, 6]])

This is equivalent to:

>>> x[:, :, 0]
array([[1, 2, 3],
[4, 5, 6]])

Each newaxis object in the selection tuple serves to expand the dimensions of the resulting selection by one unit-length
dimension. The added dimension is the position of the newaxis object in the selection tuple. newaxis is an alias for
None, and None can be used in place of this with the same result. From the above example:

>>> x[:, np.newaxis, :, :].shape
(2, 1, 3, 1)
>>> x[:, None, :, :].shape
(2, 1, 3, 1)

This can be handy to combine two arrays in a way that otherwise would require explicit reshaping operations. For example:

>>> x = np.arange(5)
>>> x[:, np.newaxis] + x[np.newaxis, :]
array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])

4.2.2 Advanced indexing

Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type
integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types
of advanced indexing: integer and Boolean.
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).

Warning: The definition of advanced indexing means that x[(1, 2, 3),] is fundamentally different from
x[(1, 2, 3)]. The latter is equivalent to x[1, 2, 3] which will trigger basic selection while the former will
trigger advanced indexing. Be sure to understand why this occurs.
Also recognize that x[[1, 2, 3]] will trigger advanced indexing, whereas due to the deprecated Numeric
compatibility mentioned above, x[[1, 2, slice(None)]] will trigger basic slicing.


Integer array indexing

Integer array indexing allows selection of arbitrary items in the array based on their N-dimensional index. Each integer
array represents a number of indices into that dimension.
Negative values are permitted in the index arrays and work as they do with single indices or slices:

>>> x = np.arange(10, 1, -1)
>>> x
array([10,  9,  8,  7,  6,  5,  4,  3,  2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])
>>> x[np.array([3, 3, -3, 8])]
array([7, 7, 4, 2])

If the index values are out of bounds then an IndexError is raised:

>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[np.array([1, -1])]
array([[3, 4],
       [5, 6]])
>>> x[np.array([3, 4])]
IndexError: index 3 is out of bounds for axis 0 with size 3

When the index consists of as many integer arrays as dimensions of the array being indexed, the indexing is straightforward,
but different from slicing.
Advanced indices always are broadcast and iterated as one:

result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
                           ..., ind_N[i_1, ..., i_M]]

Note that the resulting shape is identical to the (broadcast) indexing array shapes ind_1, ..., ind_N. If the indices
cannot be broadcast to the same shape, an exception IndexError: shape mismatch: indexing arrays
could not be broadcast together with shapes... is raised.
Indexing with multidimensional index arrays tends to be a more unusual use, but it is permitted, and it is useful for
some problems. We’ll start with the simplest multidimensional case:

>>> y = np.arange(35).reshape(5, 7)
>>> y
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])
>>> y[np.array([0, 2, 4]), np.array([0, 1, 2])]
array([ 0, 15, 30])

In this case, if the index arrays have a matching shape, and there is an index array for each dimension of the array being
indexed, the resultant array has the same shape as the index arrays, and the values correspond to the index set for each
position in the index arrays. In this example, the first index value is 0 for both index arrays, and thus the first value of the
resultant array is y[0, 0]. The next value is y[2, 1], and the last is y[4, 2].
If the index arrays do not have the same shape, there is an attempt to broadcast them to the same shape. If they cannot
be broadcast to the same shape, an exception is raised:


>>> y[np.array([0, 2, 4]), np.array([0, 1])]
IndexError: shape mismatch: indexing arrays could not be broadcast
together with shapes (3,) (2,)

The broadcasting mechanism permits index arrays to be combined with scalars for other indices. The effect is that the
scalar value is used for all the corresponding values of the index arrays:
>>> y[np.array([0, 2, 4]), 1]
array([ 1, 15, 29])

Jumping to the next level of complexity, it is possible to only partially index an array with index arrays. It takes a bit of
thought to understand what happens in such cases. For example if we just use one index array with y:
>>> y[np.array([0, 2, 4])]
array([[ 0, 1, 2, 3, 4, 5, 6],
[14, 15, 16, 17, 18, 19, 20],
[28, 29, 30, 31, 32, 33, 34]])

It results in the construction of a new array where each value of the index array selects one row from the array being
indexed and the resultant array has the resulting shape (number of index elements, size of row).
In general, the shape of the resultant array will be the concatenation of the shape of the index array (or the shape that
all the index arrays were broadcast to) with the shape of any unused dimensions (those not indexed) in the array being
indexed.
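
The shape rule above can be checked with a small sketch. The index values are arbitrary.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
index = np.array([[0, 2], [4, 1]])  # index array of shape (2, 2)

# Only the first dimension is indexed, so the unused dimension (7,) is appended
result = y[index]
print(result.shape)  # (2, 2, 7)
```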

Example

From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies
the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using
advanced indexing:
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])

To achieve a behaviour similar to the basic slicing above, broadcasting can be used. The function ix_ can help with this
broadcasting. This is best understood with an example.

Example

From a 4x3 array the corner elements should be selected using advanced indexing. Thus all elements for which the column
is one of [0, 2] and the row is one of [0, 3] need to be selected. To use advanced indexing one needs to select all
elements explicitly. Using the method explained previously one could write:
>>> x = np.array([[ 0,  1,  2],
...               [ 3,  4,  5],
...               [ 6,  7,  8],
...               [ 9, 10, 11]])
>>> rows = np.array([[0, 0],
...                  [3, 3]], dtype=np.intp)
>>> columns = np.array([[0, 2],
...                     [0, 2]], dtype=np.intp)
>>> x[rows, columns]
array([[ 0,  2],
       [ 9, 11]])


However, since the indexing arrays above just repeat themselves, broadcasting can be used (compare operations such as
rows[:, np.newaxis] + columns) to simplify this:
>>> rows = np.array([0, 3], dtype=np.intp)
>>> columns = np.array([0, 2], dtype=np.intp)
>>> rows[:, np.newaxis]
array([[0],
       [3]])
>>> x[rows[:, np.newaxis], columns]
array([[ 0,  2],
       [ 9, 11]])

This broadcasting can also be achieved using the function ix_:


>>> x[np.ix_(rows, columns)]
array([[ 0, 2],
[ 9, 11]])

Note that without the np.ix_ call, only the diagonal elements would be selected:
>>> x[rows, columns]
array([ 0, 11])

This difference is the most important thing to remember about indexing with multiple advanced indices.

Example

A real-life example of where advanced indexing may be useful is for a color lookup table where we want to map the values
of an image into RGB triples for display. The lookup table could have a shape (nlookup, 3). Indexing such an array with
an image with shape (ny, nx) with dtype=np.uint8 (or any integer type so long as values are within the bounds of the lookup
table) will result in an array of shape (ny, nx, 3) where a triple of RGB values is associated with each pixel location.
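
A small sketch of the lookup-table idea follows. The four-colour table and the 2x3 image are invented for illustration.

```python
import numpy as np

# Hypothetical 4-entry lookup table of RGB triples, shape (nlookup, 3)
lut = np.array([[0, 0, 0],
                [255, 0, 0],
                [0, 255, 0],
                [0, 0, 255]], dtype=np.uint8)

# Hypothetical 2x3 "image" whose values are indices into the table
image = np.array([[0, 1, 2],
                  [3, 2, 1]], dtype=np.uint8)

rgb = lut[image]   # one RGB triple per pixel
print(rgb.shape)   # (2, 3, 3)
```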

Boolean array indexing

This advanced indexing occurs when obj is an array object of Boolean type, such as may be returned from comparison
operators. A single boolean index array is practically identical to x[obj.nonzero()] where, as described above,
obj.nonzero() returns a tuple (of length obj.ndim) of integer index arrays showing the True elements of obj.
However, it is faster when obj.shape == x.shape.
If obj.ndim == x.ndim, x[obj] returns a 1-dimensional array filled with the elements of x corresponding to the
True values of obj. The search order will be row-major, C-style. If obj has True values at entries that are outside of
the bounds of x, then an index error will be raised. If obj is smaller than x it is identical to filling it with False.
A common use case for this is filtering for desired element values. For example, one may wish to select all entries from
an array which are not NaN:
>>> x = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]])
>>> x[~np.isnan(x)]
array([1., 2., 3.])

Or wish to add a constant to all negative elements:

>>> x = np.array([1., -1., -2., 3])
>>> x[x < 0] += 20
>>> x
array([ 1., 19., 18.,  3.])


In general, if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same
position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2]
is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].
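
The equivalence can be checked directly. In this sketch the mask condition is arbitrary.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
mask = x % 5 == 0   # True at the values 0, 5 and 10

by_mask = x[mask]
by_nonzero = x[mask.nonzero()]   # tuple of integer index arrays

print(by_mask, by_nonzero)
```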
If there is only one Boolean array and no integer indexing array present, this is straightforward. Care must only be taken
to make sure that the boolean index has exactly as many dimensions as it is supposed to work with.
In general, when the boolean array has fewer dimensions than the array being indexed, this is equivalent to x[b, ...],
which means x is indexed by b followed by as many : as are needed to fill out the rank of x. Thus the shape of the result
is one dimension containing the number of True elements of the boolean array, followed by the remaining dimensions of
the array being indexed:

>>> x = np.arange(35).reshape(5, 7)
>>> b = x > 20
>>> b[:, 5]
array([False, False, False, True, True])
>>> x[b[:, 5]]
array([[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34]])

Here the 4th and 5th rows are selected from the indexed array and combined to make a 2-D array.

Example

From an array, select all rows which sum up to less than or equal to two:

>>> x = np.array([[0, 1], [1, 1], [2, 2]])
>>> rowsum = x.sum(-1)
>>> x[rowsum <= 2, :]
array([[0, 1],
       [1, 1]])

Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the
obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.

Example

Use boolean indexing to select all rows adding up to an even number. At the same time columns 0 and 2 should be selected
with an advanced integer index. Using the ix_ function this can be done with:

>>> x = np.array([[ 0,  1,  2],
...               [ 3,  4,  5],
...               [ 6,  7,  8],
...               [ 9, 10, 11]])
>>> rows = (x.sum(-1) % 2) == 0
>>> rows
array([False,  True, False,  True])
>>> columns = [0, 2]
>>> x[np.ix_(rows, columns)]
array([[ 3,  5],
       [ 9, 11]])

Without the np.ix_ call, only the diagonal elements would be selected.
Or without np.ix_ (compare the integer array examples):


>>> rows = rows.nonzero()[0]
>>> x[rows[:, np.newaxis], columns]
array([[ 3,  5],
       [ 9, 11]])

Example

Using a 2-D boolean array of shape (2, 3) with four True elements to select rows from a 3-D array of shape (2, 3, 5) results
in a 2-D result of shape (4, 5):
>>> x = np.arange(30).reshape(2, 3, 5)
>>> x
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],
       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
>>> b = np.array([[True, True, False], [False, True, True]])
>>> x[b]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

Combining advanced and basic indexing

When there is at least one slice (:), ellipsis (...) or newaxis in the index (or the array has more dimensions than there
are advanced indices), then the behaviour can be more complicated. It is like concatenating the indexing result for each
advanced index element.
In the simplest case, there is only a single advanced index combined with a slice. For example:
>>> y = np.arange(35).reshape(5,7)
>>> y[np.array([0, 2, 4]), 1:3]
array([[ 1, 2],
[15, 16],
[29, 30]])

In effect, the slice and index array operations are independent. The slice operation extracts columns with index 1 and 2
(i.e. the 2nd and 3rd columns), followed by the index array operation which extracts rows with index 0, 2 and 4 (i.e. the
first, third and fifth rows). This is equivalent to:
>>> y[:, 1:3][np.array([0, 2, 4]), :]
array([[ 1, 2],
[15, 16],
[29, 30]])

A single advanced index can, for example, replace a slice and the result array will be the same. However, it is a copy and
may have a different memory layout. A slice is preferable when it is possible. For example:
>>> x = np.array([[ 0,  1,  2],
...               [ 3,  4,  5],
...               [ 6,  7,  8],
...               [ 9, 10, 11]])
>>> x[1:2, 1:3]
array([[4, 5]])
>>> x[1:2, [1, 2]]
array([[4, 5]])

The easiest way to understand a combination of multiple advanced indices may be to think in terms of the resulting shape.
There are two parts to the indexing operation, the subspace defined by the basic indexing (excluding integers) and the
subspace from the advanced indexing part. Two cases of index combination need to be distinguished:
• The advanced indices are separated by a slice, Ellipsis or newaxis. For example x[arr1, :, arr2].
• The advanced indices are all next to each other. For example x[..., arr1, arr2, :] but not x[arr1,
:, 1] since 1 is an advanced index in this regard.
In the first case, the dimensions resulting from the advanced indexing operation come first in the result array, and the
subspace dimensions after that. In the second case, the dimensions from the advanced indexing operations are inserted
into the result array at the same spot as they were in the initial array (the latter logic is what makes simple advanced
indexing behave just like slicing).

Example

Suppose x.shape is (10, 20, 30) and ind is a (2, 3, 4)-shaped indexing intp array, then result = x[..., ind,
:] has shape (10, 2, 3, 4, 30) because the (20,)-shaped subspace has been replaced with a (2, 3, 4)-shaped broadcasted
indexing subspace. If we let i, j, k loop over the (2, 3, 4)-shaped subspace then result[..., i, j, k, :] =
x[..., ind[i, j, k], :]. This example produces the same result as x.take(ind, axis=-2).

Example

Let x.shape be (10, 20, 30, 40, 50) and suppose ind_1 and ind_2 can be broadcast to the shape (2, 3, 4). Then
x[:, ind_1, ind_2] has shape (10, 2, 3, 4, 40, 50) because the (20, 30)-shaped subspace from x has been replaced
with the (2, 3, 4) subspace from the indices. However, x[:, ind_1, :, ind_2] has shape (2, 3, 4, 10, 30, 50)
because there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning. It is always
possible to use .transpose() to move the subspace anywhere desired. Note that this example cannot be replicated
using take.
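
The two placement rules can be verified on a smaller array. The shapes below are scaled down from the example above to keep memory use small, and the index arrays are filled with zeros purely for illustration.

```python
import numpy as np

x = np.zeros((3, 4, 5, 6, 7))
ind_1 = np.zeros((2, 1), dtype=np.intp)
ind_2 = np.zeros(3, dtype=np.intp)   # broadcasts with ind_1 to shape (2, 3)

# Adjacent advanced indices: the (4, 5) subspace is replaced in place by (2, 3)
adjacent = x[:, ind_1, ind_2]
print(adjacent.shape)    # (3, 2, 3, 6, 7)

# Advanced indices separated by a slice: the (2, 3) subspace comes first
separated = x[:, ind_1, :, ind_2]
print(separated.shape)   # (2, 3, 3, 5, 7)
```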

Example

Slicing can be combined with broadcasted boolean indices:

>>> x = np.arange(35).reshape(5, 7)
>>> b = x > 20
>>> b
array([[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[ True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True]])
>>> x[b[:, 5], 1:3]
array([[22, 23],
[29, 30]])


4.2.3 Field access

See also:
Structured arrays
If the ndarray object is a structured array the fields of the array can be accessed by indexing the array with strings,
dictionary-like.
Indexing x['field-name'] returns a new view to the array, which is of the same shape as x (except when the field is
a sub-array) but of data type x.dtype['field-name'] and contains only the part of the data in the specified field.
Also, record array scalars can be “indexed” this way.
Indexing into a structured array can also be done with a list of field names, e.g. x[['field-name1',
'field-name2']]. As of NumPy 1.16, this returns a view containing only those fields. In older versions of NumPy,
it returned a copy. See the user guide section on Structured arrays for more information on multifield indexing.
If the accessed field is a sub-array, the dimensions of the sub-array are appended to the shape of the result. For example:

>>> x = np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))])
>>> x['a'].shape
(2, 2)
>>> x['a'].dtype
dtype('int32')
>>> x['b'].shape
(2, 2, 3, 3)
>>> x['b'].dtype
dtype('float64')

4.2.4 Flat Iterator indexing

x.flat returns an iterator that will iterate over the entire array (in C-contiguous style with the last index varying the
fastest). This iterator object can also be indexed using basic slicing or advanced indexing as long as the selection object is
not a tuple. This should be clear from the fact that x.flat is a 1-dimensional view. It can be used for integer indexing
with 1-dimensional C-style-flat indices. The shape of any returned array is therefore the shape of the integer indexing
object.
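
A short sketch of indexing through the flat iterator follows. The array and the positions are arbitrary.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

single = x.flat[5]            # sixth element in C order
picked = x.flat[[1, 6, 11]]   # result has the shape of the index list
print(single, picked)

x.flat[::5] = -1              # assigning through the iterator modifies x
print(x)
```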

4.2.5 Assigning values to indexed arrays

As mentioned, one can select a subset of an array to assign to using a single index, slices, and index and mask arrays.
The value being assigned to the indexed array must be shape consistent (the same shape or broadcastable to the shape the
index produces). For example, it is permitted to assign a constant to a slice:

>>> x = np.arange(10)
>>> x[2:7] = 1

or an array of the right size:

>>> x[2:7] = np.arange(5)

Note that assignments may result in changes if assigning higher types to lower types (like floats to ints) or even exceptions
(assigning complex to floats or ints):


>>> x[1] = 1.2


>>> x[1]
1
>>> x[1] = 1.2j
TypeError: can't convert complex to int

Unlike some of the references (such as array and mask indices) assignments are always made to the original data in the
array (indeed, nothing else would make sense!). Note though, that some actions may not work as one may naively expect.
This particular example is often surprising to people:

>>> x = np.arange(0, 50, 10)
>>> x
array([ 0, 10, 20, 30, 40])
>>> x[np.array([1, 1, 3, 1])] += 1
>>> x
array([ 0, 11, 20, 31, 40])

People often expect that the element at index 1 will be incremented by 3. In fact, it will only be incremented by 1. The reason
is that a new array is extracted from the original (as a temporary) containing the values at 1, 1, 3, 1, then the value 1
is added to the temporary, and then the temporary is assigned back to the original array. Thus the value
x[1] + 1 is assigned to x[1] three times, rather than x[1] being incremented 3 times.
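
If accumulation over repeated indices is what is wanted, one option (not covered in the text above) is the unbuffered ufunc method np.add.at. The sketch below contrasts the two behaviours; the variable names are illustrative.

```python
import numpy as np

x = np.arange(0, 50, 10)
idx = np.array([1, 1, 3, 1])

buffered = x.copy()
buffered[idx] += 1              # each repeated index is incremented only once

unbuffered = x.copy()
np.add.at(unbuffered, idx, 1)   # accumulates once per occurrence of an index

print(buffered)    # [ 0 11 20 31 40]
print(unbuffered)  # [ 0 13 20 31 40]
```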

4.2.6 Dealing with variable numbers of indices within programs

The indexing syntax is very powerful but limiting when dealing with a variable number of indices. For example, if you
want to write a function that can handle arguments with various numbers of dimensions without having to write special
case code for each number of possible dimensions, how can that be done? If one supplies to the index a tuple, the tuple
will be interpreted as a list of indices. For example:

>>> z = np.arange(81).reshape(3, 3, 3, 3)
>>> indices = (1, 1, 1, 1)
>>> z[indices]
40

So one can use code to construct tuples of any number of indices and then use these within an index.
Slices can be specified within programs by using the slice() function in Python. For example:

>>> indices = (1, 1, 1, slice(0, 2)) # same as [1, 1, 1, 0:2]
>>> z[indices]
array([39, 40])

Likewise, ellipsis can be specified by code by using the Ellipsis object:

>>> indices = (1, Ellipsis, 1) # same as [1, ..., 1]
>>> z[indices]
array([[28, 31, 34],
       [37, 40, 43],
       [46, 49, 52]])

For this reason, it is possible to use the output from the np.nonzero() function directly as an index since it always
returns a tuple of index arrays.
Because of the special treatment of tuples, they are not automatically converted to an array as a list would be. As an example:

>>> z[[1, 1, 1, 1]] # produces a large array
array([[[[27, 28, 29],
         [30, 31, 32], ...
>>> z[(1, 1, 1, 1)] # returns a single value
40
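
The tuple-building approach described in this section can be sketched as a small helper that selects one position along an arbitrary axis, whatever the number of dimensions. The function name select_on_axis is hypothetical and used only for illustration:

```python
import numpy as np

def select_on_axis(a, axis, i):
    """Return the sub-array of `a` at integer position `i` along `axis`."""
    # Start with a full slice for every dimension, then replace one entry.
    index = [slice(None)] * a.ndim
    index[axis] = i
    # Convert to a tuple so that each item indexes one dimension.
    return a[tuple(index)]

z = np.arange(81).reshape(3, 3, 3, 3)
sub = select_on_axis(z, axis=2, i=1)   # same as z[:, :, 1, :]
print(sub.shape)  # (3, 3, 3)
```

Passing the list itself instead of tuple(index) would not give the same result, because lists are treated as index arrays rather than as one index per dimension.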

4.2.7 Detailed notes

These are some detailed notes, which are not of importance for day to day indexing (in no particular order):
• The native NumPy indexing type is intp and may differ from the default integer array type. intp is the smallest
data type sufficient to safely index any array; for advanced indexing it may be faster than other types.
• For advanced assignments, there is in general no guarantee for the iteration order. This means that if an element is
set more than once, it is not possible to predict the final result.
• An empty (tuple) index is a full scalar index into a zero-dimensional array. x[()] returns a scalar if x is zero-
dimensional and a view otherwise. On the other hand, x[...] always returns a view.
• If a zero-dimensional array is present in the index and it is a full integer index the result will be a scalar and not a
zero-dimensional array. (Advanced indexing is not triggered.)
• When an ellipsis (...) is present but has no size (i.e. replaces zero :) the result will still always be an array. A
view if no advanced index is present, otherwise a copy.
• The nonzero equivalence for Boolean arrays does not hold for zero dimensional boolean arrays.
• When the result of an advanced indexing operation has no elements but an individual index is out of bounds, whether
or not an IndexError is raised is undefined (e.g. x[[], [123]] with 123 being out of bounds).
• When a casting error occurs during assignment (for example updating a numerical array using a sequence of strings),
the array being assigned to may end up in an unpredictable partially updated state. However, if any other error (such
as an out of bounds index) occurs, the array will remain unchanged.
• The memory layout of an advanced indexing result is optimized for each indexing operation and no particular
memory order can be assumed.
• When using a subclass (especially one which manipulates its shape), the default ndarray.__setitem__ behaviour
will call __getitem__ for basic indexing but not for advanced indexing. For such a subclass it may be
preferable to call ndarray.__setitem__ with a base class ndarray view on the data. This must be done if the
subclass's __getitem__ does not return views.
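
The note on empty-tuple and ellipsis indices can be checked with a short sketch; the variable names are illustrative:

```python
import numpy as np

zero_d = np.array(5)      # zero-dimensional array
scalar = zero_d[()]       # empty tuple index: an array scalar
still_array = zero_d[...] # ellipsis index: a zero-dimensional array (view)

a = np.arange(6).reshape(2, 3)
view = a[()]              # for ndim > 0, an empty tuple index gives a view
view[0, 0] = 99           # writes through to a

print(isinstance(scalar, np.generic))  # True
print(still_array.ndim)                # 0
print(a[0, 0])                         # 99
```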

4.3 I/O with NumPy

4.3.1 Importing data with genfromtxt

NumPy provides several functions to create arrays from tabular data. We focus here on the genfromtxt function.
In a nutshell, genfromtxt runs two main loops. The first loop converts each line of the file into a sequence of strings.
The second loop converts each string to the appropriate data type. This mechanism is slower than a single loop, but gives
more flexibility. In particular, genfromtxt is able to take missing data into account, when other faster and simpler
functions like loadtxt cannot.

Note: When giving examples, we will use the following conventions:

>>> import numpy as np
>>> from io import StringIO

Defining the input

The only mandatory argument of genfromtxt is the source of the data. It can be a string, a list of strings, a generator or
an open file-like object with a read method, for example, a file or io.StringIO object. If a single string is provided,
it is assumed to be the name of a local or remote file. If a list of strings or a generator returning strings is provided, each
string is treated as one line in a file. When the URL of a remote file is passed, the file is automatically downloaded to the
current directory and opened.
Recognized file types are text files and archives. Currently, the function recognizes gzip and bz2 (bzip2) archives.
The type of the archive is determined from the extension of the file: if the filename ends with '.gz', a gzip archive is
expected; if it ends with '.bz2', a bzip2 archive is assumed.

Splitting the lines into columns

The delimiter argument


Once the file is defined and open for reading, genfromtxt splits each non-empty line into a sequence of strings. Empty
or commented lines are just skipped. The delimiter keyword is used to define how the splitting should take place.
Quite often, a single character marks the separation between columns. For example, comma-separated files (CSV) use a
comma (,) or a semicolon (;) as delimiter:

>>> data = u"1, 2, 3\n4, 5, 6"
>>> np.genfromtxt(StringIO(data), delimiter=",")
array([[ 1., 2., 3.],
       [ 4., 5., 6.]])

Another common separator is "\t", the tabulation character. However, we are not limited to a single character, any
string will do. By default, genfromtxt assumes delimiter=None, meaning that the line is split along white spaces
(including tabs) and that consecutive white spaces are considered as a single white space.
Alternatively, we may be dealing with a fixed-width file, where columns are defined as a given number of characters. In
that case, we need to set delimiter to a single integer (if all the columns have the same size) or to a sequence of
integers (if columns can have different sizes):

>>> data = u"  1  2  3\n  4  5 67\n890123  4"
>>> np.genfromtxt(StringIO(data), delimiter=3)
array([[   1.,    2.,    3.],
       [   4.,    5.,   67.],
       [ 890.,  123.,    4.]])
>>> data = u"123456789\n   4  7 9\n   4567 9"
>>> np.genfromtxt(StringIO(data), delimiter=(4, 3, 2))
array([[ 1234.,   567.,    89.],
       [    4.,     7.,     9.],
       [    4.,   567.,     9.]])

The autostrip argument


By default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading or trailing
white spaces. This behavior can be overridden by setting the optional argument autostrip to a value of True:

>>> data = u"1, abc , 2\n 3, xxx, 4"
>>> # Without autostrip
>>> np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5")
array([['1', ' abc ', ' 2'],
       ['3', ' xxx', ' 4']], dtype='<U5')
>>> # With autostrip
>>> np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5", autostrip=True)
array([['1', 'abc', '2'],
       ['3', 'xxx', '4']], dtype='<U5')

The comments argument


The optional argument comments is used to define a character string that marks the beginning of a comment. By default,
genfromtxt assumes comments='#'. The comment marker may occur anywhere on the line. Any character present
after the comment marker(s) is simply ignored:

>>> data = u"""#
... # Skip me !
... # Skip me too !
... 1, 2
... 3, 4
... 5, 6 #This is the third line of the data
... 7, 8
... # And here comes the last line
... 9, 0
... """
>>> np.genfromtxt(StringIO(data), comments="#", delimiter=",")
array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.],
       [9., 0.]])

New in version 1.7.0: When comments is set to None, no lines are treated as comments.

Note: There is one notable exception to this behavior: if the optional argument names=True, the first commented
line will be examined for names.

Skipping lines and choosing columns

The skip_header and skip_footer arguments


The presence of a header in the file can hinder data processing. In that case, we need to use the skip_header optional
argument. The value of this argument must be an integer which corresponds to the number of lines to skip at the
beginning of the file, before any other action is performed. Similarly, we can skip the last n lines of the file by using the
skip_footer argument and giving it a value of n:

>>> data = u"\n".join(str(i) for i in range(10))
>>> np.genfromtxt(StringIO(data),)
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.genfromtxt(StringIO(data),
...               skip_header=3, skip_footer=5)
array([ 3., 4.])

By default, skip_header=0 and skip_footer=0, meaning that no lines are skipped.

The usecols argument


In some cases, we are not interested in all the columns of the data but only a few of them. We can select which columns to
import with the usecols argument. This argument accepts a single integer or a sequence of integers corresponding to
the indices of the columns to import. Remember that by convention, the first column has an index of 0. Negative integers
behave the same as regular Python negative indexes.
For example, if we want to import only the first and the last columns, we can use usecols=(0, -1):

>>> data = u"1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data), usecols=(0, -1))
array([[ 1., 3.],
       [ 4., 6.]])

If the columns have names, we can also select which columns to import by giving their name to the usecols argument,
either as a sequence of strings or a comma-separated string:

>>> data = u"1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data),
...               names="a, b, c", usecols=("a", "c"))
array([(1.0, 3.0), (4.0, 6.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])
>>> np.genfromtxt(StringIO(data),
...               names="a, b, c", usecols=("a, c"))
array([(1.0, 3.0), (4.0, 6.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])

Choosing the data type

The main way to control how the sequences of strings we have read from the file are converted to other types is to set the
dtype argument. Acceptable values for this argument are:
• a single type, such as dtype=float. The output will be 2D with the given dtype, unless a name has been
associated with each column with the use of the names argument (see below). Note that dtype=float is the
default for genfromtxt.
• a sequence of types, such as dtype=(int, float, float).
• a comma-separated string, such as dtype="i4,f8,|U3".
• a dictionary with two keys 'names' and 'formats'.
• a sequence of tuples (name, type), such as dtype=[('A', int), ('B', float)].
• an existing np.dtype object.
• the special value None. In that case, the type of the columns will be determined from the data itself (see below).
In all the cases but the first one, the output will be a 1D array with a structured dtype. This dtype has as many fields as
items in the sequence. The field names are defined with the names keyword.
When dtype=None, the type of each column is determined iteratively from its data. We start by checking whether a
string can be converted to a boolean (that is, if the string matches true or false in lower case); then whether it can
be converted to an integer, then to a float, then to a complex and eventually to a string. This behavior may be changed by
modifying the default mapper of the StringConverter class.
The option dtype=None is provided for convenience. However, it is significantly slower than setting the dtype explicitly.
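
As a rough illustration of the two approaches, the sketch below reads the same text once with dtype=None and once with an explicit comma-separated dtype string. The sample text and the encoding="utf-8" argument are assumptions made for this example, so that the string column is returned as str:

```python
import numpy as np
from io import StringIO

text = "1 2.5 abc\n2 3.5 def"

# dtype=None: the type of each column is inferred from its contents.
inferred = np.genfromtxt(StringIO(text), dtype=None, encoding="utf-8")

# Explicit structured dtype: no type inference is needed.
explicit = np.genfromtxt(StringIO(text), dtype="i8,f8,U3", encoding="utf-8")

print(inferred.dtype.names)  # ('f0', 'f1', 'f2')
print(explicit["f1"])        # [2.5 3.5]
```

In both cases the result is a 1D array with a structured dtype, because the columns do not share a single type.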

Setting the names

The names argument


A natural approach when dealing with tabular data is to allocate a name to each column. A first possibility is to use an
explicit structured dtype, as mentioned previously:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=[(_, int) for _ in "abc"])
array([(1, 2, 3), (4, 5, 6)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

Another simpler possibility is to use the names keyword with a sequence of strings or a comma-separated string:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, names="A, B, C")
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

In the example above, we used the fact that by default, dtype=float. By giving a sequence of names, we are forcing
the output to a structured dtype.
We may sometimes need to define the column names from the data itself. In that case, we must use the names keyword
with a value of True. The names will then be read from the first line (after the skip_header ones), even if the line
is commented out:

>>> data = StringIO("So it goes\n#a b c\n1 2 3\n 4 5 6")
>>> np.genfromtxt(data, skip_header=1, names=True)
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

The default value of names is None. If we give any other value to the keyword, the new names will overwrite the field
names we may have defined with the dtype:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> ndtype=[('a',int), ('b', float), ('c', int)]
>>> names = ["A", "B", "C"]
>>> np.genfromtxt(data, names=names, dtype=ndtype)
array([(1, 2.0, 3), (4, 5.0, 6)],
      dtype=[('A', '<i8'), ('B', '<f8'), ('C', '<i8')])

The defaultfmt argument


If names=None but a structured dtype is expected, names are defined with the standard NumPy default of "f%i",
yielding names like f0, f1 and so forth:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int))
array([(1, 2.0, 3), (4, 5.0, 6)],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8')])

In the same way, if we don’t give enough names to match the length of the dtype, the missing names will be defined with
this default template:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int), names="a")
array([(1, 2.0, 3), (4, 5.0, 6)],
      dtype=[('a', '<i8'), ('f0', '<f8'), ('f1', '<i8')])

We can overwrite this default with the defaultfmt argument, which takes any format string:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int), defaultfmt="var_%02i")
array([(1, 2.0, 3), (4, 5.0, 6)],
      dtype=[('var_00', '<i8'), ('var_01', '<f8'), ('var_02', '<i8')])

Note: We need to keep in mind that defaultfmt is used only if some names are expected but not defined.

Validating names
NumPy arrays with a structured dtype can also be viewed as recarray, where a field can be accessed as if it were an
attribute. For that reason, we may need to make sure that the field name doesn’t contain any space or invalid character, or
that it does not correspond to the name of a standard attribute (like size or shape), which would confuse the interpreter.
genfromtxt accepts three optional arguments that provide a finer control on the names:
deletechars
Gives a string combining all the characters that must be deleted from the name. By default, invalid
characters are ~!@#$%^&*()-=+~\|]}[{';: /?.>,<.
excludelist
Gives a list of the names to exclude, such as return, file, print… If one of the input names is
part of this list, an underscore character ('_') will be appended to it.
case_sensitive
Whether the names should be case-sensitive (case_sensitive=True), converted to upper
case (case_sensitive=False or case_sensitive='upper') or to lower case
(case_sensitive='lower').
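
A short sketch of how these arguments affect the resulting field names is given below. The one-line input and the chosen names are illustrative only, and the default excludelist is assumed to contain return, file and print as stated above:

```python
import numpy as np
from io import StringIO

row = "1 2 3"

# 'return' and 'print' are in the default excludelist, so '_' is appended.
reserved = np.genfromtxt(StringIO(row), names="return, b, print")
print(reserved.dtype.names)  # ('return_', 'b', 'print_')

# case_sensitive='lower' converts every name to lower case.
lowered = np.genfromtxt(StringIO(row), names="A, B, C", case_sensitive="lower")
print(lowered.dtype.names)   # ('a', 'b', 'c')
```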

Tweaking the conversion

The converters argument


Usually, defining a dtype is sufficient to define how the sequence of strings must be converted. However, some additional
control may sometimes be required. For example, we may want to make sure that a date in a format YYYY/MM/DD is
converted to a datetime object, or that a string like xx% is properly converted to a float between 0 and 1. In such cases,
we should define conversion functions with the converters argument.
The value of this argument is typically a dictionary with column indices or column names as keys and conversion
functions as values. These conversion functions can either be actual functions or lambda functions. In any case, they
should accept only a string as input and output only a single element of the wanted type.
In the following example, the second column is converted from a string representing a percentage to a float between 0
and 1:

>>> convertfunc = lambda x: float(x.strip(b"%"))/100.
>>> data = u"1, 2.3%, 45.\n6, 78.9%, 0"
>>> names = ("i", "p", "n")
>>> # General case .....
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names)
array([(1., nan, 45.), (6., nan, 0.)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])

We need to keep in mind that by default, dtype=float. A float is therefore expected for the second column. However,
the strings ' 2.3%' and ' 78.9%' cannot be converted to float and we end up having np.nan instead. Let’s now
use a converter:

>>> # Converted case ...
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
...               converters={1: convertfunc})
array([(1.0, 0.023, 45.0), (6.0, 0.78900000000000003, 0.0)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])

The same results can be obtained by using the name of the second column ("p") as key instead of its index (1):

>>> # Using a name for the converter ...
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
...               converters={"p": convertfunc})
array([(1.0, 0.023, 45.0), (6.0, 0.78900000000000003, 0.0)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])

Converters can also be used to provide a default for missing entries. In the following example, the converter convert
transforms a stripped string into the corresponding float or into -999 if the string is empty. We need to explicitly strip the
string from white spaces as it is not done by default:

>>> data = u"1, , 3\n 4, 5, 6"
>>> convert = lambda x: float(x.strip() or -999)
>>> np.genfromtxt(StringIO(data), delimiter=",",
...               converters={1: convert})
array([[ 1., -999., 3.],
       [ 4., 5., 6.]])
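
A converter can also be written as a named function, which leaves room for comments and input handling. The sketch below is one possible variant of the percentage converter; the name percent_to_fraction is hypothetical, and the bytes check is a precaution that assumes the field may arrive either as bytes or as str depending on the encoding in use:

```python
import numpy as np
from io import StringIO

def percent_to_fraction(field):
    """Convert a field such as ' 2.3%' to the float 0.023."""
    # Depending on the encoding in use, the field may be bytes or str.
    if isinstance(field, bytes):
        field = field.decode()
    return float(field.strip().rstrip("%")) / 100.0

data = "1, 2.3%, 45.\n6, 78.9%, 0"
result = np.genfromtxt(StringIO(data), delimiter=",",
                       names=("i", "p", "n"),
                       converters={"p": percent_to_fraction})
print(result["p"])
```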

Using missing and filling values


Some entries may be missing in the dataset we are trying to import. In a previous example, we used a converter to
transform an empty string into a float. However, user-defined converters may rapidly become cumbersome to manage.
The genfromtxt function provides two other complementary mechanisms: the missing_values argument is used
to recognize missing data and a second argument, filling_values, is used to process these missing data.

missing_values
By default, any empty string is marked as missing. We can also consider more complex strings, such as "N/A" or "???"
to represent missing or invalid data. The missing_values argument accepts three kinds of values:
a string or a comma-separated string
This string will be used as the marker for missing data for all the columns
a sequence of strings
In that case, each item is associated to a column, in order.
a dictionary
Values of the dictionary are strings or sequences of strings. The corresponding keys can be column
indices (integers) or column names (strings). In addition, the special key None can be used to define
a default applicable to all columns.

filling_values
We know how to recognize missing data, but we still need to provide a value for these missing entries. By default, this
value is determined from the expected dtype according to this table:

Expected type    Default
bool             False
int              -1
float            np.nan
complex          np.nan+0j
string           '???'

We can get a finer control on the conversion of missing values with the filling_values optional argument. Like
missing_values, this argument accepts different kinds of values:
a single value
This will be the default for all columns
a sequence of values
Each entry will be the default for the corresponding column
a dictionary
Each key can be a column index or a column name, and the corresponding value should be a single
object. We can use the special key None to define a default for all columns.
In the following example, we suppose that the missing values are flagged with "N/A" in the first column and by "???"
in the third column. We wish to transform these missing values to 0 if they occur in the first and second column, and to
-999 if they occur in the last column:

>>> data = u"N/A, 2, 3\n4, ,???"
>>> kwargs = dict(delimiter=",",
...               dtype=int,
...               names="a,b,c",
...               missing_values={0:"N/A", 'b':" ", 2:"???"},
...               filling_values={0:0, 'b':0, 2:-999})
>>> np.genfromtxt(StringIO(data), **kwargs)
array([(0, 2, 3), (4, 0, -999)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

usemask
We may also want to keep track of the occurrence of missing data by constructing a boolean mask, with True entries
where data was missing and False otherwise. To do that, we just have to set the optional argument usemask to True
(the default is False). The output array will then be a MaskedArray.
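
As a brief sketch of usemask, the input below has one empty field; the sample data are made up for illustration:

```python
import numpy as np
from io import StringIO

data = "1,,3\n4,5,6"
masked = np.genfromtxt(StringIO(data), delimiter=",", usemask=True)

print(type(masked).__name__)  # MaskedArray
print(masked.mask)            # True where the field was missing
print(masked.filled(-999))    # replace masked entries with a chosen value
```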

Shortcut functions

In addition to genfromtxt, the numpy.lib.npyio module provides several convenience functions derived from
genfromtxt. These functions work the same way as the original, but they have different default values.
recfromtxt
Returns a standard numpy.recarray (if usemask=False) or a MaskedRecords array (if
usemask=True). The default dtype is dtype=None, meaning that the types of each column will be
automatically determined.
recfromcsv
Like recfromtxt, but with a default delimiter=",".

4.4 Data types

See also:
Data type objects

4.4.1 Array types and conversions between types

NumPy supports a much greater variety of numerical types than Python does. This section shows which are available,
and how to modify an array’s data-type.
The primitive types supported are tied closely to those in C:

Numpy type                    C type                Description
numpy.bool_                   bool                  Boolean (True or False) stored as a byte
numpy.byte                    signed char           Platform-defined
numpy.ubyte                   unsigned char         Platform-defined
numpy.short                   short                 Platform-defined
numpy.ushort                  unsigned short        Platform-defined
numpy.intc                    int                   Platform-defined
numpy.uintc                   unsigned int          Platform-defined
numpy.int_                    long                  Platform-defined
numpy.uint                    unsigned long         Platform-defined
numpy.longlong                long long             Platform-defined
numpy.ulonglong               unsigned long long    Platform-defined
numpy.half / numpy.float16                          Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
numpy.single                  float                 Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa
numpy.double                  double                Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa
numpy.longdouble              long double           Platform-defined extended-precision float
numpy.csingle                 float complex         Complex number, represented by two single-precision floats (real and imaginary components)
numpy.cdouble                 double complex        Complex number, represented by two double-precision floats (real and imaginary components)
numpy.clongdouble             long double complex   Complex number, represented by two extended-precision floats (real and imaginary components)

Since many of these have platform-dependent definitions, a set of fixed-size aliases is provided (see sized-aliases).
NumPy numerical types are instances of dtype (data-type) objects, each having unique characteristics. Once you have
imported NumPy using

>>> import numpy as np

the dtypes are available as np.bool_, np.float32, etc.


Advanced types, not listed above, are explored in section Structured arrays.
There are 5 basic numerical types representing booleans (bool), integers (int), unsigned integers (uint), floating point
(float) and complex. Those with numbers in their name indicate the bitsize of the type (i.e. how many bits are needed
to represent a single value in memory). Some types, such as int and intp, have differing bitsizes, dependent on the
platforms (e.g. 32-bit vs. 64-bit machines). This should be taken into account when interfacing with low-level code (such
as C or Fortran) where the raw memory is addressed.
Data-types can be used as functions to convert python numbers to array scalars (see the array scalar section for an ex-
planation), python sequences of numbers to arrays of that type, or as arguments to the dtype keyword that many numpy
functions or methods accept. Some examples:

>>> import numpy as np
>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1,2,4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)

Array types can also be referred to by character codes, mostly to retain backward compatibility with older packages such
as Numeric. Some documentation may still refer to these, for example:

>>> np.array([1, 2, 3], dtype='f')
array([ 1., 2., 3.], dtype=float32)

We recommend using dtype objects instead.


To convert the type of an array, use the .astype() method (preferred) or the type itself as a function. For example:

>>> z.astype(float)
array([ 0., 1., 2.])
>>> np.int8(z)
array([0, 1, 2], dtype=int8)

Note that, above, we use the Python float object as a dtype. NumPy knows that int refers to np.int_, bool means
np.bool_, that float is np.float_ and complex is np.complex_. The other data-types do not have Python
equivalents.
To determine the type of an array, look at the dtype attribute:

>>> z.dtype
dtype('uint8')

dtype objects also contain information about the type, such as its bit-width and its byte-order. The data type can also be
used indirectly to query properties of the type, such as whether it is an integer:

>>> d = np.dtype(int)
>>> d
dtype('int32')
>>> np.issubdtype(d, np.integer)
True
>>> np.issubdtype(d, np.floating)
False
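
The bit-width and byte-order information mentioned above is exposed through dtype attributes such as itemsize, kind and byteorder. A minimal sketch, with illustrative variable names:

```python
import numpy as np

d = np.dtype(np.float32)
print(d.itemsize)      # 4 (bytes per element, i.e. 32 bits)
print(d.kind)          # f (floating point)

big = np.dtype(">i4")  # 32-bit integer with explicit big-endian byte order
print(big.itemsize)    # 4
print(big.byteorder)   # '>' on a little-endian machine
```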

4.4.2 Array Scalars

NumPy generally returns elements of arrays as array scalars (a scalar with an associated dtype). Array scalars differ from
Python scalars, but for the most part they can be used interchangeably (the primary exception is for versions of Python
older than v2.x, where integer array scalars cannot act as indices for lists and tuples). There are some exceptions, such as
when code requires very specific attributes of a scalar or when it checks specifically whether a value is a Python scalar.
Generally, problems are easily fixed by explicitly converting array scalars to Python scalars, using the corresponding
Python type function (e.g., int, float, complex, str).

The primary advantage of using array scalars is that they preserve the array type (Python may not have a matching scalar
type available, e.g. int16). Therefore, the use of array scalars ensures identical behaviour between arrays and scalars,
irrespective of whether the value is inside an array or not. NumPy scalars also have many of the same methods arrays do.
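
A short sketch of these points, using an int16 array as an example of a type with no Python equivalent:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int16)
s = a[0]                        # an array scalar, not a Python int

print(isinstance(s, np.int16))  # True
print(s.dtype, s.shape)         # int16 ()
print((s + s).dtype)            # int16, the array type is preserved
print(type(int(s)).__name__)    # int, after explicit conversion
```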

4.4.3 Overflow Errors

The fixed size of NumPy numeric types may cause overflow errors when a value requires more memory than available in
the data type. For example, np.power evaluates 100 ** 8 correctly for 64-bit integers, but gives 1874919424
(incorrect) for a 32-bit integer.

>>> np.power(100, 8, dtype=np.int64)
10000000000000000
>>> np.power(100, 8, dtype=np.int32)
1874919424

The behaviour of NumPy and Python integer types differs significantly for integer overflows and may confuse users
expecting NumPy integers to behave similar to Python’s int. Unlike NumPy, the size of Python’s int is flexible. This
means Python integers may expand to accommodate any integer and will not overflow.
NumPy provides np.iinfo and np.finfo to verify the minimum or maximum values of NumPy integer
and floating point values respectively:

>>> np.iinfo(int) # Bounds of the default integer on this system.
iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
>>> np.iinfo(np.int32) # Bounds of a 32-bit integer
iinfo(min=-2147483648, max=2147483647, dtype=int32)
>>> np.iinfo(np.int64) # Bounds of a 64-bit integer
iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

If 64-bit integers are still too small the result may be cast to a floating point number. Floating point numbers offer a larger,
but inexact, range of possible values.

>>> np.power(100, 100, dtype=np.int64) # Incorrect even with 64-bit int
0
>>> np.power(100, 100, dtype=np.float64)
1e+200
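
One way to guard against silent wrap-around is to compare a value with the bounds from np.iinfo before storing it. The sketch below first shows the wrap-around in array arithmetic and then a range check; the helper name fits is hypothetical:

```python
import numpy as np

info = np.iinfo(np.int32)
top = np.array([info.max], dtype=np.int32)

# Array arithmetic wraps around on overflow instead of raising.
wrapped = top + np.int32(1)
print(wrapped[0] == info.min)  # True

def fits(value, dtype):
    """Return True if the Python integer `value` is representable in `dtype`."""
    limits = np.iinfo(dtype)
    return limits.min <= value <= limits.max

print(fits(100 ** 8, np.int32), fits(100 ** 8, np.int64))  # False True
```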

4.4.4 Extended Precision

Python’s floating-point numbers are usually 64-bit floating-point numbers, nearly equivalent to np.float64. In some
unusual situations it may be useful to use floating-point numbers with more precision. Whether this is possible in numpy
depends on the hardware and on the development environment: specifically, x86 machines provide hardware floating-
point with 80-bit precision, and while most C compilers provide this as their long double type, MSVC (standard for
Windows builds) makes long double identical to double (64 bits). NumPy makes the compiler’s long double
available as np.longdouble (and np.clongdouble for the complex numbers). You can find out what your numpy
provides with np.finfo(np.longdouble).
NumPy does not provide a dtype with more precision than C’s long double; in particular, the 128-bit IEEE quad
precision data type (FORTRAN’s REAL*16) is not available.
For efficient memory alignment, np.longdouble is usually stored padded with zero bits, either to 96 or 128 bits.
Which is more efficient depends on hardware and development environment; typically on 32-bit systems they are padded
to 96 bits, while on 64-bit systems they are typically padded to 128 bits. np.longdouble is padded to the system
default; np.float96 and np.float128 are provided for users who want specific padding. In spite of the names,
np.float96 and np.float128 provide only as much precision as np.longdouble, that is, 80 bits on most x86
machines and 64 bits in standard Windows builds.
Be warned that even if np.longdouble offers more precision than python float, it is easy to lose that extra precision,
since python often forces values to pass through float. For example, the % formatting operator requires its arguments to
be converted to standard python types, and it is therefore impossible to preserve extended precision even if many decimal
places are requested. It can be useful to test your code with the value 1 + np.finfo(np.longdouble).eps.
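
The suggested test value can be used as in the sketch below, which compares the machine epsilon of np.longdouble with that of np.float64 and shows the extra precision being lost on conversion to a Python float. The results depend on the platform, so no specific output is assumed:

```python
import numpy as np

eps64 = np.finfo(np.float64).eps
eps_ld = np.finfo(np.longdouble).eps

# Where long double is identical to double, the two values are equal.
print(eps_ld <= eps64)

x = np.longdouble(1) + eps_ld  # smallest representable step above 1
print(x > 1)

# Converting to a Python float rounds to 64 bits, so the extra step is lost
# whenever long double is wider than double.
print(float(x) == 1.0 or eps_ld == eps64)
```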

4.5 Broadcasting

See also:
numpy.broadcast
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to
certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcast-
ing provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without
making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where
broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.
NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays
must have exactly the same shape, as in the following example:

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2., 4., 6.])

NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest broad-
casting example occurs when an array and a scalar value are combined in an operation:

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2., 4., 6.])

The result is equivalent to the previous example where b was an array. We can think of the scalar b being stretched
during the arithmetic operation into an array with the same shape as a. The new elements in b, as shown in Figure 1,
are simply copies of the original scalar. The stretching analogy is only conceptual. NumPy is smart enough to use the
original scalar value without actually making copies so that broadcasting operations are as memory and computationally
efficient as possible.
The code in the second example is more efficient than that in the first because broadcasting moves less memory around
during the multiplication (b is a scalar rather than an array).
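The claim that no copies are made can be checked with a small sketch. It assumes `np.broadcast_to` as a stand-in for what NumPy does internally during arithmetic; the variable names are illustrative only.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.float64(2.0)

# broadcast_to returns a read-only view that has the requested shape
# without allocating a new element per position.
stretched = np.broadcast_to(b, a.shape)

print(stretched.shape)    # (3,)
print(stretched.strides)  # (0,): every position reads the same memory
result = a * stretched    # same values as a * b
```

A stride of zero means the "stretched" array steps zero bytes between elements, which is why the stretching is only conceptual.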

4.5.1 General Broadcasting Rules

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost)
dimensions and works its way left. Two dimensions are compatible when
1) they are equal, or
2) one of them is 1


Fig. 1: Figure 1
In the simplest example of broadcasting, the scalar b is stretched to become an array of same shape as a so the shapes are compatible
for element-by-element multiplication.

If these conditions are not met, a ValueError: operands could not be broadcast together exception
is raised, indicating that the arrays have incompatible shapes. Along each axis, the size of the resulting array is the
input size that is not 1, or 1 if every input has size 1 along that axis.
Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values,
and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array
with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are
compatible:

Image  (3d array): 256 x 256 x 3
Scale  (1d array):             3
Result (3d array): 256 x 256 x 3

When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched
or “copied” to match the other.
In the following example, both the A and B arrays have axes with length one that are expanded to a larger size during the
broadcast operation:

A (4d array): 8 x 1 x 6 x 1
B (3d array): 7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5

4.5.2 Broadcastable arrays

A set of arrays is called “broadcastable” to the same shape if the above rules produce a valid result.
For example, if a.shape is (5,1), b.shape is (1,6), c.shape is (6,) and d.shape is () so that d is a scalar, then a,
b, c, and d are all broadcastable to dimension (5,6); and
• a acts like a (5,6) array where a[:,0] is broadcast to the other columns,
• b acts like a (5,6) array where b[0,:] is broadcast to the other rows,
• c acts like a (1,6) array and therefore like a (5,6) array where c[:] is broadcast to every row, and finally,
• d acts like a (5,6) array where the single value is repeated.
Here are some more examples:


A (2d array): 5 x 4
B (1d array): 1
Result (2d array): 5 x 4

A (2d array): 5 x 4
B (1d array): 4
Result (2d array): 5 x 4

A (3d array): 15 x 3 x 5
B (3d array): 15 x 1 x 5
Result (3d array): 15 x 3 x 5

A (3d array): 15 x 3 x 5
B (2d array): 3 x 5
Result (3d array): 15 x 3 x 5

A (3d array): 15 x 3 x 5
B (2d array): 3 x 1
Result (3d array): 15 x 3 x 5

Here are examples of shapes that do not broadcast:

A (1d array): 3
B (1d array): 4 # trailing dimensions do not match

A (2d array): 2 x 1
B (3d array): 8 x 4 x 3 # second from last dimensions mismatched
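The compatibility rule above can be written out in a few lines of plain Python. This is only a sketch: `broadcast_shape` is a hypothetical helper written for illustration, not a NumPy function, and it is compared against `np.broadcast_shapes` (available since NumPy 1.20).

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of broadcasting two shapes."""
    result = []
    # Compare dimensions starting from the trailing one and moving left.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        # A shape that has run out of dimensions is treated as size 1.
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            result.append(dim_b if dim_a == 1 else dim_a)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} do not broadcast")
    return tuple(reversed(result))

print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))      # (8, 7, 6, 5)
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)

try:
    broadcast_shape((2, 1), (8, 4, 3))
except ValueError as exc:
    print("mismatch:", exc)
```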

An example of broadcasting when a 1-d array is added to a 2-d array:

>>> a = np.array([[ 0.0,  0.0,  0.0],
...               [10.0, 10.0, 10.0],
...               [20.0, 20.0, 20.0],
...               [30.0, 30.0, 30.0]])
>>> b = np.array([1.0, 2.0, 3.0])
>>> a + b
array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])
>>> b = np.array([1.0, 2.0, 3.0, 4.0])
>>> a + b
Traceback (most recent call last):
ValueError: operands could not be broadcast together with shapes (4,3) (4,)

As shown in Figure 2, b is added to each row of a. In Figure 3, an exception is raised because of the incompatible shapes.
Broadcasting provides a convenient way of taking the outer product (or any other outer operation) of two arrays. The
following example shows an outer addition operation of two 1-d arrays:

>>> a = np.array([0.0, 10.0, 20.0, 30.0])
>>> b = np.array([1.0, 2.0, 3.0])
>>> a[:, np.newaxis] + b
array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])


Fig. 2: Figure 2
A one dimensional array added to a two dimensional array results in broadcasting if number of 1-d array elements matches the number
of 2-d array columns.

Fig. 3: Figure 3
When the trailing dimensions of the arrays are unequal, broadcasting fails because it is impossible to align the values in the rows of the
1st array with the elements of the 2nd arrays for element-by-element addition.


Fig. 4: Figure 4
In some cases, broadcasting stretches both arrays to form an output array larger than either of the initial arrays.

Here the newaxis index operator inserts a new axis into a, making it a two-dimensional 4x1 array. Combining the
4x1 array with b, which has shape (3,), yields a 4x3 array.
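As a cross-check of the outer addition, the same result can be obtained from the ufunc's `outer` method. The sketch below assumes nothing beyond `np.newaxis` and `np.add.outer`; the variable names are illustrative.

```python
import numpy as np

a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])

via_newaxis = a[:, np.newaxis] + b   # shapes (4, 1) and (3,) give (4, 3)
via_ufunc = np.add.outer(a, b)       # outer method of the add ufunc

print(via_newaxis.shape)                        # (4, 3)
print(np.array_equal(via_newaxis, via_ufunc))   # True
```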

4.5.3 A Practical Example: Vector Quantization

Broadcasting comes up quite often in real world problems. A typical example occurs in the vector quantization (VQ)
algorithm used in information theory, classification, and other related areas. The basic operation in VQ finds the closest
point in a set of points, called codes in VQ jargon, to a given point, called the observation. In the very simple,
two-dimensional case shown below, the values in observation describe the weight and height of an athlete to be
classified. The codes represent different classes of athletes (see footnote 1). Finding the closest point requires calculating the distance
between observation and each of the codes. The shortest distance provides the best match. In this example, codes[0]
is the closest class indicating that the athlete is likely a basketball player.

>>> from numpy import array, argmin, sqrt, sum
>>> observation = array([111.0, 188.0])
>>> codes = array([[102.0, 203.0],
...                [132.0, 193.0],
...                [45.0, 155.0],
...                [57.0, 173.0]])
>>> diff = codes - observation    # the broadcast happens here
>>> dist = sqrt(sum(diff**2, axis=-1))
>>> argmin(dist)
0

In this example, the observation array is stretched to match the shape of the codes array:

Observation (1d array):     2
Codes       (2d array): 4 x 2
Diff        (2d array): 4 x 2

1 In this example, weight has more impact on the distance calculation than height because of the larger values. In practice, it is important to normalize
the height and weight, often by their standard deviation across the data set, so that both have equal influence on the distance calculation.


Fig. 5: Figure 5
The basic operation of vector quantization calculates the distance between an object to be classified, the dark square, and multiple known
codes, the gray circles. In this simple case, the codes represent individual classes. More complex cases use multiple codes per class.


Typically, a large number of observations, perhaps read from a database, are compared to a set of codes. Consider
this scenario:

Observation (2d array):     10 x 3
Codes       (3d array):  5 x 1 x 3
Diff        (3d array): 5 x 10 x 3

The three-dimensional array, diff, is a consequence of broadcasting, not a necessity for the calculation. Large data
sets will generate a large intermediate array that is computationally inefficient. Instead, if each observation is calculated
individually using a Python loop around the code in the two-dimensional example above, a much smaller array is used.
Broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in
C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm.
In these cases, it is better to write the algorithm’s outer loop in Python. This may also produce more readable code, as
algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast
increases.
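The trade-off between the fully broadcast calculation and a Python loop over observations can be sketched as follows. The data is random and purely illustrative, and the codes array is given an extra axis with `np.newaxis` so that the two shapes broadcast; none of these names come from the original example.

```python
import numpy as np

rng = np.random.default_rng(0)
observations = rng.random((10, 3))   # 10 observations with 3 features each
codes = rng.random((5, 3))           # 5 codes

# Fully broadcast: (5, 1, 3) - (10, 3) builds a (5, 10, 3) intermediate.
diff = codes[:, np.newaxis, :] - observations
nearest_broadcast = np.argmin(np.sqrt(np.sum(diff**2, axis=-1)), axis=0)

# Outer loop in Python: each step only builds a (5, 3) intermediate.
nearest_loop = np.empty(len(observations), dtype=np.intp)
for i, obs in enumerate(observations):
    d = codes - obs
    nearest_loop[i] = np.argmin(np.sqrt(np.sum(d**2, axis=-1)))

print(diff.shape)                                       # (5, 10, 3)
print(np.array_equal(nearest_broadcast, nearest_loop))  # True
```

Both versions pick the same code for each observation; which one is preferable depends on how large the intermediate array becomes for the data at hand.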

4.6 Byte-swapping

4.6.1 Introduction to byte ordering and ndarrays

The ndarray is an object that provides a python array interface to data in memory.
It often happens that the memory that you want to view with an array is not of the same byte ordering as the computer
on which you are running Python.
For example, I might be working on a computer with a little-endian CPU - such as an Intel Pentium, but I have loaded
some data from a file written by a computer that is big-endian. Let’s say I have loaded 4 bytes from a file written by a
Sun (big-endian) computer. I know that these 4 bytes represent two 16-bit integers. On a big-endian machine, a two-byte
integer is stored with the Most Significant Byte (MSB) first, and then the Least Significant Byte (LSB). Thus the bytes
are, in memory order:
1. MSB integer 1
2. LSB integer 1
3. MSB integer 2
4. LSB integer 2
Let’s say the two integers were in fact 1 and 770. Because 770 = 256 * 3 + 2, the 4 bytes in memory would contain
respectively: 0, 1, 3, 2. The bytes I have loaded from the file would have these contents:

>>> big_end_buffer = bytearray([0, 1, 3, 2])
>>> big_end_buffer
bytearray(b'\x00\x01\x03\x02')

We might want to use an ndarray to access these integers. In that case, we can create an array around this memory,
and tell numpy that there are two integers, and that they are 16 bit and big-endian:

>>> import numpy as np
>>> big_end_arr = np.ndarray(shape=(2,), dtype='>i2', buffer=big_end_buffer)
>>> big_end_arr[0]
1
>>> big_end_arr[1]
770


Note the array dtype above of >i2. The > means ‘big-endian’ (< is little-endian) and i2 means ‘signed 2-byte integer’.
For example, if our data represented a single unsigned 4-byte little-endian integer, the dtype string would be <u4.
In fact, why don’t we try that?

>>> little_end_u4 = np.ndarray(shape=(1,), dtype='<u4', buffer=big_end_buffer)
>>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3
True

Returning to our big_end_arr - in this case our underlying data is big-endian (data endianness) and we’ve set the
dtype to match (the dtype is also big-endian). However, sometimes you need to flip these around.

Warning: Scalars currently do not include byte order information, so extracting a scalar from an array will return
an integer in native byte order. Hence:
>>> big_end_arr[0].dtype.byteorder == little_end_u4[0].dtype.byteorder
True

4.6.2 Changing byte ordering

As you can imagine from the introduction, there are two ways you can affect the relationship between the byte ordering
of the array and the underlying memory it is looking at:
• Change the byte-ordering information in the array dtype so that it interprets the underlying data as being in a
different byte order. This is the role of arr.newbyteorder()
• Change the byte-ordering of the underlying data, leaving the dtype interpretation as it was. This is what
arr.byteswap() does.
The common situations in which you need to change byte ordering are:
1. Your data and dtype endianness don’t match, and you want to change the dtype so that it matches the data.
2. Your data and dtype endianness don’t match, and you want to swap the data so that they match the dtype
3. Your data and dtype endianness match, but you want the data swapped and the dtype to reflect this

Data and dtype endianness don’t match, change dtype to match data

We make something where they don’t match:

>>> wrong_end_dtype_arr = np.ndarray(shape=(2,), dtype='<i2', buffer=big_end_buffer)
>>> wrong_end_dtype_arr[0]
256

The obvious fix for this situation is to change the dtype so it gives the correct endianness:

>>> fixed_end_dtype_arr = wrong_end_dtype_arr.newbyteorder()
>>> fixed_end_dtype_arr[0]
1

Note the array has not changed in memory:

>>> fixed_end_dtype_arr.tobytes() == big_end_buffer
True


Data and dtype endianness don’t match, change data to match dtype

You might want to do this if you need the data in memory to be a certain ordering. For example you might be writing the
memory out to a file that needs a certain byte ordering.

>>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap()
>>> fixed_end_mem_arr[0]
1

Now the array has changed in memory:

>>> fixed_end_mem_arr.tobytes() == big_end_buffer
False

Data and dtype endianness match, swap data and dtype

You may have a correctly specified array dtype, but you need the array to have the opposite byte order in memory, and
you want the dtype to match so the array values make sense. In this case you just do both of the previous operations:

>>> swapped_end_arr = big_end_arr.byteswap().newbyteorder()
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_buffer
False

An easier way of casting the data to a specific dtype and byte ordering can be achieved with the ndarray astype method:

>>> swapped_end_arr = big_end_arr.astype('<i2')
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_buffer
False
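The three situations can be compared side by side in one short sketch. As an assumption made for portability across NumPy versions, the dtype is reinterpreted with `arr.view(arr.dtype.newbyteorder())` rather than the array method used above; the variable names are illustrative.

```python
import numpy as np

big_end_buffer = bytearray([0, 1, 3, 2])
big_end_arr = np.ndarray(shape=(2,), dtype='>i2', buffer=big_end_buffer)

# 1. Change only the dtype: the bytes stay put, the values change.
reinterpreted = big_end_arr.view(big_end_arr.dtype.newbyteorder())

# 2. Change only the data: the bytes are swapped, the dtype is unchanged.
swapped = big_end_arr.byteswap()

# 3. Change both: the bytes are swapped and the values are preserved.
both = big_end_arr.byteswap().view(big_end_arr.dtype.newbyteorder())

for name, arr in [("dtype only", reinterpreted),
                  ("data only", swapped),
                  ("both", both)]:
    print(name, arr.dtype.str, arr.tolist(), list(arr.tobytes()))
```

Only the third variant keeps the values 1 and 770 while storing them in the opposite byte order.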

4.7 Structured arrays

4.7.1 Introduction

Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named
fields. For example,

>>> x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
...              dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
>>> x
array([('Rex', 9, 81.), ('Fido', 3, 27.)],
      dtype=[('name', 'U10'), ('age', '<i4'), ('weight', '<f4')])

Here x is a one-dimensional array of length two whose datatype is a structure with three fields: 1. A string of length 10
or less named ‘name’, 2. a 32-bit integer named ‘age’, and 3. a 32-bit float named ‘weight’.
If you index x at position 1 you get a structure:

>>> x[1]
('Fido', 3, 27.0)


You can access and modify individual fields of a structured array by indexing with the field name:

>>> x['age']
array([9, 3], dtype=int32)
>>> x['age'] = 5
>>> x
array([('Rex', 5, 81.), ('Fido', 5, 27.)],
dtype=[('name', 'U10'), ('age', '<i4'), ('weight', '<f4')])

Structured datatypes are designed to be able to mimic ‘structs’ in the C language, and share a similar memory layout. They
are meant for interfacing with C code and for low-level manipulation of structured buffers, for example for interpreting
binary blobs. For these purposes they support specialized features such as subarrays, nested datatypes, and unions, and
allow control over the memory layout of the structure.
Users looking to manipulate tabular data, such as stored in csv files, may find other pydata projects more suitable, such as
xarray, pandas, or DataArray. These provide a high-level interface for tabular data analysis and are better optimized for
that use. For instance, the C-struct-like memory layout of structured arrays in numpy can lead to poor cache behavior in
comparison.

4.7.2 Structured Datatypes

A structured datatype can be thought of as a sequence of bytes of a certain length (the structure’s itemsize) which is
interpreted as a collection of fields. Each field has a name, a datatype, and a byte offset within the structure. The datatype
of a field may be any numpy datatype including other structured datatypes, and it may also be a subarray data type which
behaves like an ndarray of a specified shape. The offsets of the fields are arbitrary, and fields may even overlap. These
offsets are usually determined automatically by numpy, but can also be specified.

Structured Datatype Creation

Structured datatypes may be created using the function numpy.dtype. There are 4 alternative forms of specification
which vary in flexibility and conciseness. These are further documented in the Data Type Objects reference page, and in
summary they are:
1. A list of tuples, one tuple per field
Each tuple has the form (fieldname, datatype, shape) where shape is optional. fieldname is a
string (or tuple if titles are used, see Field Titles below), datatype may be any object convertible to a datatype,
and shape is a tuple of integers specifying subarray shape.

>>> np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))])
dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))])

If fieldname is the empty string '', the field will be given a default name of the form f#, where # is the integer
index of the field, counting from 0 from the left:

>>> np.dtype([('x', 'f4'), ('', 'i4'), ('z', 'i8')])
dtype([('x', '<f4'), ('f1', '<i4'), ('z', '<i8')])

The byte offsets of the fields within the structure and the total structure itemsize are determined automatically.
2. A string of comma-separated dtype specifications
In this shorthand notation any of the string dtype specifications may be used in a string and separated by commas.
The itemsize and byte offsets of the fields are determined automatically, and the field names are given the default
names f0, f1, etc.


>>> np.dtype('i8, f4, S3')
dtype([('f0', '<i8'), ('f1', '<f4'), ('f2', 'S3')])
>>> np.dtype('3int8, float32, (2, 3)float64')
dtype([('f0', 'i1', (3,)), ('f1', '<f4'), ('f2', '<f8', (2, 3))])

3. A dictionary of field parameter arrays


This is the most flexible form of specification since it allows control over the byte-offsets of the fields and the
itemsize of the structure.
The dictionary has two required keys, ‘names’ and ‘formats’, and four optional keys, ‘offsets’, ‘itemsize’, ‘aligned’
and ‘titles’. The values for ‘names’ and ‘formats’ should respectively be a list of field names and a list of dtype
specifications, of the same length. The optional ‘offsets’ value should be a list of integer byte-offsets, one for each
field within the structure. If ‘offsets’ is not given the offsets are determined automatically. The optional ‘itemsize’
value should be an integer describing the total size in bytes of the dtype, which must be large enough to contain all
the fields.

>>> np.dtype({'names': ['col1', 'col2'], 'formats': ['i4', 'f4']})
dtype([('col1', '<i4'), ('col2', '<f4')])
>>> np.dtype({'names': ['col1', 'col2'],
...           'formats': ['i4', 'f4'],
...           'offsets': [0, 4],
...           'itemsize': 12})
dtype({'names': ['col1', 'col2'], 'formats': ['<i4', '<f4'], 'offsets': [0, 4], 'itemsize': 12})

Offsets may be chosen such that the fields overlap, though this will mean that assigning to one field may clobber
any overlapping field’s data. As an exception, fields of numpy.object_ type cannot overlap with other fields,
because of the risk of clobbering the internal object pointer and then dereferencing it.
The optional ‘aligned’ value can be set to True to make the automatic offset computation use aligned offsets (see
Automatic Byte Offsets and Alignment), as if the ‘align’ keyword argument of numpy.dtype had been set to True.
The optional ‘titles’ value should be a list of titles of the same length as ‘names’, see Field Titles below.
4. A dictionary of field names
The use of this form of specification is discouraged, but documented here because older numpy code may use it.
The keys of the dictionary are the field names and the values are tuples specifying type and offset:

>>> np.dtype({'col1': ('i1', 0), 'col2': ('f4', 1)})
dtype([('col1', 'i1'), ('col2', '<f4')])

This form is discouraged because Python dictionaries do not preserve order in Python versions before Python 3.6,
and the order of the fields in a structured dtype has meaning. Field Titles may be specified by using a 3-tuple, see
below.
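To see how the first three forms relate, the following sketch builds the same two-field datatype in each of them and compares the results. The field names `f0` and `f1` are chosen to match the defaults that the string form generates.

```python
import numpy as np

from_tuples = np.dtype([('f0', 'i8'), ('f1', 'f4')])
from_string = np.dtype('i8, f4')
from_arrays = np.dtype({'names': ['f0', 'f1'], 'formats': ['i8', 'f4']})

# All three describe the same packed layout, so they compare equal.
print(from_tuples == from_string == from_arrays)   # True
print(from_tuples.names, from_tuples.itemsize)     # ('f0', 'f1') 12
```

The dictionary form only becomes necessary once offsets, itemsize or titles need to be controlled explicitly.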

Manipulating and Displaying Structured Datatypes

The list of field names of a structured datatype can be found in the names attribute of the dtype object:

>>> d = np.dtype([('x', 'i8'), ('y', 'f4')])
>>> d.names
('x', 'y')

The field names may be modified by assigning to the names attribute using a sequence of strings of the same length.


The dtype object also has a dictionary-like attribute, fields, whose keys are the field names (and Field Titles, see below)
and whose values are tuples containing the dtype and byte offset of each field.

>>> d.fields
mappingproxy({'x': (dtype('int64'), 0), 'y': (dtype('float32'), 8)})

Both the names and fields attributes will equal None for unstructured arrays. The recommended way to test if a
dtype is structured is with if dt.names is not None rather than if dt.names, to account for dtypes with 0 fields.
The string representation of a structured datatype is shown in the “list of tuples” form if possible, otherwise numpy falls
back to using the more general dictionary form.

Automatic Byte Offsets and Alignment

Numpy uses one of two methods to automatically determine the field byte offsets and the overall itemsize of a structured
datatype, depending on whether align=True was specified as a keyword argument to numpy.dtype.
By default (align=False), numpy will pack the fields together such that each field starts at the byte offset the previous
field ended, and the fields are contiguous in memory.

>>> def print_offsets(d):
...     print("offsets:", [d.fields[name][1] for name in d.names])
...     print("itemsize:", d.itemsize)
>>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2'))
offsets: [0, 1, 2, 6, 7, 15]
itemsize: 17

If align=True is set, numpy will pad the structure in the same way many C compilers would pad a C-struct. Aligned
structures can give a performance improvement in some cases, at the cost of increased datatype size. Padding bytes are
inserted between fields such that each field’s byte offset will be a multiple of that field’s alignment, which is usually equal to
the field’s size in bytes for simple datatypes, see PyArray_Descr.alignment. The structure will also have trailing
padding added so that its itemsize is a multiple of the largest field’s alignment.

>>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2', align=True))
offsets: [0, 1, 4, 8, 16, 24]
itemsize: 32

Note that although almost all modern C compilers pad in this way by default, padding in C structs is C-implementation-
dependent so this memory layout is not guaranteed to exactly match that of a corresponding struct in a C program. Some
work may be needed, either on the numpy side or the C side, to obtain exact correspondence.
If offsets were specified using the optional offsets key in the dictionary-based dtype specification, setting
align=True will check that each field’s offset is a multiple of its size and that the itemsize is a multiple of the largest
field size, and raise an exception if not.
If the offsets of the fields and itemsize of a structured array satisfy the alignment conditions, the array will have the
ALIGNED flag set.
A convenience function numpy.lib.recfunctions.repack_fields converts an aligned dtype or array to a
packed one and vice versa. It takes either a dtype or structured ndarray as an argument, and returns a copy with fields
re-packed, with or without padding bytes.
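A minimal sketch of the round trip between an aligned and a packed datatype is shown below. The field types are arbitrary, and the aligned sizes depend on the platform's alignment rules, so only the packed itemsize is stated as a number.

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

aligned = np.dtype('u1, i8, u2', align=True)    # padded like a C struct
packed = repack_fields(aligned)                 # padding removed
realigned = repack_fields(packed, align=True)   # padding added back

print(packed.itemsize)                           # 11 = 1 + 8 + 2
print(aligned.itemsize > packed.itemsize)        # True
print(realigned.itemsize == aligned.itemsize)    # True
```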


Field Titles

In addition to field names, fields may also have an associated title, an alternate name, which is sometimes used as an
additional description or alias for the field. The title may be used to index an array, just like a field name.
To add titles when using the list-of-tuples form of dtype specification, the field name may be specified as a tuple of two
strings instead of a single string, which will be the field’s title and field name respectively. For example:

>>> np.dtype([(('my title', 'name'), 'f4')])
dtype([(('my title', 'name'), '<f4')])

When using the first form of dictionary-based specification, the titles may be supplied as an extra 'titles' key as de-
scribed above. When using the second (discouraged) dictionary-based specification, the title can be supplied by providing
a 3-element tuple (datatype, offset, title) instead of the usual 2-element tuple:

>>> np.dtype({'name': ('i4', 0, 'my title')})
dtype([(('my title', 'name'), '<i4')])

The dtype.fields dictionary will contain titles as keys, if any titles are used. This means effectively that a field with
a title will be represented twice in the fields dictionary. The tuple values for these fields will also have a third element, the
field title. Because of this, and because the names attribute preserves the field order while the fields attribute may
not, it is recommended to iterate through the fields of a dtype using the names attribute of the dtype, which will not list
titles, as in:

>>> for name in d.names:
...     print(d.fields[name][:2])
(dtype('int64'), 0)
(dtype('float32'), 8)

Union types

Structured datatypes are implemented in numpy to have base type numpy.void by default, but it is possible to interpret
other numpy types as structured types using the (base_dtype, dtype) form of dtype specification described in
Data Type Objects. Here, base_dtype is the desired underlying dtype, and fields and flags will be copied from dtype.
This dtype is similar to a ‘union’ in C.
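Union-like behavior can also be illustrated with the overlapping offsets allowed by the dictionary form of specification. The sketch below uses that form, not the (base_dtype, dtype) form; the field names `word`, `lo` and `hi` are made up for illustration, and explicit little-endian types keep the result independent of the host CPU.

```python
import numpy as np

# One 32-bit word that can also be read as two 16-bit halves.
word_union = np.dtype({'names': ['word', 'lo', 'hi'],
                       'formats': ['<u4', '<u2', '<u2'],
                       'offsets': [0, 0, 2]})

x = np.zeros(1, dtype=word_union)
x['word'] = 0x00020001

print(word_union.itemsize)   # 4: the fields share the same four bytes
print(x['lo'], x['hi'])      # [1] [2]

x['hi'] = 0                  # writing one field changes the overlapping one
print(x['word'])             # [1]
```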

4.7.3 Indexing and Assignment to Structured arrays

Assigning data to a Structured Array

There are a number of ways to assign values to a structured array: Using python tuples, using scalar values, or using other
structured arrays.

Assignment from Python Native Types (Tuples)


The simplest way to assign values to a structured array is using python tuples. Each assigned value should be a tuple of
length equal to the number of fields in the array, and not a list or array as these will trigger numpy’s broadcasting rules.
The tuple’s elements are assigned to the successive fields of the array, from left to right:

>>> x = np.array([(1, 2, 3), (4, 5, 6)], dtype='i8, f4, f8')
>>> x[1] = (7, 8, 9)
>>> x
array([(1, 2., 3.), (7, 8., 9.)],
      dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '<f8')])


Assignment from Scalars


A scalar assigned to a structured element will be assigned to all fields. This happens when a scalar is assigned to a
structured array, or when an unstructured array is assigned to a structured array:

>>> x = np.zeros(2, dtype='i8, f4, ?, S1')
>>> x[:] = 3
>>> x
array([(3, 3., True, b'3'), (3, 3., True, b'3')],
      dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')])
>>> x[:] = np.arange(2)
>>> x
array([(0, 0., False, b'0'), (1, 1., True, b'1')],
      dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')])

Structured arrays can also be assigned to unstructured arrays, but only if the structured datatype has just a single field:

>>> twofield = np.zeros(2, dtype=[('A', 'i4'), ('B', 'i4')])
>>> onefield = np.zeros(2, dtype=[('A', 'i4')])
>>> nostruct = np.zeros(2, dtype='i4')
>>> nostruct[:] = twofield
Traceback (most recent call last):
...
TypeError: Cannot cast array data from dtype([('A', '<i4'), ('B', '<i4')]) to dtype('int32') according to the rule 'unsafe'

Assignment from other Structured Arrays


Assignment between two structured arrays occurs as if the source elements had been converted to tuples and then assigned
to the destination elements. That is, the first field of the source array is assigned to the first field of the destination array, and
the second field likewise, and so on, regardless of field names. Structured arrays with a different number of fields cannot
be assigned to each other. Bytes of the destination structure which are not included in any of the fields are unaffected.

>>> a = np.zeros(3, dtype=[('a', 'i8'), ('b', 'f4'), ('c', 'S3')])
>>> b = np.ones(3, dtype=[('x', 'f4'), ('y', 'S3'), ('z', 'O')])
>>> b[:] = a
>>> b
array([(0., b'0.0', b''), (0., b'0.0', b''), (0., b'0.0', b'')],
      dtype=[('x', '<f4'), ('y', 'S3'), ('z', 'O')])
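That fields are matched by position and not by name can be made visible with a source and destination whose field names are deliberately swapped. This is a small sketch with made-up field names and values.

```python
import numpy as np

src = np.array([(1, 2.0), (3, 4.0)], dtype=[('a', 'i4'), ('b', 'f4')])
dst = np.zeros(2, dtype=[('b', 'f8'), ('a', 'f8')])   # same names, reversed order

dst[:] = src

# The first destination field receives the first source field,
# even though the two fields have different names.
print(dst['b'])   # [1. 3.]
print(dst['a'])   # [2. 4.]
```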

Assignment involving subarrays


When assigning to fields which are subarrays, the assigned value will first be broadcast to the shape of the subarray.

Indexing Structured Arrays

Accessing Individual Fields


Individual fields of a structured array may be accessed and modified by indexing the array with the field name.

>>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')])
>>> x['foo']
array([1, 3])
>>> x['foo'] = 10
>>> x
array([(10, 2.), (10, 4.)],
      dtype=[('foo', '<i8'), ('bar', '<f4')])


The resulting array is a view into the original array. It shares the same memory locations and writing to the view will
modify the original array.
>>> y = x['bar']
>>> y[:] = 11
>>> x
array([(10, 11.), (10, 11.)],
dtype=[('foo', '<i8'), ('bar', '<f4')])

This view has the same dtype and itemsize as the indexed field, so it is typically a non-structured array, except in the case
of nested structures.
>>> y.dtype, y.shape, y.strides
(dtype('float32'), (2,), (12,))

If the accessed field is a subarray, the dimensions of the subarray are appended to the shape of the result:
>>> x = np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))])
>>> x['a'].shape
(2, 2)
>>> x['b'].shape
(2, 2, 3, 3)

Accessing Multiple Fields


One can index and assign to a structured array with a multi-field index, where the index is a list of field names.

Warning: The behavior of multi-field indexes changed from Numpy 1.15 to Numpy 1.16.

The result of indexing with a multi-field index is a view into the original array, as follows:
>>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
>>> a[['a', 'c']]
array([(0, 0.), (0, 0.), (0, 0.)],
      dtype={'names':['a','c'], 'formats':['<i4','<f4'], 'offsets':[0,8], 'itemsize':12})

Assignment to the view modifies the original array. The view’s fields will be in the order they were indexed. Note that
unlike for single-field indexing, the dtype of the view has the same itemsize as the original array, and has fields at the
same offsets as in the original array, and unindexed fields are merely missing.

Warning: In Numpy 1.15, indexing an array with a multi-field index returned a copy of the result above, but with
fields packed together in memory as if passed through numpy.lib.recfunctions.repack_fields.
The new behavior as of Numpy 1.16 leads to extra “padding” bytes at the location of unindexed fields compared to
1.15. You will need to update any code which depends on the data having a “packed” layout. For instance code such
as:
>>> a[['a', 'c']].view('i8')  # Fails in Numpy 1.16
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype

will need to be changed. This code has raised a FutureWarning since Numpy 1.12, and similar code has raised
FutureWarning since 1.7.


In 1.16 a number of functions have been introduced in the numpy.lib.recfunctions module
to help users account for this change. These are numpy.lib.recfunctions.repack_fields,
numpy.lib.recfunctions.structured_to_unstructured, numpy.lib.recfunctions.unstructured_to_structured,
numpy.lib.recfunctions.apply_along_fields, numpy.lib.recfunctions.assign_fields_by_name, and
numpy.lib.recfunctions.require_fields.
The function numpy.lib.recfunctions.repack_fields can always be used to reproduce the old behavior,
as it will return a packed copy of the structured array. The code above, for example, can be replaced with:
>>> from numpy.lib.recfunctions import repack_fields
>>> repack_fields(a[['a', 'c']]).view('i8')  # supported in 1.16
array([0, 0, 0])

Furthermore, numpy now provides a new function numpy.lib.recfunctions.structured_to_unstructured
which is a safer and more efficient alternative for users who wish to
convert structured arrays to unstructured arrays, as the view above is often intended to do. This function allows safe
conversion to an unstructured type taking into account padding, often avoids a copy, and also casts the datatypes as
needed, unlike the view. Code such as:
>>> b = np.zeros(3, dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
>>> b[['x', 'z']].view('f4')
array([0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)

can be made safer by replacing with:

>>> from numpy.lib.recfunctions import structured_to_unstructured
>>> structured_to_unstructured(b[['x', 'z']])
array([[0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)

Assignment to an array with a multi-field index modifies the original array:

>>> a[['a', 'c']] = (2, 3)
>>> a
array([(2, 0, 3.), (2, 0, 3.), (2, 0, 3.)],
      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<f4')])

This obeys the structured array assignment rules described above. For example, this means that one can swap the values
of two fields using appropriate multi-field indexes:

>>> a[['a', 'c']] = a[['c', 'a']]

Indexing with an Integer to get a Structured Scalar


Indexing a single element of a structured array (with an integer index) returns a structured scalar:

>>> x = np.array([(1, 2., 3.)], dtype='i, f, f')
>>> scalar = x[0]
>>> scalar
(1, 2., 3.)
>>> type(scalar)
<class 'numpy.void'>

Unlike other numpy scalars, structured scalars are mutable and act like views into the original array, such that modifying
the scalar will modify the original array. Structured scalars also support access and assignment by field name:


>>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')])
>>> s = x[0]
>>> s['bar'] = 100
>>> x
array([(1, 100.), (3, 4.)],
      dtype=[('foo', '<i8'), ('bar', '<f4')])

Similarly to tuples, structured scalars can also be indexed with an integer:

>>> scalar = np.array([(1, 2., 3.)], dtype='i, f, f')[0]
>>> scalar[0]
1
>>> scalar[1] = 4

Thus, tuples might be thought of as the native Python equivalent to numpy’s structured types, much like native python
integers are the equivalent to numpy’s integer types. Structured scalars may be converted to a tuple by calling
numpy.ndarray.item:

>>> scalar.item(), type(scalar.item())
((1, 4.0, 3.0), <class 'tuple'>)
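The behaviours described in this subsection can be combined in one short sketch. The array contents below are illustrative only.

```python
import numpy as np

x = np.array([(1, 2.0), (3, 4.0)], dtype=[('foo', 'i8'), ('bar', 'f4')])

s = x[0]            # structured scalar (numpy.void) acting as a view into x
s['bar'] = 100      # assignment by field name writes through to x
s[0] = 7            # assignment by integer position does too

as_tuple = s.item() # plain Python tuple, detached from the array
print(x[0], as_tuple)
```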

Viewing Structured Arrays Containing Objects

In order to prevent clobbering object pointers in fields of object type, numpy currently does not allow views of structured
arrays containing objects.

Structure Comparison

If the dtypes of two void structured arrays are equal, testing the equality of the arrays will result in a boolean array with
the dimensions of the original arrays, with elements set to True where all fields of the corresponding structures are equal.
Structured dtypes are equal if the field names, dtypes and titles are the same, ignoring endianness, and the fields are in
the same order:

>>> a = np.zeros(2, dtype=[('a', 'i4'), ('b', 'i4')])
>>> b = np.ones(2, dtype=[('a', 'i4'), ('b', 'i4')])
>>> a == b
array([False, False])

Currently, if the dtypes of two void structured arrays are not equivalent the comparison fails, returning the scalar value
False. This behavior is deprecated as of numpy 1.10 and will raise an error or perform elementwise comparison in the
future.
The < and > operators always return False when comparing void structured arrays, and arithmetic and bitwise operations
are not supported.
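A minimal sketch of the elementwise equality described above, using two arrays with identical dtypes that differ in a single field of one record (the values are illustrative):

```python
import numpy as np

a = np.zeros(2, dtype=[('a', 'i4'), ('b', 'i4')])
b = a.copy()
b['b'][1] = 5       # make the second record differ in field 'b'

# A record compares equal only if all of its fields are equal.
equal = a == b
print(equal)
```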


4.7.4 Record Arrays

As an optional convenience numpy provides an ndarray subclass, numpy.recarray, that allows access to fields of
structured arrays by attribute instead of only by index. Record arrays use a special datatype, numpy.record, that
allows field access by attribute on the structured scalars obtained from the array. The numpy.rec module provides
functions for creating recarrays from various objects. Additional helper functions for creating and manipulating structured
arrays can be found in numpy.lib.recfunctions.
The simplest way to create a record array is with numpy.rec.array:

>>> recordarr = np.rec.array([(1, 2., 'Hello'), (2, 3., "World")],
...                    dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
>>> recordarr.bar
array([2., 3.], dtype=float32)
>>> recordarr[1:2]
rec.array([(2, 3., b'World')],
          dtype=[('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')])
>>> recordarr[1:2].foo
array([2], dtype=int32)
>>> recordarr.foo[1:2]
array([2], dtype=int32)
>>> recordarr[1].baz
b'World'

numpy.rec.array can convert a wide variety of arguments into record arrays, including structured arrays:

>>> arr = np.array([(1, 2., 'Hello'), (2, 3., "World")],
...             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
>>> recordarr = np.rec.array(arr)

The numpy.rec module provides a number of other convenience functions for creating record arrays, see record array
creation routines.
A record array representation of a structured array can be obtained using the appropriate view:

>>> arr = np.array([(1, 2., 'Hello'), (2, 3., "World")],
...                dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
>>> recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
...                      type=np.recarray)

For convenience, viewing an ndarray as type numpy.recarray will automatically convert to numpy.record
datatype, so the dtype can be left out of the view:

>>> recordarr = arr.view(np.recarray)
>>> recordarr.dtype
dtype((numpy.record, [('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')]))

To get back to a plain ndarray both the dtype and type must be reset. The following view does so, taking into account the
unusual case that the recordarr was not a structured type:

>>> arr2 = recordarr.view(recordarr.dtype.fields or recordarr.dtype, np.ndarray)

Record array fields accessed by index or by attribute are returned as a record array if the field has a structured type but
as a plain ndarray otherwise.

>>> recordarr = np.rec.array([('Hello', (1, 2)), ("World", (3, 4))],
...                 dtype=[('foo', 'S6'),('bar', [('A', int), ('B', int)])])
>>> type(recordarr.foo)
<class 'numpy.ndarray'>
>>> type(recordarr.bar)
<class 'numpy.recarray'>

Note that if a field has the same name as an ndarray attribute, the ndarray attribute takes precedence. Such fields will be
inaccessible by attribute but will still be accessible by index.
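The name clash can be illustrated with a short sketch. The field name 'shape' is chosen here purely for illustration, because it collides with the ndarray attribute of the same name.

```python
import numpy as np

arr = np.array([(3, 2.5), (4, 3.5)], dtype=[('shape', 'i4'), ('bar', 'f4')])
rec = arr.view(np.recarray)

bar_by_attribute = rec.bar    # ordinary field: attribute access works
shape_attribute = rec.shape   # ndarray attribute wins over the field
shape_field = rec['shape']    # the field is still reachable by index

print(shape_attribute, shape_field)
```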

Recarray Helper Functions

Collection of utilities to manipulate structured arrays.


Most of these functions were initially implemented by John Hunter for matplotlib. They have been rewritten and extended
for convenience.
numpy.lib.recfunctions.append_fields(base, names, data, dtypes=None, fill_value=-1,
        usemask=True, asrecarray=False)
Add new fields to an existing array.
The names of the fields are given with the names argument, the corresponding values with the data argument. If
a single field is appended, names, data and dtypes do not have to be lists but just values.
Parameters

base
[array] Input array to extend.
names
[string, sequence] String or sequence of strings corresponding to the names of the new fields.
data
[array or sequence of arrays] Array or sequence of arrays storing the fields to add to the base.
dtypes
[sequence of datatypes, optional] Datatype or sequence of datatypes. If None, the datatypes
are estimated from the data.
fill_value
[{float}, optional] Filling value used to pad missing data on the shorter arrays.
usemask
[{False, True}, optional] Whether to return a masked array or not.
asrecarray
[{False, True}, optional] Whether to return a recarray (MaskedRecords) or not.
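A minimal usage sketch for append_fields, assuming a two-record base array and a single new field; the field name 'c' and the values are illustrative, and usemask=False is passed so that a plain ndarray is returned instead of a masked array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

base = np.array([(1, 2.0), (3, 4.0)], dtype=[('a', 'i8'), ('b', 'f8')])
extra = np.array([10, 20], dtype='i8')

# A single field can be given as a plain name and a plain array.
out = rfn.append_fields(base, 'c', extra, usemask=False)
print(out.dtype.names, out['c'])
```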

numpy.lib.recfunctions.apply_along_fields(func, arr)
Apply function ‘func’ as a reduction across fields of a structured array.
This is similar to apply_along_axis, but treats the fields of a structured array as an extra axis. The fields are all first
cast to a common type following the type-promotion rules from numpy.result_type applied to the fields’
dtypes.
Parameters


func
[function] Function to apply on the “field” dimension. This function must support an axis
argument, like numpy.mean, numpy.sum, etc.
arr
[ndarray] Structured array for which to apply func.

Returns

out
[ndarray] Result of the reduction operation.

Examples

>>> from numpy.lib import recfunctions as rfn
>>> b = np.array([(1, 2, 5), (4, 5, 7), (7, 8 ,11), (10, 11, 12)],
...              dtype=[('x', 'i4'), ('y', 'f4'), ('z', 'f8')])
>>> rfn.apply_along_fields(np.mean, b)
array([ 2.66666667,  5.33333333,  8.66666667, 11.        ])
>>> rfn.apply_along_fields(np.mean, b[['x', 'z']])
array([ 3. ,  5.5,  9. , 11. ])

numpy.lib.recfunctions.assign_fields_by_name(dst, src, zero_unassigned=True)


Assigns values from one structured array to another by field name.
Normally in numpy >= 1.14, assignment of one structured array to another copies fields “by position”, meaning
that the first field from the src is copied to the first field of the dst, and so on, regardless of field name.
This function instead copies “by field name”, such that fields in the dst are assigned from the identically named field
in the src. This applies recursively for nested structures. This is how structure assignment worked in numpy >= 1.6
to <= 1.13.
Parameters

dst
[ndarray]
src
[ndarray] The source and destination arrays during assignment.
zero_unassigned
[bool, optional] If True, fields in the dst for which there was no matching field in the src are
filled with the value 0 (zero). This was the behavior of numpy <= 1.13. If False, those fields
are not modified.
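The difference between the two settings of zero_unassigned can be sketched as follows. The field names and values are illustrative; the destination deliberately lists the shared fields in a different order and has one extra field 'c'.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

src = np.array([(1, 2.0)], dtype=[('a', 'i4'), ('b', 'f8')])

# Fields are matched by name, so the different field order does not matter.
dst = np.zeros(1, dtype=[('b', 'f8'), ('a', 'i4'), ('c', 'i4')])
dst['c'] = 9
rfn.assign_fields_by_name(dst, src)            # 'c' has no match: zeroed

kept = np.zeros(1, dtype=[('b', 'f8'), ('a', 'i4'), ('c', 'i4')])
kept['c'] = 9
rfn.assign_fields_by_name(kept, src, zero_unassigned=False)  # 'c' untouched

print(dst, kept)
```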

numpy.lib.recfunctions.drop_fields(base, drop_names, usemask=True, asrecarray=False)


Return a new array with fields in drop_names dropped.
Nested fields are supported.
Changed in version 1.18.0: drop_fields returns an array with 0 fields if all fields are dropped, rather than
returning None as it did previously.
Parameters


base
[array] Input array
drop_names
[string or sequence] String or sequence of strings corresponding to the names of the fields to
drop.
usemask
[{False, True}, optional] Whether to return a masked array or not.
asrecarray
[{False, True}, optional] Whether to return a recarray or a mrecarray (asrecarray=True)
or a plain ndarray or masked array with flexible dtype. The default is False.

Examples

>>> from numpy.lib import recfunctions as rfn
>>> a = np.array([(1, (2, 3.0)), (4, (5, 6.0))],
...   dtype=[('a', np.int64), ('b', [('ba', np.double), ('bb', np.int64)])])
>>> rfn.drop_fields(a, 'a')
array([((2., 3),), ((5., 6),)],
      dtype=[('b', [('ba', '<f8'), ('bb', '<i8')])])
>>> rfn.drop_fields(a, 'ba')
array([(1, (3,)), (4, (6,))], dtype=[('a', '<i8'), ('b', [('bb', '<i8')])])
>>> rfn.drop_fields(a, ['ba', 'bb'])
array([(1,), (4,)], dtype=[('a', '<i8')])

numpy.lib.recfunctions.find_duplicates(a, key=None, ignoremask=True, return_index=False)


Find the duplicates in a structured array along a given key
Parameters

a
[array-like] Input array
key
[{string, None}, optional] Name of the fields along which to check the duplicates. If None,
the search is performed by records
ignoremask
[{True, False}, optional] Whether masked data should be discarded or considered as duplicates.
return_index
[{False, True}, optional] Whether to return the indices of the duplicated values.


Examples

>>> from numpy.lib import recfunctions as rfn
>>> ndtype = [('a', int)]
>>> a = np.ma.array([1, 1, 1, 2, 2, 3, 3],
...         mask=[0, 0, 1, 0, 0, 0, 1]).view(ndtype)
>>> rfn.find_duplicates(a, ignoremask=True, return_index=True)
(masked_array(data=[(1,), (1,), (2,), (2,)],
             mask=[(False,), (False,), (False,), (False,)],
       fill_value=(999999,),
            dtype=[('a', '<i8')]), array([0, 1, 3, 4]))

numpy.lib.recfunctions.flatten_descr(ndtype)
Flatten a structured data-type description.

Examples

>>> from numpy.lib import recfunctions as rfn
>>> ndtype = np.dtype([('a', '<i4'), ('b', [('ba', '<f8'), ('bb', '<i4')])])
>>> rfn.flatten_descr(ndtype)
(('a', dtype('int32')), ('ba', dtype('float64')), ('bb', dtype('int32')))

numpy.lib.recfunctions.get_fieldstructure(adtype, lastname=None, parents=None)


Returns a dictionary with fields indexing lists of their parent fields.
This function is used to simplify access to fields nested in other fields.
Parameters

adtype
[np.dtype] Input datatype
lastname
[optional] Last processed field name (used internally during recursion).
parents
[dictionary] Dictionary of parent fields (used internally during recursion).

Examples

>>> from numpy.lib import recfunctions as rfn
>>> ndtype = np.dtype([('A', int),
...                    ('B', [('BA', int),
...                           ('BB', [('BBA', int), ('BBB', int)])])])
>>> rfn.get_fieldstructure(ndtype)
... # XXX: possible regression, order of BBA and BBB is swapped
{'A': [], 'B': [], 'BA': ['B'], 'BB': ['B'], 'BBA': ['B', 'BB'], 'BBB': ['B', 'BB']}

numpy.lib.recfunctions.get_names(adtype)
Returns the field names of the input datatype as a tuple.
Parameters


adtype
[dtype] Input datatype

Examples

>>> from numpy.lib import recfunctions as rfn
>>> rfn.get_names(np.empty((1,), dtype=int))
Traceback (most recent call last):
    ...
AttributeError: 'numpy.ndarray' object has no attribute 'names'

>>> rfn.get_names(np.empty((1,), dtype=[('A',int), ('B', float)]))
Traceback (most recent call last):
    ...
AttributeError: 'numpy.ndarray' object has no attribute 'names'
>>> adtype = np.dtype([('a', int), ('b', [('ba', int), ('bb', int)])])
>>> rfn.get_names(adtype)
('a', ('b', ('ba', 'bb')))

numpy.lib.recfunctions.get_names_flat(adtype)
Returns the field names of the input datatype as a tuple. Nested structures are flattened beforehand.
Parameters

adtype
[dtype] Input datatype

Examples

>>> from numpy.lib import recfunctions as rfn
>>> rfn.get_names_flat(np.empty((1,), dtype=int)) is None
Traceback (most recent call last):
    ...
AttributeError: 'numpy.ndarray' object has no attribute 'names'
>>> rfn.get_names_flat(np.empty((1,), dtype=[('A',int), ('B', float)]))
Traceback (most recent call last):
    ...
AttributeError: 'numpy.ndarray' object has no attribute 'names'
>>> adtype = np.dtype([('a', int), ('b', [('ba', int), ('bb', int)])])
>>> rfn.get_names_flat(adtype)
('a', 'b', 'ba', 'bb')

numpy.lib.recfunctions.join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2', defaults=None,
        usemask=True, asrecarray=False)
Join arrays r1 and r2 on key key.
The key should be either a string or a sequence of strings corresponding to the fields used to join the array. An
exception is raised if the key field cannot be found in the two input arrays. Neither r1 nor r2 should have any
duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not
looked for by the algorithm.
Parameters


key
[{string, sequence}] A string or a sequence of strings corresponding to the fields used for
comparison.
r1, r2
[arrays] Structured arrays.
jointype
[{‘inner’, ‘outer’, ‘leftouter’}, optional] If ‘inner’, returns the elements common to both r1 and
r2. If ‘outer’, returns the common elements as well as the elements of r1 not in r2 and the
elements of r2 not in r1. If ‘leftouter’, returns the common elements and the elements of r1 not
in r2.
r1postfix
[string, optional] String appended to the names of the fields of r1 that are present in r2 but
absent from the key.
r2postfix
[string, optional] String appended to the names of the fields of r2 that are present in r1 but
absent from the key.
defaults
[{dictionary}, optional] Dictionary mapping field names to the corresponding default values.
usemask
[{True, False}, optional] Whether to return a MaskedArray (or MaskedRecords if
asrecarray==True) or a ndarray.
asrecarray
[{False, True}, optional] Whether to return a recarray (or MaskedRecords if usemask==True)
or just a flexible-type ndarray.

Notes

• The output is sorted along the key.


• A temporary array is formed by dropping the fields not in the key for the two arrays and concatenating the
result. This array is then sorted, and the common entries selected. The output is constructed by filling the
fields with the selected entries. Matching is not preserved if there are some duplicates…
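A small sketch of an inner join, assuming two arrays that share an integer field named 'key' and have no duplicate keys; the field names and values are illustrative, and usemask=False is passed to obtain a plain ndarray.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

r1 = np.array([(1, 10.0), (2, 20.0), (3, 30.0)],
              dtype=[('key', 'i8'), ('x', 'f8')])
r2 = np.array([(2, 200.0), (3, 300.0), (4, 400.0)],
              dtype=[('key', 'i8'), ('y', 'f8')])

# Only the records whose key occurs in both inputs are kept.
inner = rfn.join_by('key', r1, r2, jointype='inner', usemask=False)
print(inner)
```

The other join types return additional records whose missing fields have to be filled or masked, so they are easiest to inspect with the default usemask=True.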

numpy.lib.recfunctions.merge_arrays(seqarrays, fill_value=-1, flatten=False, usemask=False,
        asrecarray=False)
Merge arrays field by field.
Parameters

seqarrays
[sequence of ndarrays] Sequence of arrays
fill_value
[{float}, optional] Filling value used to pad missing data on the shorter arrays.


flatten
[{False, True}, optional] Whether to collapse nested fields.
usemask
[{False, True}, optional] Whether to return a masked array or not.
asrecarray
[{False, True}, optional] Whether to return a recarray (MaskedRecords) or not.

Notes

• Without a mask, the missing value will be filled with something, depending on its corresponding type:
– -1 for integers
– -1.0 for floating point numbers
– '-' for characters
– '-1' for strings
– True for boolean values
• XXX: I just obtained these values empirically

Examples

>>> from numpy.lib import recfunctions as rfn
>>> rfn.merge_arrays((np.array([1, 2]), np.array([10., 20., 30.])))
array([( 1, 10.), ( 2, 20.), (-1, 30.)],
      dtype=[('f0', '<i8'), ('f1', '<f8')])

>>> rfn.merge_arrays((np.array([1, 2], dtype=np.int64),
...         np.array([10., 20., 30.])), usemask=False)
array([(1, 10.0), (2, 20.0), (-1, 30.0)],
      dtype=[('f0', '<i8'), ('f1', '<f8')])
>>> rfn.merge_arrays((np.array([1, 2]).view([('a', np.int64)]),
...               np.array([10., 20., 30.])),
...              usemask=False, asrecarray=True)
rec.array([( 1, 10.), ( 2, 20.), (-1, 30.)],
          dtype=[('a', '<i8'), ('f1', '<f8')])

numpy.lib.recfunctions.rec_append_fields(base, names, data, dtypes=None)
Add new fields to an existing array.
The names of the fields are given with the names argument, the corresponding values with the data argument. If
a single field is appended, names, data and dtypes do not have to be lists but just values.
Parameters

base
[array] Input array to extend.
names
[string, sequence] String or sequence of strings corresponding to the names of the new fields.


data
[array or sequence of arrays] Array or sequence of arrays storing the fields to add to the base.
dtypes
[sequence of datatypes, optional] Datatype or sequence of datatypes. If None, the datatypes
are estimated from the data.

Returns

appended_array
[np.recarray]

See also:

append_fields

numpy.lib.recfunctions.rec_drop_fields(base, drop_names)
Returns a new numpy.recarray with fields in drop_names dropped.

numpy.lib.recfunctions.rec_join(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2',
        defaults=None)
Join arrays r1 and r2 on keys. Alternative to join_by, that always returns a np.recarray.
See also:

join_by
equivalent function

numpy.lib.recfunctions.recursive_fill_fields(input, output)
Fills fields from output with fields from input, with support for nested structures.
Parameters

input
[ndarray] Input array.
output
[ndarray] Output array.

Notes

• output should be at least the same size as input


Examples

>>> from numpy.lib import recfunctions as rfn
>>> a = np.array([(1, 10.), (2, 20.)], dtype=[('A', np.int64), ('B', np.float64)])
>>> b = np.zeros((3,), dtype=a.dtype)
>>> rfn.recursive_fill_fields(a, b)
array([(1, 10.), (2, 20.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])

numpy.lib.recfunctions.rename_fields(base, namemapper)
Rename the fields from a flexible-datatype ndarray or recarray.
Nested fields are supported.
Parameters

base
[ndarray] Input array whose fields must be modified.
namemapper
[dictionary] Dictionary mapping old field names to their new version.

Examples

>>> from numpy.lib import recfunctions as rfn
>>> a = np.array([(1, (2, [3.0, 30.])), (4, (5, [6.0, 60.]))],
...   dtype=[('a', int),('b', [('ba', float), ('bb', (float, 2))])])
>>> rfn.rename_fields(a, {'a':'A', 'bb':'BB'})
array([(1, (2., [ 3., 30.])), (4, (5., [ 6., 60.]))],
      dtype=[('A', '<i8'), ('b', [('ba', '<f8'), ('BB', '<f8', (2,))])])

numpy.lib.recfunctions.repack_fields(a, align=False, recurse=False)


Re-pack the fields of a structured array or dtype in memory.
The memory layout of structured datatypes allows fields at arbitrary byte offsets. This means the fields can be
separated by padding bytes, their offsets can be non-monotonically increasing, and they can overlap.
This method removes any overlaps and reorders the fields in memory so they have increasing byte offsets, and adds
or removes padding bytes depending on the align option, which behaves like the align option to numpy.dtype.
If align=False, this method produces a “packed” memory layout in which each field starts at the byte the previous
field ended, and any padding bytes are removed.
If align=True, this method produces an “aligned” memory layout in which each field’s offset is a multiple of its
alignment, and the total itemsize is a multiple of the largest alignment, by adding padding bytes as needed.
Parameters

a
[ndarray or dtype] array or dtype for which to repack the fields.
align
[boolean] If true, use an “aligned” memory layout, otherwise use a “packed” layout.
recurse
[boolean] If True, also repack nested structures.


Returns

repacked
[ndarray or dtype] Copy of a with fields repacked, or a itself if no repacking was needed.

Examples

>>> from numpy.lib import recfunctions as rfn
>>> def print_offsets(d):
...     print("offsets:", [d.fields[name][1] for name in d.names])
...     print("itemsize:", d.itemsize)
...
>>> dt = np.dtype('u1, <i8, <f8', align=True)
>>> dt
dtype({'names': ['f0', 'f1', 'f2'], 'formats': ['u1', '<i8', '<f8'], 'offsets': [0, 8, 16], 'itemsize': 24}, align=True)
>>> print_offsets(dt)
offsets: [0, 8, 16]
itemsize: 24
>>> packed_dt = rfn.repack_fields(dt)
>>> packed_dt
dtype([('f0', 'u1'), ('f1', '<i8'), ('f2', '<f8')])
>>> print_offsets(packed_dt)
offsets: [0, 1, 9]
itemsize: 17

numpy.lib.recfunctions.require_fields(array, required_dtype)
Casts a structured array to a new dtype using assignment by field-name.
This function assigns from the old to the new array by name, so the value of a field in the output array is the value
of the field with the same name in the source array. This has the effect of creating a new ndarray containing only
the fields “required” by the required_dtype.
If a field name in the required_dtype does not exist in the input array, that field is created and set to 0 in the output
array.
Parameters

array
[ndarray] array to cast
required_dtype
[dtype] datatype for output array

Returns

out
[ndarray] array with the new dtype, with field values copied from the fields in the input array
with the same name


Examples

>>> from numpy.lib import recfunctions as rfn
>>> a = np.ones(4, dtype=[('a', 'i4'), ('b', 'f8'), ('c', 'u1')])
>>> rfn.require_fields(a, [('b', 'f4'), ('c', 'u1')])
array([(1., 1), (1., 1), (1., 1), (1., 1)],
      dtype=[('b', '<f4'), ('c', 'u1')])
>>> rfn.require_fields(a, [('b', 'f4'), ('newf', 'u1')])
array([(1., 0), (1., 0), (1., 0), (1., 0)],
      dtype=[('b', '<f4'), ('newf', 'u1')])

numpy.lib.recfunctions.stack_arrays(arrays, defaults=None, usemask=True, asrecarray=False,
        autoconvert=False)
Superposes arrays field by field.
Parameters

arrays
[array or sequence] Sequence of input arrays.
defaults
[dictionary, optional] Dictionary mapping field names to the corresponding default values.
usemask
[{True, False}, optional] Whether to return a MaskedArray (or MaskedRecords if
asrecarray==True) or a ndarray.
asrecarray
[{False, True}, optional] Whether to return a recarray (or MaskedRecords if usemask==True)
or just a flexible-type ndarray.
autoconvert
[{False, True}, optional] Whether to automatically cast the type of the field to the maximum.

Examples

>>> from numpy.lib import recfunctions as rfn
>>> x = np.array([1, 2,])
>>> rfn.stack_arrays(x) is x
True
>>> z = np.array([('A', 1), ('B', 2)], dtype=[('A', '|S3'), ('B', float)])
>>> zz = np.array([('a', 10., 100.), ('b', 20., 200.), ('c', 30., 300.)],
...   dtype=[('A', '|S3'), ('B', np.double), ('C', np.double)])
>>> test = rfn.stack_arrays((z,zz))
>>> test
masked_array(data=[(b'A', 1.0, --), (b'B', 2.0, --), (b'a', 10.0, 100.0),
                   (b'b', 20.0, 200.0), (b'c', 30.0, 300.0)],
             mask=[(False, False, True), (False, False, True),
                   (False, False, False), (False, False, False),
                   (False, False, False)],
       fill_value=(b'N/A', 1.e+20, 1.e+20),
            dtype=[('A', 'S3'), ('B', '<f8'), ('C', '<f8')])


numpy.lib.recfunctions.structured_to_unstructured(arr, dtype=None, copy=False,
        casting='unsafe')
Converts an n-D structured array into an (n+1)-D unstructured array.
The new array will have a new last dimension equal in size to the number of field-elements of the input array. If not
supplied, the output datatype is determined from the numpy type promotion rules applied to all the field datatypes.
Nested fields, as well as each element of any subarray fields, all count as a single field-element.
Parameters

arr
[ndarray] Structured array or dtype to convert. Cannot contain object datatype.
dtype
[dtype, optional] The dtype of the output unstructured array.
copy
[bool, optional] See copy argument to numpy.ndarray.astype. If true, always return a copy. If
false, and dtype requirements are satisfied, a view is returned.
casting
[{‘no’, ‘equiv’, ‘safe’, ‘same_kind’, ‘unsafe’}, optional] See casting argument of
numpy.ndarray.astype. Controls what kind of data casting may occur.

Returns

unstructured
[ndarray] Unstructured array with one more dimension.

Examples

>>> from numpy.lib import recfunctions as rfn
>>> a = np.zeros(4, dtype=[('a', 'i4'), ('b', 'f4,u2'), ('c', 'f4', 2)])
>>> a
array([(0, (0., 0), [0., 0.]), (0, (0., 0), [0., 0.]),
       (0, (0., 0), [0., 0.]), (0, (0., 0), [0., 0.])],
      dtype=[('a', '<i4'), ('b', [('f0', '<f4'), ('f1', '<u2')]), ('c', '<f4', (2,))])

>>> rfn.structured_to_unstructured(a)
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

>>> b = np.array([(1, 2, 5), (4, 5, 7), (7, 8 ,11), (10, 11, 12)],
...              dtype=[('x', 'i4'), ('y', 'f4'), ('z', 'f8')])
>>> np.mean(rfn.structured_to_unstructured(b[['x', 'z']]), axis=-1)
array([ 3. ,  5.5,  9. , 11. ])

numpy.lib.recfunctions.unstructured_to_structured(arr, dtype=None, names=None,
        align=False, copy=False, casting='unsafe')
Converts an n-D unstructured array into an (n-1)-D structured array.


The last dimension of the input array is converted into a structure, with number of field-elements equal to the size
of the last dimension of the input array. By default all output fields have the input array’s dtype, but an output
structured dtype with an equal number of field-elements can be supplied instead.
Nested fields, as well as each element of any subarray fields, all count towards the number of field-elements.
Parameters

arr
[ndarray] Unstructured array or dtype to convert.
dtype
[dtype, optional] The structured dtype of the output array
names
[list of strings, optional] If dtype is not supplied, this specifies the field names for the output
dtype, in order. The field dtypes will be the same as the input array.
align
[boolean, optional] Whether to create an aligned memory layout.
copy
[bool, optional] See copy argument to numpy.ndarray.astype. If true, always return a copy. If
false, and dtype requirements are satisfied, a view is returned.
casting
[{‘no’, ‘equiv’, ‘safe’, ‘same_kind’, ‘unsafe’}, optional] See casting argument of
numpy.ndarray.astype. Controls what kind of data casting may occur.

Returns

structured
[ndarray] Structured array with fewer dimensions.

Examples

>>> from numpy.lib import recfunctions as rfn
>>> dt = np.dtype([('a', 'i4'), ('b', 'f4,u2'), ('c', 'f4', 2)])
>>> a = np.arange(20).reshape((4,5))
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
>>> rfn.unstructured_to_structured(a, dt)
array([( 0, ( 1.,  2), [ 3.,  4.]), ( 5, ( 6.,  7), [ 8.,  9.]),
       (10, (11., 12), [13., 14.]), (15, (16., 17), [18., 19.])],
      dtype=[('a', '<i4'), ('b', [('f0', '<f4'), ('f1', '<u2')]), ('c', '<f4', (2,))])
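When no dtype is supplied, the names argument alone is enough to build the structure. The following sketch uses illustrative field names 'x' and 'y' and converts back again to show that the two functions are inverses for this simple layout.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.arange(6, dtype='f8').reshape(3, 2)

# Each field takes the dtype of the input array; the last axis becomes fields.
s = rfn.unstructured_to_structured(a, names=['x', 'y'])

# Converting back recovers the original 2-D array.
back = rfn.structured_to_unstructured(s)
print(s.dtype, s.shape, back.shape)
```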


4.8 Writing custom array containers

Numpy’s dispatch mechanism, introduced in numpy version v1.16, is the recommended approach for writing custom
N-dimensional array containers that are compatible with the numpy API and provide custom implementations of numpy
functionality. Applications include dask arrays, an N-dimensional array distributed across multiple nodes, and cupy arrays,
an N-dimensional array on a GPU.
To get a feel for writing custom array containers, we’ll begin with a simple example that has rather narrow utility but
illustrates the concepts involved.

>>> import numpy as np
>>> class DiagonalArray:
...     def __init__(self, N, value):
...         self._N = N
...         self._i = value
...     def __repr__(self):
...         return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
...     def __array__(self, dtype=None):
...         return self._i * np.eye(self._N, dtype=dtype)

Our custom array can be instantiated like:

>>> arr = DiagonalArray(5, 1)
>>> arr
DiagonalArray(N=5, value=1)

We can convert to a numpy array using numpy.array or numpy.asarray, which will call its __array__ method
to obtain a standard numpy.ndarray.

>>> np.asarray(arr)
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])

If we operate on arr with a numpy function, numpy will again use the __array__ interface to convert it to an array
and then apply the function in the usual way.

>>> np.multiply(arr, 2)
array([[2., 0., 0., 0., 0.],
[0., 2., 0., 0., 0.],
[0., 0., 2., 0., 0.],
[0., 0., 0., 2., 0.],
[0., 0., 0., 0., 2.]])

Notice that the return type is a standard numpy.ndarray.

>>> type(np.multiply(arr, 2))
numpy.ndarray

How can we pass our custom array type through this function? Numpy allows a class to indicate that it would like to handle
computations in a custom-defined way through the interfaces __array_ufunc__ and __array_function__.
Let’s take one at a time, starting with __array_ufunc__. This method covers ufuncs, a class of functions that includes,
for example, numpy.multiply and numpy.sin.
The __array_ufunc__ receives:


• ufunc, a function like numpy.multiply
• method, a string, differentiating between numpy.multiply(...) and variants like numpy.multiply.outer,
numpy.multiply.accumulate, and so on. For the common case, numpy.multiply(...),
method == '__call__'.
• inputs, which could be a mixture of different types
• kwargs, keyword arguments passed to the function
For this example we will only handle the method __call__:

>>> from numbers import Number
>>> class DiagonalArray:
...     def __init__(self, N, value):
...         self._N = N
...         self._i = value
...     def __repr__(self):
...         return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
...     def __array__(self, dtype=None):
...         return self._i * np.eye(self._N, dtype=dtype)
...     def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
...         if method == '__call__':
...             N = None
...             scalars = []
...             for input in inputs:
...                 if isinstance(input, Number):
...                     scalars.append(input)
...                 elif isinstance(input, self.__class__):
...                     scalars.append(input._i)
...                     if N is not None:
...                         if N != input._N:
...                             raise TypeError("inconsistent sizes")
...                     else:
...                         N = input._N
...                 else:
...                     return NotImplemented
...             return self.__class__(N, ufunc(*scalars, **kwargs))
...         else:
...             return NotImplemented

Now our custom array type passes through numpy functions.

>>> arr = DiagonalArray(5, 1)
>>> np.multiply(arr, 3)
DiagonalArray(N=5, value=3)
>>> np.add(arr, 3)
DiagonalArray(N=5, value=4)
>>> np.sin(arr)
DiagonalArray(N=5, value=0.8414709848078965)
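To see which arguments numpy actually passes to __array_ufunc__, a throwaway class can simply record them. The Recorder class below is hypothetical and exists only for this sketch; it returns a placeholder string instead of computing anything.

```python
import numpy as np

class Recorder:
    """Hypothetical helper that records how numpy calls __array_ufunc__."""
    def __init__(self):
        self.calls = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Store the ufunc name, the method string and the number of inputs.
        self.calls.append((ufunc.__name__, method, len(inputs)))
        return "handled"

rec = Recorder()
r1 = np.multiply(rec, 2)        # method == '__call__'
r2 = np.multiply.outer(rec, 2)  # method == 'outer'
r3 = np.add.reduce(rec)         # method == 'reduce'
print(rec.calls)
```

Whatever the method returns (other than NotImplemented) is handed back to the caller unchanged, which is why DiagonalArray can return an instance of itself.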

At this point arr + 3 does not work.

>>> arr + 3
TypeError: unsupported operand type(s) for +: 'DiagonalArray' and 'int'

To support it, we need to define the Python interfaces __add__, __lt__, and so on to dispatch to the corresponding
ufunc. We can achieve this conveniently by inheriting from the mixin NDArrayOperatorsMixin.


>>> import numpy.lib.mixins
>>> class DiagonalArray(numpy.lib.mixins.NDArrayOperatorsMixin):
...     def __init__(self, N, value):
...         self._N = N
...         self._i = value
...     def __repr__(self):
...         return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
...     def __array__(self, dtype=None):
...         return self._i * np.eye(self._N, dtype=dtype)
...     def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
...         if method == '__call__':
...             N = None
...             scalars = []
...             for input in inputs:
...                 if isinstance(input, Number):
...                     scalars.append(input)
...                 elif isinstance(input, self.__class__):
...                     scalars.append(input._i)
...                     if N is not None:
...                         if N != input._N:
...                             raise TypeError("inconsistent sizes")
...                     else:
...                         N = input._N
...                 else:
...                     return NotImplemented
...             return self.__class__(N, ufunc(*scalars, **kwargs))
...         else:
...             return NotImplemented

>>> arr = DiagonalArray(5, 1)
>>> arr + 3
DiagonalArray(N=5, value=4)
>>> arr > 0
DiagonalArray(N=5, value=True)

Now let’s tackle __array_function__. We’ll create a dict that maps numpy functions to our custom variants.

>>> HANDLED_FUNCTIONS = {}
>>> class DiagonalArray(numpy.lib.mixins.NDArrayOperatorsMixin):
...     def __init__(self, N, value):
...         self._N = N
...         self._i = value
...     def __repr__(self):
...         return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
...     def __array__(self, dtype=None):
...         return self._i * np.eye(self._N, dtype=dtype)
...     def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
...         if method == '__call__':
...             N = None
...             scalars = []
...             for input in inputs:
...                 # In this case we accept only scalar numbers or DiagonalArrays.
...                 if isinstance(input, Number):
...                     scalars.append(input)
...                 elif isinstance(input, self.__class__):
...                     scalars.append(input._i)
...                     if N is not None:
...                         if N != self._N:
...                             raise TypeError("inconsistent sizes")
...                     else:
...                         N = self._N
...                 else:
...                     return NotImplemented
...             return self.__class__(N, ufunc(*scalars, **kwargs))
...         else:
...             return NotImplemented
...     def __array_function__(self, func, types, args, kwargs):
...         if func not in HANDLED_FUNCTIONS:
...             return NotImplemented
...         # Note: this allows subclasses that don't override
...         # __array_function__ to handle DiagonalArray objects.
...         if not all(issubclass(t, self.__class__) for t in types):
...             return NotImplemented
...         return HANDLED_FUNCTIONS[func](*args, **kwargs)
...

A convenient pattern is to define a decorator implements that can be used to add functions to
HANDLED_FUNCTIONS.

>>> def implements(np_function):
...     "Register an __array_function__ implementation for DiagonalArray objects."
...     def decorator(func):
...         HANDLED_FUNCTIONS[np_function] = func
...         return func
...     return decorator
...

Now we write implementations of numpy functions for DiagonalArray. For completeness, to support the usage
arr.sum(), add a method sum that calls numpy.sum(self), and the same for mean.

>>> @implements(np.sum)
... def sum(arr):
...     "Implementation of np.sum for DiagonalArray objects"
...     return arr._i * arr._N
...
>>> @implements(np.mean)
... def mean(arr):
...     "Implementation of np.mean for DiagonalArray objects"
...     return arr._i / arr._N
...
>>> arr = DiagonalArray(5, 1)
>>> np.sum(arr)
5
>>> np.mean(arr)
0.2
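
The remark about supporting arr.sum() and arr.mean() can be sketched as follows. This is a minimal, self-contained
variant that repeats only the parts of the class needed for dispatch; the registry name HANDLED and the helper names
_sum and _mean are assumptions made for this illustration, not part of numpy.

```python
import numpy as np

HANDLED = {}  # hypothetical registry, playing the role of HANDLED_FUNCTIONS


def implements(np_function):
    """Register an __array_function__ implementation in HANDLED."""
    def decorator(func):
        HANDLED[np_function] = func
        return func
    return decorator


class DiagonalArray:
    def __init__(self, N, value):
        self._N = N
        self._i = value

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented
        if not all(issubclass(t, self.__class__) for t in types):
            return NotImplemented
        return HANDLED[func](*args, **kwargs)

    # Method forms simply defer to the numpy functions, which dispatch
    # back to the registered implementations through __array_function__.
    def sum(self):
        return np.sum(self)

    def mean(self):
        return np.mean(self)


@implements(np.sum)
def _sum(arr):
    return arr._i * arr._N


@implements(np.mean)
def _mean(arr):
    return arr._i / arr._N


arr = DiagonalArray(5, 1)
print(arr.sum(), arr.mean())
```

With this in place the method call and the function call share one implementation, so they cannot drift apart.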

If the user tries to use any numpy functions not included in HANDLED_FUNCTIONS, a TypeError will be raised by
numpy, indicating that this operation is not supported. For example, concatenating two DiagonalArrays does not
produce another diagonal array, so it is not supported.

>>> np.concatenate([arr, arr])
TypeError: no implementation found for 'numpy.concatenate' on types that implement __array_function__: [<class '__main__.DiagonalArray'>]

Additionally, our implementations of sum and mean do not accept the optional arguments that numpy’s implementation
does.

>>> np.sum(arr, axis=0)
TypeError: sum() got an unexpected keyword argument 'axis'

The user always has the option of converting to a normal numpy.ndarray with numpy.asarray and using standard
numpy from there.

>>> np.concatenate([np.asarray(arr), np.asarray(arr)])
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

Refer to the dask source code and cupy source code for more fully-worked examples of custom array containers.
See also NEP 18.

4.9 Subclassing ndarray

4.9.1 Introduction

Subclassing ndarray is relatively simple, but it has some complications compared to other Python objects. On this page
we explain the machinery that allows you to subclass ndarray, and the implications for implementing a subclass.

ndarrays and object creation

Subclassing ndarray is complicated by the fact that new instances of ndarray classes can come about in three different
ways. These are:
1. Explicit constructor call - as in MySubClass(params). This is the usual route to Python instance creation.
2. View casting - casting an existing ndarray as a given subclass
3. New from template - creating a new instance from a template instance. Examples include returning slices from a
subclassed array, creating return types from ufuncs, and copying arrays. See Creating new from template for more
details
The last two are characteristics of ndarrays - in order to support things like array slicing. The complications of subclassing
ndarray are due to the mechanisms numpy has to support these latter two routes of instance creation.

4.9.2 View casting

View casting is the standard ndarray mechanism by which you take an ndarray of any subclass, and return a view of the
array as another (specified) subclass:

>>> import numpy as np
>>> # create a completely useless ndarray subclass
>>> class C(np.ndarray): pass
>>> # create a standard ndarray
>>> arr = np.zeros((3,))
>>> # take a view of it, as our useless subclass
>>> c_arr = arr.view(C)
>>> type(c_arr)
<class 'C'>

4.9.3 Creating new from template

New instances of an ndarray subclass can also come about by a very similar mechanism to View casting, when numpy
finds it needs to create a new instance from a template instance. The most obvious place this has to happen is when you
are taking slices of subclassed arrays. For example:

>>> v = c_arr[1:]
>>> type(v) # the view is of type 'C'
<class 'C'>
>>> v is c_arr # but it's a new instance
False

The slice is a view onto the original c_arr data. So, when we take a view from the ndarray, we return a new ndarray, of
the same class, that points to the data in the original.
There are other points in the use of ndarrays where we need such views, such as copying arrays (c_arr.copy()),
creating ufunc output arrays (see also __array_wrap__ for ufuncs and other functions), and reducing methods
(like c_arr.mean()).
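
As a small illustration of these other routes, the sketch below checks that copying and applying a ufunc also
produce instances of the subclass. The variable names copied, added and sliced are chosen only for this example.

```python
import numpy as np


class C(np.ndarray):
    pass


c_arr = np.zeros((3,)).view(C)

copied = c_arr.copy()      # new-from-template through copying
added = np.add(c_arr, 1)   # new-from-template through a ufunc output
sliced = c_arr[1:]         # new-from-template through slicing

print(type(copied).__name__, type(added).__name__, type(sliced).__name__)
# The slice shares memory with the original; the copy does not.
print(np.shares_memory(sliced, c_arr), np.shares_memory(copied, c_arr))
```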

4.9.4 Relationship of view casting and new-from-template

These paths both use the same machinery. We make the distinction here, because they result in different input to your
methods. Specifically, View casting means you have created a new instance of your array type from any potential subclass
of ndarray. Creating new from template means you have created a new instance of your class from a pre-existing instance,
allowing you - for example - to copy across attributes that are particular to your subclass.

4.9.5 Implications for subclassing

If we subclass ndarray, we need to deal not only with explicit construction of our array type, but also View casting or
Creating new from template. NumPy has the machinery to do this, and it is this machinery that makes subclassing slightly
non-standard.
There are two aspects to the machinery that ndarray uses to support views and new-from-template in subclasses.
The first is the use of the ndarray.__new__ method for the main work of object initialization, rather than the more
usual __init__ method. The second is the use of the __array_finalize__ method to allow subclasses to clean
up after the creation of views and new instances from templates.

A brief Python primer on __new__ and __init__

__new__ is a standard Python method, and, if present, is called before __init__ when we create a class instance.
See the python __new__ documentation for more detail.
For example, consider the following Python code:

class C:
    def __new__(cls, *args):
        print('Cls in __new__:', cls)
        print('Args in __new__:', args)
        # The `object` type __new__ method takes a single argument.
        return object.__new__(cls)

    def __init__(self, *args):
        print('type(self) in __init__:', type(self))
        print('Args in __init__:', args)

meaning that we get:

>>> c = C('hello')
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
type(self) in __init__: <class 'C'>
Args in __init__: ('hello',)

When we call C('hello'), the __new__ method gets its own class as first argument, and the passed argument, which
is the string 'hello'. After python calls __new__, it usually (see below) calls our __init__ method, with the
output of __new__ as the first argument (now a class instance), and the passed arguments following.
As you can see, the object can be initialized in the __new__ method or the __init__ method, or both, and in fact
ndarray does not have an __init__ method, because all the initialization is done in the __new__ method.
Why use __new__ rather than just the usual __init__? Because in some cases, as for ndarray, we want to be able to
return an object of some other class. Consider the following:

class D(C):
    def __new__(cls, *args):
        print('D cls is:', cls)
        print('D args in __new__:', args)
        return C.__new__(C, *args)

    def __init__(self, *args):
        # we never get here
        print('In D __init__')

meaning that:

>>> obj = D('hello')
D cls is: <class 'D'>
D args in __new__: ('hello',)
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
>>> type(obj)
<class 'C'>

The definition of C is the same as before, but for D, the __new__ method returns an instance of class C rather than D.
Note that the __init__ method of D does not get called. In general, when the __new__ method returns an object of
class other than the class in which it is defined, the __init__ method of that class is not called.

This is how subclasses of the ndarray class are able to return views that preserve the class type. When taking a view, the
standard ndarray machinery creates the new ndarray object with something like:
obj = ndarray.__new__(subtype, shape, ...

where subtype is the subclass. Thus the returned view is of the same class as the subclass, rather than being of class
ndarray.
That solves the problem of returning views of the same type, but now we have a new problem. The machinery of ndarray
can set the class this way, in its standard methods for taking views, but the ndarray __new__ method knows nothing
of what we have done in our own __new__ method in order to set attributes, and so on. (Aside - why not call obj =
subtype.__new__(... then? Because we may not have a __new__ method with the same call signature).
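
The aside can be made concrete with a sketch. The class Labeled below is hypothetical: its __new__ takes
(data, label), a signature that numpy could not possibly call when it creates a slice. A counter shows that
__new__ runs only for the explicit constructor call, while the slice is still an instance of the subclass.

```python
import numpy as np


class Labeled(np.ndarray):
    new_calls = 0  # counts explicit constructor calls only

    def __new__(cls, data, label):
        Labeled.new_calls += 1
        obj = np.asarray(data).view(cls)
        obj.label = label
        return obj

    def __array_finalize__(self, obj):
        # Runs for every creation route; copy the attribute if present.
        self.label = getattr(obj, 'label', None)


a = Labeled([1, 2, 3], label='x')
v = a[1:]  # numpy creates this instance without calling Labeled.__new__

print(Labeled.new_calls, type(v).__name__, v.label)
```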

The role of __array_finalize__

__array_finalize__ is the mechanism that numpy provides to allow subclasses to handle the various ways that
new instances get created.
Remember that subclass instances can come about in these three ways:
1. explicit constructor call (obj = MySubClass(params)). This will call the usual sequence of
MySubClass.__new__ then (if it exists) MySubClass.__init__.
2. View casting
3. Creating new from template
Our MySubClass.__new__ method only gets called in the case of the explicit constructor call, so we can’t rely on
MySubClass.__new__ or MySubClass.__init__ to deal with the view casting and new-from-template. It turns
out that MySubClass.__array_finalize__ does get called for all three methods of object creation, so this is
where our object creation housekeeping usually goes.
• For the explicit constructor call, our subclass will need to create a new ndarray instance of its own class. In practice
this means that we, the authors of the code, will need to make a call to ndarray.__new__(MySubClass,
...), a class-hierarchy prepared call to super().__new__(cls, ...), or do view casting of an existing
array (see below)
• For view casting and new-from-template, the equivalent of ndarray.__new__(MySubClass,... is called,
at the C level.
The arguments that __array_finalize__ receives differ for the three methods of instance creation above.
The following code allows us to look at the call sequences and arguments:
import numpy as np

class C(np.ndarray):
    def __new__(cls, *args, **kwargs):
        print('In __new__ with class %s' % cls)
        return super().__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        # in practice you probably will not need or want an __init__
        # method for your subclass
        print('In __init__ with class %s' % self.__class__)

    def __array_finalize__(self, obj):
        print('In array_finalize:')
        print('   self type is %s' % type(self))
        print('   obj type is %s' % type(obj))

Now:
>>> # Explicit constructor
>>> c = C((10,))
In __new__ with class <class 'C'>
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'NoneType'>
In __init__ with class <class 'C'>
>>> # View casting
>>> a = np.arange(10)
>>> cast_a = a.view(C)
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'numpy.ndarray'>
>>> # Slicing (example of new-from-template)
>>> cv = c[:1]
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'C'>

The signature of __array_finalize__ is:


def __array_finalize__(self, obj):

One sees that the super call, which goes to ndarray.__new__, passes __array_finalize__ the new object,
of our own class (self) as well as the object from which the view has been taken (obj). As you can see from the output
above, the self is always a newly created instance of our subclass, and the type of obj differs for the three instance
creation methods:
• When called from the explicit constructor, obj is None
• When called from view casting, obj can be an instance of any subclass of ndarray, including our own.
• When called in new-from-template, obj is another instance of our own subclass, that we might use to update the
new self instance.
Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible
place to fill in instance defaults for new object attributes, among other tasks.
This may be clearer with an example.

4.9.6 Simple example - adding an extra attribute to ndarray

import numpy as np

class InfoArray(np.ndarray):

    def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                strides=None, order=None, info=None):
        # Create the ndarray instance of our type, given the usual
        # ndarray input arguments. This will call the standard
        # ndarray constructor, but return an object of our type.
        # It also triggers a call to InfoArray.__array_finalize__
        obj = super().__new__(subtype, shape, dtype,
                              buffer, offset, strides, order)
        # set the new 'info' attribute to the value passed
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # ``self`` is a new object resulting from
        # ndarray.__new__(InfoArray, ...), therefore it only has
        # attributes that the ndarray.__new__ constructor gave it -
        # i.e. those of a standard ndarray.
        #
        # We could have got to the ndarray.__new__ call in 3 ways:
        # From an explicit constructor - e.g. InfoArray():
        #    obj is None
        #    (we're in the middle of the InfoArray.__new__
        #    constructor, and self.info will be set when we return to
        #    InfoArray.__new__)
        if obj is None: return
        # From view casting - e.g arr.view(InfoArray):
        #    obj is arr
        #    (type(obj) can be InfoArray)
        # From new-from-template - e.g infoarr[:3]
        #    type(obj) is InfoArray
        #
        # Note that it is here, rather than in the __new__ method,
        # that we set the default value for 'info', because this
        # method sees all creation of default objects - with the
        # InfoArray.__new__ constructor, but also with
        # arr.view(InfoArray).
        self.info = getattr(obj, 'info', None)
        # We do not need to return anything
# We do not need to return anything

Using the object looks like this:

>>> obj = InfoArray(shape=(3,)) # explicit constructor
>>> type(obj)
<class 'InfoArray'>
>>> obj.info is None
True
>>> obj = InfoArray(shape=(3,), info='information')
>>> obj.info
'information'
>>> v = obj[1:] # new-from-template - here - slicing
>>> type(v)
<class 'InfoArray'>
>>> v.info
'information'
>>> arr = np.arange(10)
>>> cast_arr = arr.view(InfoArray) # view casting
>>> type(cast_arr)
<class 'InfoArray'>
>>> cast_arr.info is None
True

This class isn’t very useful, because it has the same constructor as the bare ndarray object, including passing in buffers
and shapes and so on. We would probably prefer the constructor to be able to take an already formed ndarray from the
usual numpy calls to np.array and return an object.

4.9.7 Slightly more realistic example - attribute added to existing array

Here is a class that takes a standard ndarray that already exists, casts as our type, and adds an extra attribute.

import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)

So:

>>> arr = np.arange(5)
>>> obj = RealisticInfoArray(arr, info='information')
>>> type(obj)
<class 'RealisticInfoArray'>
>>> obj.info
'information'
>>> v = obj[1:]
>>> type(v)
<class 'RealisticInfoArray'>
>>> v.info
'information'
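
Slicing is not the only route that goes through __array_finalize__. The sketch below repeats the class so that it
runs on its own and checks, under that assumption, that the attribute also survives a ufunc call and a copy,
while converting back with np.asarray yields a plain ndarray without the attribute.

```python
import numpy as np


class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)


obj = RealisticInfoArray(np.arange(5), info='information')

doubled = obj * 2        # ufunc output, wrapped back into the subclass
copied = obj.copy()      # copy, created from obj as a template
plain = np.asarray(obj)  # plain ndarray, no longer carries `info`

print(doubled.info, copied.info, hasattr(plain, 'info'))
```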

4.9.8 __array_ufunc__ for ufuncs

New in version 1.13.


A subclass can override what happens when executing numpy ufuncs on it by overriding the default
ndarray.__array_ufunc__ method. This method is executed instead of the ufunc and should return either the result of
the operation, or NotImplemented if the operation requested is not implemented.
The signature of __array_ufunc__ is:

def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):

• ufunc is the ufunc object that was called.
• method is a string indicating how the ufunc was called, either "__call__" to indicate it was called directly, or one
  of its methods: "reduce", "accumulate", "reduceat", "outer", or "at".
• inputs is a tuple of the input arguments to the ufunc.
• kwargs contains any optional or keyword arguments passed to the function. This includes any out arguments,
  which are always contained in a tuple.
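
To see these arguments in practice, here is a minimal sketch of an override that only records which ufunc and
method were requested. The class name Recorder and its log attribute are hypothetical, and the sketch
deliberately ignores out arguments, which a complete implementation would also have to convert.

```python
import numpy as np


class Recorder(np.ndarray):
    log = []  # (ufunc name, method) pairs, shared by all instances

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        Recorder.log.append((ufunc.__name__, method))
        # View our own instances as plain ndarrays so that the call
        # below does not re-enter this override.
        plain = tuple(x.view(np.ndarray) if isinstance(x, Recorder) else x
                      for x in inputs)
        return getattr(ufunc, method)(*plain, **kwargs)


r = np.arange(4).view(Recorder)
summed = np.add(r, 1)      # direct call: method is "__call__"
total = np.add.reduce(r)   # ufunc method: method is "reduce"

print(Recorder.log)
```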

A typical implementation would convert any inputs or outputs that are instances of one’s own class, pass everything on to
a superclass using super(), and finally return the results after possible back-conversion. An example, taken from the
test case test_ufunc_override_with_super in core/tests/test_umath.py, is the following.

import numpy as np

class A(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
        args = []
        in_no = []
        for i, input_ in enumerate(inputs):
            if isinstance(input_, A):
                in_no.append(i)
                args.append(input_.view(np.ndarray))
            else:
                args.append(input_)

        outputs = out
        out_no = []
        if outputs:
            out_args = []
            for j, output in enumerate(outputs):
                if isinstance(output, A):
                    out_no.append(j)
                    out_args.append(output.view(np.ndarray))
                else:
                    out_args.append(output)
            kwargs['out'] = tuple(out_args)
        else:
            outputs = (None,) * ufunc.nout

        info = {}
        if in_no:
            info['inputs'] = in_no
        if out_no:
            info['outputs'] = out_no

        results = super().__array_ufunc__(ufunc, method, *args, **kwargs)
        if results is NotImplemented:
            return NotImplemented

        if method == 'at':
            if isinstance(inputs[0], A):
                inputs[0].info = info
            return

        if ufunc.nout == 1:
            results = (results,)

        results = tuple((np.asarray(result).view(A)
                         if output is None else output)
                        for result, output in zip(results, outputs))
        if results and isinstance(results[0], A):
            results[0].info = info

        return results[0] if len(results) == 1 else results

So, this class does not actually do anything interesting: it just converts any instances of its own to regular ndarray
(otherwise, we’d get infinite recursion!), and adds an info dictionary that tells which inputs and outputs it converted.
Hence, e.g.,
>>> a = np.arange(5.).view(A)
>>> b = np.sin(a)
>>> b.info
{'inputs': [0]}
>>> b = np.sin(np.arange(5.), out=(a,))
>>> b.info
{'outputs': [0]}
>>> a = np.arange(5.).view(A)
>>> b = np.ones(1).view(A)
>>> c = a + b
>>> c.info
{'inputs': [0, 1]}
>>> a += b
>>> a.info
{'inputs': [0, 1], 'outputs': [0]}

Note that another approach would be to use getattr(ufunc, method)(*inputs, **kwargs) instead
of the super call. For this example, the result would be identical, but there is a difference if another operand also defines
__array_ufunc__. E.g., let’s assume that we evaluate np.add(a, b), where b is an instance of another class B
that has an override. If you use super as in the example, ndarray.__array_ufunc__ will notice that b has an
override, which means it cannot evaluate the result itself. Thus, it will return NotImplemented and so will our class A.
Then, control will be passed over to b, which either knows how to deal with us and produces a result, or does not and
returns NotImplemented, raising a TypeError.
If instead, we replace our super call with getattr(ufunc, method), we effectively do
np.add(a.view(np.ndarray), b). Again, B.__array_ufunc__ will be called, but now it sees an ndarray as the other
argument. Likely, it will know how to handle this, and return a new instance of the B class to us. Our example class is not
set up to handle this, but it might well be the best approach if, e.g., one were to re-implement MaskedArray using
__array_ufunc__.
As a final note: if the super route is suited to a given class, an advantage of using it is that it helps in constructing class
hierarchies. E.g., suppose that our other class B also used the super in its __array_ufunc__ implementation, and we
created a class C that depended on both, i.e., class C(A, B) (with, for simplicity, not another __array_ufunc__
override). Then any ufunc on an instance of C would pass on to A.__array_ufunc__, the super call in A would
go to B.__array_ufunc__, and the super call in B would go to ndarray.__array_ufunc__, thus allowing
A and B to collaborate.
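
The call chain described in this final note can be sketched with two deliberately trivial classes. Everything here
(the calls list and the strip helper) is hypothetical scaffolding for the illustration; each override records its
name, views its own instances as plain ndarrays, and hands over to super().

```python
import numpy as np

calls = []  # records the order in which the overrides run


def strip(inputs, cls):
    """View instances of cls as plain ndarrays; leave other inputs alone."""
    return tuple(x.view(np.ndarray) if isinstance(x, cls) else x
                 for x in inputs)


class A(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        calls.append('A')
        return super().__array_ufunc__(
            ufunc, method, *strip(inputs, A), **kwargs)


class B(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        calls.append('B')
        return super().__array_ufunc__(
            ufunc, method, *strip(inputs, B), **kwargs)


class C(A, B):
    pass


c = np.arange(3).view(C)
result = np.add(c, 1)

print(calls, result)
```

Because the method resolution order of C is C, A, B, ndarray, the super call in A reaches B before ndarray.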

4.9.9 __array_wrap__ for ufuncs and other functions

Prior to numpy 1.13, the behaviour of ufuncs could only be tuned using __array_wrap__ and
__array_prepare__. These two allowed one to change the output type of a ufunc, but, in contrast to
__array_ufunc__, did not allow one to make any changes to the inputs. It is hoped to eventually deprecate
these, but __array_wrap__ is also used by other numpy functions and methods, such as squeeze, so at the present
time is still needed for full functionality.
Conceptually, __array_wrap__ “wraps up the action” in the sense of allowing a subclass to set the type of the return
value and update attributes and metadata. Let’s show how this works with an example. First we return to the simpler
example subclass, but with a different name and some print statements:
import numpy as np

class MySubClass(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        print('In __array_finalize__:')
        print('   self is %s' % repr(self))
        print('   obj is %s' % repr(obj))
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __array_wrap__(self, out_arr, context=None):
        print('In __array_wrap__:')
        print('   self is %s' % repr(self))
        print('   arr is %s' % repr(out_arr))
        # then just call the parent
        return super().__array_wrap__(out_arr, context)

We run a ufunc on an instance of our new array:

>>> obj = MySubClass(np.arange(5), info='spam')
In __array_finalize__:
   self is MySubClass([0, 1, 2, 3, 4])
   obj is array([0, 1, 2, 3, 4])
>>> arr2 = np.arange(5)+1
>>> ret = np.add(arr2, obj)
In __array_wrap__:
   self is MySubClass([0, 1, 2, 3, 4])
   arr is array([1, 3, 5, 7, 9])
In __array_finalize__:
   self is MySubClass([1, 3, 5, 7, 9])
   obj is MySubClass([0, 1, 2, 3, 4])
>>> ret
MySubClass([1, 3, 5, 7, 9])
>>> ret.info
'spam'

Note that the ufunc (np.add) has called the __array_wrap__ method with arguments self as obj, and out_arr
as the (ndarray) result of the addition. In turn, the default __array_wrap__ (ndarray.__array_wrap__) has
cast the result to class MySubClass, and called __array_finalize__ - hence the copying of the info attribute.
This has all happened at the C level.
But, we could do anything we wanted:

class SillySubClass(np.ndarray):

    def __array_wrap__(self, arr, context=None):
        return 'I lost your data'

>>> arr1 = np.arange(5)
>>> obj = arr1.view(SillySubClass)
>>> arr2 = np.arange(5)
>>> ret = np.multiply(obj, arr2)
>>> ret
'I lost your data'

So, by defining a specific __array_wrap__ method for our subclass, we can tweak the output from ufuncs. The
__array_wrap__ method requires self, then an argument - which is the result of the ufunc - and an optional
parameter context. This parameter is passed by ufuncs as a 3-element tuple: (name of the ufunc, arguments of the ufunc,
domain of the ufunc), but is not set by other numpy functions. Though, as seen above, it is possible to do otherwise,
__array_wrap__ should return an instance of its containing class. See the masked array subclass for an
implementation.
In addition to __array_wrap__, which is called on the way out of the ufunc, there is also an __array_prepare__
method which is called on the way into the ufunc, after the output arrays are created but before any computation has been
performed. The default implementation does nothing but pass through the array. __array_prepare__ should not
attempt to access the array data or resize the array; it is intended for setting the output array type, updating attributes
and metadata, and performing any checks based on the input that may be desired before computation begins. Like
__array_wrap__, __array_prepare__ must return an ndarray or subclass thereof or raise an error.

4.9.10 Extra gotchas - custom __del__ methods and ndarray.base

One of the problems that ndarray solves is keeping track of memory ownership of ndarrays and their views. Consider the
case where we have created an ndarray, arr and have taken a slice with v = arr[1:]. The two objects are looking
at the same memory. NumPy keeps track of where the data came from for a particular array or view, with the base
attribute:

>>> # A normal ndarray, that owns its own data
>>> arr = np.zeros((4,))
>>> # In this case, base is None
>>> arr.base is None
True
>>> # We take a view
>>> v1 = arr[1:]
>>> # base now points to the array that it derived from
>>> v1.base is arr
True
>>> # Take a view of a view
>>> v2 = v1[1:]
>>> # base points to the original array that it was derived from
>>> v2.base is arr
True

In general, if the array owns its own memory, as for arr in this case, then arr.base will be None - there are some
exceptions to this - see the numpy book for more details.
The base attribute is useful in being able to tell whether we have a view or the original array. This in turn can be useful
if we need to know whether or not to do some specific cleanup when the subclassed array is deleted. For example, we
may only want to do the cleanup if the original array is deleted, but not the views. For an example of how this can work,
have a look at the memmap class in numpy.core.
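
A minimal sketch of that check follows. The class Tracked and its method owns_resources are hypothetical; the
method only reports whether an instance is the original array (base is None) and is therefore the one that
should perform any cleanup. This is not how memmap itself is implemented.

```python
import numpy as np


class Tracked(np.ndarray):
    def owns_resources(self):
        # Only the original array has no base; views point back at it.
        return self.base is None


original = Tracked((4,))  # explicit constructor: owns its own data
view = original[1:]       # a view onto the same memory

print(original.owns_resources(), view.owns_resources())
```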

4.9.11 Subclassing and Downstream Compatibility

When sub-classing ndarray or creating duck-types that mimic the ndarray interface, it is your responsibility to decide
how aligned your APIs will be with those of numpy. For convenience, many numpy functions that have a corresponding
ndarray method (e.g., sum, mean, take, reshape) work by checking if the first argument to a function has a
method of the same name. If it exists, the method is called instead of coercing the arguments to a numpy array.
For example, if you want your sub-class or duck-type to be compatible with numpy’s sum function, the method signature
for this object’s sum method should be the following:

def sum(self, axis=None, dtype=None, out=None, keepdims=False):
    ...

This is the exact same method signature for np.sum, so now if a user calls np.sum on this object, numpy will call the
object’s own sum method and pass in these arguments enumerated above in the signature, and no errors will be raised
because the signatures are completely compatible with each other.
If, however, you decide to deviate from this signature and do something like this:

def sum(self, axis=None, dtype=None):
    ...

This object is no longer compatible with np.sum because if you call np.sum, it will pass in unexpected arguments out
and keepdims, causing a TypeError to be raised.
If you wish to maintain compatibility with numpy and its subsequent versions (which might add new keyword arguments)
but do not want to surface all of numpy’s arguments, your function’s signature should accept **kwargs. For example:

def sum(self, axis=None, dtype=None, **unused_kwargs):
    ...

This object is now compatible with np.sum again because any extraneous arguments (i.e. keywords that are not axis
or dtype) will be hidden away in the **unused_kwargs parameter.
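
The difference between the last two signatures can be sketched with two hypothetical duck types, Strict and
Tolerant, which hold a plain Python list and are not ndarray subclasses. The sketch assumes the behaviour
described above, namely that np.sum forwards to the object's own sum method and passes extra keywords.

```python
import numpy as np


class Strict:
    def __init__(self, values):
        self.values = list(values)

    def sum(self, axis=None, dtype=None):
        return sum(self.values)


class Tolerant:
    def __init__(self, values):
        self.values = list(values)

    def sum(self, axis=None, dtype=None, **unused_kwargs):
        # Extra keywords such as `out` and `keepdims` land in unused_kwargs.
        return sum(self.values)


tolerant_total = np.sum(Tolerant([1, 2, 3]))

try:
    np.sum(Strict([1, 2, 3]))
    strict_error = None
except TypeError as exc:
    strict_error = exc  # numpy passed a keyword the method does not accept

print(tolerant_total, type(strict_error).__name__)
```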

4.10 Universal functions (ufunc) basics

See also:
ufuncs
A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion,
supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a “vectorized” wrapper
for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs.
In NumPy, universal functions are instances of the numpy.ufunc class. Many of the built-in functions are implemented
in compiled C code. The basic ufuncs operate on scalars, but there is also a generalized kind for which the basic elements
are sub-arrays (vectors, matrices, etc.), and broadcasting is done over other dimensions. The simplest example is the
addition operator:

>>> np.array([0,2,3,4]) + np.array([1,1,-1,2])
array([1, 3, 2, 6])

One can also produce custom numpy.ufunc instances using the numpy.frompyfunc factory function.
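
As a short sketch of the factory function, the example below wraps an ordinary Python function of two arguments.
The function safe_ratio is made up for this illustration. Note that ufuncs created this way return arrays of
dtype object.

```python
import numpy as np


def safe_ratio(a, b):
    """Hypothetical scalar function: a / b, or 0.0 when b is zero."""
    return a / b if b != 0 else 0.0


# Two inputs, one output.
ratio = np.frompyfunc(safe_ratio, 2, 1)

result = ratio(np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 4.0]))
# Broadcasting works as for built-in ufuncs: (2, 1) with (2,) gives (2, 2).
table = ratio(np.array([[1.0], [2.0]]), np.array([1.0, 2.0]))

print(result, table.shape)
```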

4.10.1 Ufunc methods

All ufuncs have four methods: reduce, accumulate, reduceat, and outer. However, these methods only make sense on scalar
ufuncs that take two input arguments and return one output argument. Attempting to call these methods on other ufuncs
will cause a ValueError.
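
The four methods can be sketched on np.add, with np.sin standing in for a ufunc that does not take two inputs.
The variable names are chosen only for this illustration.

```python
import numpy as np

values = np.arange(1, 5)                         # array([1, 2, 3, 4])

total = np.add.reduce(values)                    # sum of all elements
running = np.add.accumulate(values)              # running sums
chunks = np.add.reduceat(np.arange(8), [0, 4])   # sums of [0:4] and [4:8]
table = np.add.outer([1, 2], [10, 20])           # all pairwise sums

# np.sin takes a single input, so its reduce method is not usable.
try:
    np.sin.reduce(values)
    unary_error = None
except ValueError as exc:
    unary_error = exc

print(total, running, chunks, table, sep='\n')
```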
The reduce-like methods all take an axis keyword, a dtype keyword, and an out keyword, and the arrays must all have
dimension >= 1. The axis keyword specifies the axis of the array over which the reduction will take place (with negative
values counting backwards). Generally, it is an integer, though for numpy.ufunc.reduce, it can also be a tuple of
int to reduce over several axes at once, or None, to reduce over all axes. For example:

>>> x = np.arange(9).reshape(3,3)
>>> x
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.add.reduce(x, 1)
array([ 3, 12, 21])
>>> np.add.reduce(x, (0, 1))
36

The dtype keyword allows you to manage a very common problem that arises when naively using numpy.ufunc.reduce.
Sometimes you may have an array of a certain data type and wish to add up all of its elements, but the result does not fit
into the data type of the array. This commonly happens if you have an array of single-byte integers. The dtype keyword
allows you to alter the data type over which the reduction takes place (and therefore the type of the output). Thus, you
can ensure that the output is a data type with precision large enough to handle your output. The responsibility of altering
the reduce type is mostly up to you. There is one exception: if no dtype is given for a reduction on the “add” or “multiply”
operations, then if the input type is an integer (or Boolean) data-type and smaller than the size of the numpy.int_ data
type, it will be internally upcast to the int_ (or numpy.uint) data-type. In the previous example:

>>> x.dtype
dtype('int64')
>>> np.multiply.reduce(x, dtype=float)
array([ 0., 28., 80.])
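
The single-byte case described above can be sketched as follows. The array small is an assumption made for
illustration; the point is only to contrast the default upcast with a reduction forced to stay in int8.

```python
import numpy as np

small = np.full(4, 100, dtype=np.int8)   # 4 * 100 = 400 does not fit in int8

default_sum = np.add.reduce(small)                  # integer input is upcast automatically
wrapped_sum = np.add.reduce(small, dtype=np.int8)   # forced to int8, so the sum wraps around
wide_sum = np.add.reduce(small, dtype=np.int64)     # explicit wider accumulator

print(default_sum, wrapped_sum, wide_sum)
```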

Finally, the out keyword allows you to provide an output array (for single-output ufuncs, which are currently the only
ones supported; for future extension, however, a tuple with a single argument can be passed in). If out is given, the dtype
argument is ignored. Considering x from the previous example:

>>> y = np.zeros(3, dtype=int)
>>> y
array([0, 0, 0])
>>> np.multiply.reduce(x, dtype=float, out=y)
array([ 0, 28, 80])  # dtype argument is ignored

Ufuncs also have a fifth method, numpy.ufunc.at, that allows in-place operations to be performed using advanced
indexing. No buffering is used on the dimensions where advanced indexing is used, so the advanced index can list an item
more than once and the operation will be performed on the result of the previous operation for that item.
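
A short sketch of the difference this makes when an index is repeated (the arrays here are illustrative assumptions):

```python
import numpy as np

idx = np.array([0, 0, 2])        # index 0 appears twice

buffered = np.zeros(3, dtype=int)
buffered[idx] += 1               # buffered: position 0 is incremented only once

unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, idx, 1)    # unbuffered: position 0 is incremented twice

print(buffered, unbuffered)
```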


4.10.2 Output type determination

The output of the ufunc (and its methods) is not necessarily an ndarray if not all input arguments are ndarrays.
Indeed, if any input defines an __array_ufunc__ method, control will be passed completely to that function, i.e., the
ufunc is overridden.
If none of the inputs overrides the ufunc, then all output arrays will be passed to the __array_prepare__ and
__array_wrap__ methods of the input (besides ndarrays, and scalars) that defines it and has the highest
__array_priority__ of any other input to the universal function. The default __array_priority__ of the
ndarray is 0.0, and the default __array_priority__ of a subtype is 0.0. Matrices have __array_priority__
equal to 10.0.
All ufuncs can also take output arguments. If necessary, output will be cast to the data-type(s) of the provided output
array(s). If a class with an __array__ method is used for the output, results will be written to the object returned by
__array__. Then, if the class also has an __array_prepare__ method, it is called so metadata may be determined
based on the context of the ufunc (the context consisting of the ufunc itself, the arguments passed to the ufunc, and the
ufunc domain.) The array object returned by __array_prepare__ is passed to the ufunc for computation. Finally,
if the class also has an __array_wrap__ method, the returned ndarray result will be passed to that method just
before passing control back to the caller.

4.10.3 Broadcasting

See also:
Broadcasting basics
Each universal function takes array inputs and produces array outputs by performing the core function element-wise on
the inputs (where an element is generally a scalar, but can be a vector or higher-order sub-array for generalized ufuncs).
Standard broadcasting rules are applied so that inputs not sharing exactly the same shapes can still be usefully operated
on.
By these rules, if an input has a dimension size of 1 in its shape, the first data entry in that dimension will be used for
all calculations along that dimension. In other words, the stepping machinery of the ufunc will simply not step along that
dimension (the stride will be 0 for that dimension).
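
The zero stride can be observed directly with numpy.broadcast_to. This is a small sketch; the shapes are assumptions
picked for illustration.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)     # shape (3, 1)
row = np.arange(4)                   # shape (4,)

total = col + row                    # inputs are broadcast to shape (3, 4)

# broadcast_to returns a read-only view whose stride is 0 along the stretched dimension
stretched = np.broadcast_to(col, (3, 4))
print(total.shape, stretched.strides)
```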

4.10.4 Type casting rules

Note: In NumPy 1.6.0, a type promotion API was created to encapsulate the mechanism for determining output types.
See the functions numpy.result_type, numpy.promote_types, and numpy.min_scalar_type for more
details.

At the core of every ufunc is a one-dimensional strided loop that implements the actual function for a specific type
combination. When a ufunc is created, it is given a static list of inner loops and a corresponding list of type signatures
over which the ufunc operates. The ufunc machinery uses this list to determine which inner loop to use for a particular
case. You can inspect the .types attribute for a particular ufunc to see which type combinations have a defined inner
loop and which output type they produce (character codes are used in said output for brevity).
Casting must be done on one or more of the inputs whenever the ufunc does not have a core loop implementation for
the input types provided. If an implementation for the input types cannot be found, then the algorithm searches for an
implementation with a type signature to which all of the inputs can be cast “safely.” The first one it finds in its internal
list of loops is selected and performed, after all necessary type casting. Recall that internal copies during ufuncs (even
for casting) are limited to the size of an internal buffer (which is user settable).
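
The loop list and the promotion helpers named in the note above can be inspected as in this sketch (the particular
dtypes are assumptions chosen for illustration):

```python
import numpy as np

# Each entry reads "input codes -> output code", e.g. 'dd->d' for float64 inputs
print(np.add.types[:5])

# int8 can be cast safely to float32, so the float32 loop is selected
a = np.array([1], dtype=np.int8)
b = np.array([2.5], dtype=np.float32)
mixed = np.add(a, b)

common = np.result_type(np.int8, np.float32)     # type the inputs are promoted to
wider = np.promote_types(np.int32, np.float32)   # int32 does not fit safely in float32
```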


Note: Universal functions in NumPy are flexible enough to have mixed type signatures. Thus, for example, a universal
function could be defined that works with floating-point and integer values. See numpy.ldexp for an example.

By the above description, the casting rules are essentially implemented by the question of when a data type can be
cast “safely” to another data type. The answer to this question can be determined in Python with a function call:
can_cast(fromtype, totype). The example below shows the results of this call for the 24 internally supported
types on the author’s 64-bit system. You can generate this table for your system with the code given in the example.

Example

Code segment showing the “can cast safely” table for a 64-bit system. Generally the output depends on the system; your
system might result in a different table.
>>> mark = {False: ' -', True: ' Y'}
>>> def print_table(ntypes):
...     print('X ' + ' '.join(ntypes))
...     for row in ntypes:
...         print(row, end='')
...         for col in ntypes:
...             print(mark[np.can_cast(row, col)], end='')
...         print()
...
>>> print_table(np.typecodes['All'])
X ? b h i l q p B H I L Q P e f d g F D G S U V O M m
? Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y - Y
b - Y Y Y Y Y Y - - - - - - Y Y Y Y Y Y Y Y Y Y Y - Y
h - - Y Y Y Y Y - - - - - - - Y Y Y Y Y Y Y Y Y Y - Y
i - - - Y Y Y Y - - - - - - - - Y Y - Y Y Y Y Y Y - Y
l - - - - Y Y Y - - - - - - - - Y Y - Y Y Y Y Y Y - Y
q - - - - Y Y Y - - - - - - - - Y Y - Y Y Y Y Y Y - Y
p - - - - Y Y Y - - - - - - - - Y Y - Y Y Y Y Y Y - Y
B - - Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y - Y
H - - - Y Y Y Y - Y Y Y Y Y - Y Y Y Y Y Y Y Y Y Y - Y
I - - - - Y Y Y - - Y Y Y Y - - Y Y - Y Y Y Y Y Y - Y
L - - - - - - - - - - Y Y Y - - Y Y - Y Y Y Y Y Y - -
Q - - - - - - - - - - Y Y Y - - Y Y - Y Y Y Y Y Y - -
P - - - - - - - - - - Y Y Y - - Y Y - Y Y Y Y Y Y - -
e - - - - - - - - - - - - - Y Y Y Y Y Y Y Y Y Y Y - -
f - - - - - - - - - - - - - - Y Y Y Y Y Y Y Y Y Y - -
d - - - - - - - - - - - - - - - Y Y - Y Y Y Y Y Y - -
g - - - - - - - - - - - - - - - - Y - - Y Y Y Y Y - -
F - - - - - - - - - - - - - - - - - Y Y Y Y Y Y Y - -
D - - - - - - - - - - - - - - - - - - Y Y Y Y Y Y - -
G - - - - - - - - - - - - - - - - - - - Y Y Y Y Y - -
S - - - - - - - - - - - - - - - - - - - - Y Y Y Y - -
U - - - - - - - - - - - - - - - - - - - - - Y Y Y - -
V - - - - - - - - - - - - - - - - - - - - - - Y Y - -
O - - - - - - - - - - - - - - - - - - - - - - - Y - -
M - - - - - - - - - - - - - - - - - - - - - - Y Y Y -
m - - - - - - - - - - - - - - - - - - - - - - Y Y - Y

You should note that, while included in the table for completeness, the ‘S’, ‘U’, and ‘V’ types cannot be operated on by
ufuncs. Also, note that on a 32-bit system the integer types may have different sizes, resulting in a slightly altered table.
Mixed scalar-array operations use a different set of casting rules that ensure that a scalar cannot “upcast” an array unless
the scalar is of a fundamentally different kind of data (i.e., under a different hierarchy in the data-type hierarchy) than
the array. This rule enables you to use scalar constants in your code (which, as Python types, are interpreted accordingly
in ufuncs) without worrying about whether the precision of the scalar constant will cause upcasting on your large (small
precision) array.
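
A minimal sketch of this rule with Python scalars (the array names are assumptions; only the resulting dtypes matter):

```python
import numpy as np

floats = np.ones(3, dtype=np.float32)
ints = np.arange(3, dtype=np.int8)

same_kind = floats * 2.5    # float scalar, float array: stays float32
int_scalar = ints + 1       # int scalar, int array: stays int8
other_kind = ints + 1.5     # float scalar, int array: a different kind, so a float type results

print(same_kind.dtype, int_scalar.dtype, other_kind.dtype)
```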

4.10.5 Use of internal buffers

Internally, buffers are used for misaligned data, swapped data, and data that has to be converted from one data type to
another. The size of internal buffers is settable on a per-thread basis. There can be up to 2 (n_inputs + n_outputs)
buffers of the specified size created to handle the data from all the inputs and outputs of a ufunc. The default size of
a buffer is 10,000 elements. Whenever buffer-based calculation would be needed, but all input arrays are smaller than the
buffer size, those misbehaved or incorrectly-typed arrays will be copied before the calculation proceeds. Adjusting the
size of the buffer may therefore alter the speed at which ufunc calculations of various sorts are completed. A simple
interface for setting this variable is accessible using the function numpy.setbufsize.
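
A hedged sketch of changing and restoring the buffer size; the value 16384 is an arbitrary assumption, not a
recommendation:

```python
import numpy as np

old_size = np.setbufsize(16384)    # returns the previous size, in elements
try:
    current = np.getbufsize()      # the size now in effect
finally:
    np.setbufsize(old_size)        # restore the previous setting

print(old_size, current)
```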

4.10.6 Error handling

Universal functions can trip special floating-point status registers in your hardware (such as divide-by-zero). If available
on your platform, these registers will be regularly checked during calculation. Error handling is controlled on a per-thread
basis, and can be configured using the functions numpy.seterr and numpy.seterrcall.
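
Besides the two functions above, the numpy.errstate context manager scopes a setting to a block of code. The sketch
below uses it to turn a divide-by-zero into an exception and then to silence it:

```python
import numpy as np

with np.errstate(divide='raise'):
    try:
        np.array([1.0]) / 0.0
        raised = False
    except FloatingPointError:
        raised = True

with np.errstate(divide='ignore'):
    quiet = np.array([1.0]) / 0.0    # result is inf, and no warning is issued

print(raised, quiet)
```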

4.10.7 Overriding ufunc behavior

Classes (including ndarray subclasses) can override how ufuncs act on them by defining certain special methods. For
details, see the section on standard array subclasses in the NumPy reference guide.
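
One such special method is __array_ufunc__. The class below is a hypothetical wrapper, written only to illustrate the
hook; it records each ufunc applied to it and then defers to the real implementation.

```python
import numpy as np

class Logged:
    """Hypothetical wrapper that records which ufuncs were applied to it."""

    def __init__(self, values):
        self.values = np.asarray(values)
        self.calls = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any Logged inputs, then call the real ufunc (or ufunc method)
        raw = [x.values if isinstance(x, Logged) else x for x in inputs]
        self.calls.append((ufunc.__name__, method))
        return getattr(ufunc, method)(*raw, **kwargs)

wrapped = Logged([1, 2, 3])
out = np.add(wrapped, 10)    # control passes to Logged.__array_ufunc__
print(out, wrapped.calls)
```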

4.11 Copies and views

When operating on NumPy arrays, it is possible to access the internal data buffer directly using a view without copying
data around. This ensures good performance but can also cause unwanted problems if the user is not aware of how this
works. Hence, it is important to know the difference between these two terms and to know which operations return copies
and which return views.
The NumPy array is a data structure consisting of two parts: the contiguous data buffer with the actual data elements
and the metadata that contains information about the data buffer. The metadata includes data type, strides, and other
important information that helps manipulate the ndarray easily. See the Internal organization of NumPy arrays section
for a detailed look.

4.11.1 View

It is possible to access the array differently by just changing certain metadata like stride and dtype without changing the
data buffer. This creates a new way of looking at the data and these new arrays are called views. The data buffer remains
the same, so any changes made to a view are reflected in the original array. A view can be forced through the
ndarray.view method.
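
A minimal sketch of forcing a view, including one that reinterprets the dtype (the names are illustrative
assumptions):

```python
import numpy as np

base = np.arange(4, dtype=np.int16)
alias = base.view()               # same buffer, new array object
as_bytes = base.view(np.uint8)    # same buffer, reinterpreted as single bytes

alias[0] = 99                     # the change is visible through base as well
print(base, as_bytes.shape)
```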


4.11.2 Copy

When a new array is created by duplicating the data buffer as well as the metadata, it is called a copy. Changes made to
the copy are not reflected in the original array. Making a copy is slower and consumes more memory, but is sometimes
necessary. A copy can be forced by using ndarray.copy.
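
For comparison with a view, a short sketch of a forced copy (names are illustrative assumptions):

```python
import numpy as np

original = np.arange(4)
duplicate = original.copy()       # new data buffer

duplicate[0] = 99                 # does not touch the original
print(original, duplicate)
```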

4.11.3 Indexing operations

See also:
Indexing on ndarrays
Views are created when elements can be addressed with offsets and strides in the original array. Hence, basic indexing
always creates views. For example:

>>> x = np.arange(10)
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> y = x[1:3] # creates a view
>>> y
array([1, 2])
>>> x[1:3] = [10, 11]
>>> x
array([ 0, 10, 11, 3, 4, 5, 6, 7, 8, 9])
>>> y
array([10, 11])

Here, y gets changed when x is changed because it is a view.


Advanced indexing, on the other hand, always creates copies. For example:

>>> x = np.arange(9).reshape(3, 3)
>>> x
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> y = x[[1, 2]]
>>> y
array([[3, 4, 5],
       [6, 7, 8]])
>>> y.base is None
True

Here, y is a copy, as signified by the base attribute. We can also confirm this by assigning new values to x[[1, 2]]
which in turn will not affect y at all:

>>> x[[1, 2]] = [[10, 11, 12], [13, 14, 15]]
>>> x
array([[ 0,  1,  2],
       [10, 11, 12],
       [13, 14, 15]])
>>> y
array([[3, 4, 5],
       [6, 7, 8]])

It must be noted here that during the assignment of x[[1, 2]] no view or copy is created as the assignment happens
in-place.


4.11.4 Other operations

The numpy.reshape function creates a view where possible or a copy otherwise. In most cases, the strides can be
modified to reshape the array with a view. However, in some cases where the array becomes non-contiguous (perhaps
after an ndarray.transpose operation), the reshaping cannot be done by modifying strides and requires a copy. In
these cases, we can raise an error by assigning the new shape to the shape attribute of the array. For example:

>>> x = np.ones((2, 3))
>>> y = x.T  # makes the array non-contiguous
>>> y
array([[1., 1.],
       [1., 1.],
       [1., 1.]])
>>> z = y.view()
>>> z.shape = 6
Traceback (most recent call last):
   ...
AttributeError: Incompatible shape for in-place modification. Use
`.reshape()` to make a copy with the desired shape.

Taking the example of another operation, ravel returns a contiguous flattened view of the array wherever possible. On
the other hand, ndarray.flatten always returns a flattened copy of the array. However, to guarantee a view in most
cases, x.reshape(-1) may be preferable.
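
The difference can be checked with numpy.shares_memory. A small sketch, with illustrative names:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

flat_view = x.ravel()        # contiguous input: ravel returns a view
flat_copy = x.flatten()      # flatten always returns a copy
from_t = x.T.ravel()         # non-contiguous input: ravel has to copy

print(np.shares_memory(x, flat_view),
      np.shares_memory(x, flat_copy),
      np.shares_memory(x, from_t))
```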

4.11.5 How to tell if the array is a view or a copy

The base attribute of the ndarray makes it easy to tell if an array is a view or a copy. The base attribute of a view returns
the original array while it returns None for a copy.

>>> x = np.arange(9)
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> y = x.reshape(3, 3)
>>> y
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> y.base  # .reshape() creates a view
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> z = y[[2, 1]]
>>> z
array([[6, 7, 8],
       [3, 4, 5]])
>>> z.base is None  # advanced indexing creates a copy
True

Note that the base attribute should not be used to determine if an ndarray object is new; only if it is a view or a copy of
another ndarray.



CHAPTER

FIVE

MISCELLANEOUS

5.1 IEEE 754 Floating Point Special Values

Special values defined in numpy: nan, inf.

NaNs can be used as a poor man’s mask (if you don’t care what the original value was).
Note: equality cannot be used to test for NaNs. E.g.:

>>> myarr = np.array([1., 0., np.nan, 3.])
>>> np.nonzero(myarr == np.nan)
(array([], dtype=int64),)
>>> np.nan == np.nan  # is always False! Use special numpy functions instead.
False
>>> myarr[myarr == np.nan] = 0.  # doesn't work
>>> myarr
array([ 1.,  0., nan,  3.])
>>> myarr[np.isnan(myarr)] = 0.  # use this instead
>>> myarr
array([1., 0., 0., 3.])

Other related special value functions:

isinf(): True if value is inf
isfinite(): True if not nan or inf
nan_to_num(): Map nan to 0, inf to max float, -inf to min float
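
A brief sketch of these three functions on one array (the array vals is an assumption for illustration):

```python
import numpy as np

vals = np.array([1.0, np.nan, np.inf, -np.inf])

inf_mask = np.isinf(vals)          # True for +inf and -inf
finite_mask = np.isfinite(vals)    # True only for ordinary numbers
cleaned = np.nan_to_num(vals)      # nan -> 0, +/-inf -> largest/smallest finite float

print(inf_mask, finite_mask, cleaned)
```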

The following corresponds to the usual functions except that nans are excluded from the results:

nansum()
nanmax()
nanmin()
nanargmax()
nanargmin()

>>> x = np.arange(10.)
>>> x[3] = np.nan
>>> x.sum()
nan
>>> np.nansum(x)
42.0


5.2 How numpy handles numerical exceptions

The default is to 'warn' for invalid, divide, and overflow and 'ignore' for underflow. But this can be
changed, and it can be set individually for different kinds of exceptions. The different behaviors are:
• ‘ignore’ : Take no action when the exception occurs.
• ‘warn’ : Print a RuntimeWarning (via the Python warnings module).
• ‘raise’ : Raise a FloatingPointError.
• ‘call’ : Call a function specified using the seterrcall function.
• ‘print’ : Print a warning directly to stdout.
• ‘log’ : Record error in a Log object specified by seterrcall.
These behaviors can be set for all kinds of errors or specific ones:
• all : apply to all numeric exceptions
• invalid : when NaNs are generated
• divide : divide by zero (for integers as well!)
• overflow : floating point overflows
• underflow : floating point underflows
Note that integer divide-by-zero is handled by the same machinery. These behaviors are set on a per-thread basis.

5.3 Examples

>>> oldsettings = np.seterr(all='warn')
>>> np.zeros(5,dtype=np.float32)/0.
invalid value encountered in divide
>>> j = np.seterr(under='ignore')
>>> np.array([1.e-100])**10
>>> j = np.seterr(invalid='raise')
>>> np.sqrt(np.array([-1.]))
FloatingPointError: invalid value encountered in sqrt
>>> def errorhandler(errstr, errflag):
...     print("saw stupid error!")
>>> np.seterrcall(errorhandler)
<function err_handler at 0x...>
>>> j = np.seterr(all='call')
>>> np.zeros(5, dtype=np.int32)/0
FloatingPointError: invalid value encountered in divide
saw stupid error!
>>> j = np.seterr(**oldsettings)  # restore previous
...                               # error-handling settings


5.4 Interfacing to C

Only a survey of the choices. Little detail on how each works.


1) Bare metal, wrap your own C-code manually.
• Plusses:
– Efficient
– No dependencies on other tools
• Minuses:
– Lots of learning overhead:
∗ need to learn basics of Python C API
∗ need to learn basics of numpy C API
∗ need to learn how to handle reference counting and love it.
– Reference counting often difficult to get right.
∗ getting it wrong leads to memory leaks, and worse, segfaults
– API will change for Python 3.0!
2) Cython
• Plusses:
– avoid learning C API’s
– no dealing with reference counting
– can code in pseudo python and generate C code
– can also interface to existing C code
– should shield you from changes to Python C api
– has become the de-facto standard within the scientific Python community
– fast indexing support for arrays
• Minuses:
– Can write code in non-standard form which may become obsolete
– Not as flexible as manual wrapping
3) ctypes
• Plusses:
– part of Python standard library
– good for interfacing to existing shareable libraries, particularly Windows DLLs
– avoids API/reference counting issues
– good numpy support: arrays have all these in their ctypes attribute:

a.ctypes.data
a.ctypes.data_as
a.ctypes.shape
a.ctypes.shape_as
a.ctypes.strides
a.ctypes.strides_as

• Minuses:
– can’t use for writing code to be turned into C extensions, only a wrapper tool.
4) SWIG (automatic wrapper generator)
• Plusses:
– around a long time
– multiple scripting language support
– C++ support
– Good for wrapping large (many functions) existing C libraries
• Minuses:
– generates lots of code between Python and the C code
– can cause performance problems that are nearly impossible to optimize out
– interface files can be hard to write
– doesn’t necessarily avoid reference counting issues or needing to know API’s
5) scipy.weave
• Plusses:
– can turn many numpy expressions into C code
– dynamic compiling and loading of generated C code
– can embed pure C code in Python module and have weave extract, generate interfaces and compile, etc.
• Minuses:
– Future very uncertain: it’s the only part of Scipy not ported to Python 3 and is effectively deprecated in favor
of Cython.
6) Psyco
• Plusses:
– Turns pure python into efficient machine code through jit-like optimizations
– very fast when it optimizes well
• Minuses:
– Only on intel (windows?)
– Doesn’t do much for numpy?
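
As a small illustration of the ctypes attribute mentioned under option 3, the sketch below obtains a typed pointer
and the shape and strides of an array without calling any external library. The array a is an assumption for the
example; passing the pointer to a real C function is left out.

```python
import ctypes
import numpy as np

a = np.arange(4, dtype=np.float64)

# Pointer to the first element, typed as a C double*
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

# Shape and strides as ctypes arrays of a chosen integer type
shape = a.ctypes.shape_as(ctypes.c_long)
strides = a.ctypes.strides_as(ctypes.c_long)

print(ptr[2], list(shape), list(strides))
```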


5.5 Interfacing to Fortran:

The clear choice to wrap Fortran code is f2py.


Pyfort is an older alternative, but not supported any longer. Fwrap is a newer project that looked promising but isn’t being
developed any longer.

5.6 Interfacing to C++:

1) Cython
2) CXX
3) Boost.python
4) SWIG
5) SIP (used mainly in PyQT)

CHAPTER

SIX

NUMPY FOR MATLAB USERS

6.1 Introduction

MATLAB® and NumPy have a lot in common, but NumPy was created to work with Python, not to be a MATLAB
clone. This guide will help MATLAB users get started with NumPy.

6.2 Some key differences

MATLAB: In MATLAB, the basic type, even for scalars, is a multidimensional array. Array assignments in MATLAB are
stored as 2D arrays of double precision floating point numbers, unless you specify the number of dimensions and type.
Operations on the 2D instances of these arrays are modeled on matrix operations in linear algebra.
NumPy: In NumPy, the basic type is a multidimensional array. Array assignments in NumPy are usually stored as
n-dimensional arrays with the minimum type required to hold the objects in sequence, unless you specify the number of
dimensions and type. NumPy performs operations element-by-element, so multiplying 2D arrays with * is not a matrix
multiplication; it’s an element-by-element multiplication. (The @ operator, available since Python 3.5, can be used for
conventional matrix multiplication.)

MATLAB: MATLAB numbers indices from 1; a(1) is the first element. See note INDEXING.
NumPy: NumPy, like Python, numbers indices from 0; a[0] is the first element.

MATLAB: MATLAB’s scripting language was created for linear algebra so the syntax for some array manipulations is more
compact than NumPy’s. On the other hand, the API for adding GUIs and creating full-fledged applications is more or less
an afterthought.
NumPy: NumPy is based on Python, a general-purpose language. The advantage to NumPy is access to Python libraries
including: SciPy, Matplotlib, Pandas, OpenCV, and more. In addition, Python is often embedded as a scripting language in
other software, allowing NumPy to be used there too.

MATLAB: MATLAB array slicing uses pass-by-value semantics, with a lazy copy-on-write scheme to prevent creating copies
until they are needed. Slicing operations copy parts of the array.
NumPy: NumPy array slicing uses pass-by-reference, which does not copy the arguments. Slicing operations are views into
an array.
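
Two of these differences, element-wise * versus @ and slices being views, can be sketched in a few lines (the array a
is an assumption for illustration):

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])

elementwise = a * a      # MATLAB: a .* a
matrix_prod = a @ a      # MATLAB: a * a

row = a[0, :]            # a view into a, not a copy as in MATLAB
row[0] = 100.            # also changes a[0, 0]

print(elementwise, matrix_prod, a[0, 0])
```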


6.3 Rough equivalents

The table below gives rough equivalents for some common MATLAB expressions. These are similar expressions, not
equivalents. For details, see the documentation.
In the table below, it is assumed that you have executed the following commands in Python:

import numpy as np
from scipy import io, integrate, linalg, signal
from scipy.sparse.linalg import eigs

Also assume below that if the Notes talk about “matrix” that the arguments are two-dimensional entities.


6.3.1 General purpose equivalents

MATLAB                          NumPy                                       Notes

help func                       info(func) or help(func) or                 get help on the function func
                                func? (in IPython)

which func                      see note HELP                               find out where func is defined

type func                       np.source(func) or func??                   print source for func (if not a native function)
                                (in IPython)

% comment                       # comment                                   comment a line of code with the text comment

for i=1:3                       for i in range(1, 4):                       use a for-loop to print the numbers 1, 2, and 3 using range
    fprintf('%i\n',i)               print(i)
end

a && b                          a and b                                     short-circuiting logical AND operator (Python native
                                                                            operator); scalar arguments only

a || b                          a or b                                      short-circuiting logical OR operator (Python native
                                                                            operator); scalar arguments only

>> 4 == 4                       >>> 4 == 4                                  The boolean objects in Python are True and False,
ans = 1                         True                                        as opposed to MATLAB logical types of 1 and 0.
>> 4 == 5                       >>> 4 == 5
ans = 0                         False

a=4                             a = 4                                       create an if-else statement to check if a is 4 or 5
if a==4                         if a == 4:                                  and print result
    fprintf('a = 4\n')              print('a = 4')
elseif a==5                     elif a == 5:
    fprintf('a = 5\n')              print('a = 5')
end

1*i, 1*j, 1i, 1j                1j                                          complex numbers

eps                             np.finfo(float).eps or np.spacing(1)        Upper bound to relative error due to rounding in
                                                                            64-bit floating point arithmetic.

load data.mat                   io.loadmat('data.mat')                      Load MATLAB variables saved to the file data.mat.
                                                                            (Note: When saving arrays to data.mat in
                                                                            MATLAB/Octave, use a recent binary format.
                                                                            scipy.io.loadmat will create a dictionary with the
                                                                            saved arrays and further information.)

ode45                           integrate.solve_ivp(f)                      integrate an ODE with Runge-Kutta 4,5

ode15s                          integrate.solve_ivp(f, method='BDF')        integrate an ODE with BDF method


6.3.2 Linear algebra equivalents

MATLAB                          NumPy                                       Notes

ndims(a)                        np.ndim(a) or a.ndim                        number of dimensions of array a
numel(a)                        np.size(a) or a.size                        number of elements of array a
size(a)                         np.shape(a) or a.shape                      “size” of array a
size(a,n)                       a.shape[n-1]                                get the number of elements of the n-th dimension of
                                                                            array a. (Note that MATLAB uses 1 based indexing
                                                                            while Python uses 0 based indexing, See note INDEXING)
[ 1 2 3; 4 5 6 ]                np.array([[1. ,2. ,3.], [4. ,5. ,6.]])      define a 2x3 2D array
[ a b; c d ]                    np.block([[a, b], [c, d]])                  construct a matrix from blocks a, b, c, and d
a(end)                          a[-1]                                       access last element in MATLAB vector (1xn or nx1) or
                                                                            1D NumPy array a (length n)
a(2,5)                          a[1, 4]                                     access element in second row, fifth column in 2D array a
a(2,:)                          a[1] or a[1, :]                             entire second row of 2D array a
a(1:5,:)                        a[0:5] or a[:5] or a[0:5, :]                first 5 rows of 2D array a
a(end-4:end,:)                  a[-5:]                                      last 5 rows of 2D array a
a(1:3,5:9)                      a[0:3, 4:9]                                 The first through third rows and fifth through ninth
                                                                            columns of a 2D array, a.
a([2,4,5],[1,3])                a[np.ix_([1, 3, 4], [0, 2])]                rows 2,4 and 5 and columns 1 and 3. This allows the
                                                                            matrix to be modified, and doesn’t require a regular slice.
a(3:2:21,:)                     a[2:21:2,:]                                 every other row of a, starting with the third and going
                                                                            to the twenty-first
a(1:2:end,:)                    a[::2,:]                                    every other row of a, starting with the first
a(end:-1:1,:) or flipud(a)      a[::-1,:]                                   a with rows in reverse order
a([1:end 1],:)                  a[np.r_[:len(a),0]]                         a with copy of the first row appended to the end
a.'                             a.transpose() or a.T                        transpose of a
a'                              a.conj().transpose() or a.conj().T          conjugate transpose of a
a * b                           a @ b                                       matrix multiply
a .* b                          a * b                                       element-wise multiply
a./b                            a/b                                         element-wise divide
a.^3                            a**3                                        element-wise exponentiation
(a > 0.5)                       (a > 0.5)                                   matrix whose i,jth element is (a_ij > 0.5). The MATLAB
                                                                            result is an array of logical values 0 and 1. The NumPy
                                                                            result is an array of the boolean values False and True.
find(a > 0.5)                   np.nonzero(a > 0.5)                         find the indices where (a > 0.5)
a(:,find(v > 0.5))              a[:,np.nonzero(v > 0.5)[0]]                 extract the columns of a where vector v > 0.5
a(:,find(v>0.5))                a[:, v.T > 0.5]                             extract the columns of a where column vector v > 0.5
a(a<0.5)=0                      a[a < 0.5]=0                                a with elements less than 0.5 zeroed out
a .* (a>0.5)                    a * (a > 0.5)                               a with elements less than 0.5 zeroed out
a(:) = 3                        a[:] = 3                                    set all values to the same scalar value
y=x                             y = x.copy()                                NumPy assigns by reference
y=x(2,:)                        y = x[1, :].copy()                          NumPy slices are by reference
y=x(:)                          y = x.flatten()                             turn array into vector (note that this forces a copy).
                                                                            To obtain the same data ordering as in MATLAB, use
                                                                            x.flatten('F').
1:10                            np.arange(1., 11.) or np.r_[1.:11.] or      create an increasing vector (see note RANGES)
                                np.r_[1:10:10j]
0:9                             np.arange(10.) or np.r_[:10.] or            create an increasing vector (see note RANGES)
                                np.r_[:9:10j]
[1:10]'                         np.arange(1.,11.)[:, np.newaxis]            create a column vector
zeros(3,4)                      np.zeros((3, 4))                            3x4 two-dimensional array full of 64-bit floating point zeros
zeros(3,4,5)                    np.zeros((3, 4, 5))                         3x4x5 three-dimensional array full of 64-bit floating point zeros
ones(3,4)                       np.ones((3, 4))                             3x4 two-dimensional array full of 64-bit floating point ones
eye(3)                          np.eye(3)                                   3x3 identity matrix
diag(a)                         np.diag(a)                                  returns a vector of the diagonal elements of 2D array, a
diag(v,0)                       np.diag(v, 0)                               returns a square diagonal matrix whose nonzero values
                                                                            are the elements of vector, v
rng(42,'twister')               from numpy.random import default_rng        generate a random 3x4 array with default random number
rand(3,4)                       rng = default_rng(42)                       generator and seed = 42
                                rng.random((3, 4))
                                or older version: np.random.rand(3, 4)
linspace(1,3,4)                 np.linspace(1,3,4)                          4 equally spaced samples between 1 and 3, inclusive
[x,y]=meshgrid(0:8,0:5)         np.mgrid[0:9.,0:6.] or                      two 2D arrays: one of x values, the other of y values
                                np.meshgrid(np.r_[0:9.],np.r_[0:6.])
                                np.ogrid[0:9.,0:6.] or                      the best way to eval functions on a grid
                                np.ix_(np.r_[0:9.],np.r_[0:6.])
[x,y]=meshgrid([1,2,4],         np.meshgrid([1,2,4],[2,4,5])
  [2,4,5])
                                np.ix_([1,2,4],[2,4,5])                     the best way to eval functions on a grid
repmat(a, m, n)                 np.tile(a, (m, n))                          create m by n copies of a
[a b]                           np.concatenate((a,b),1) or                  concatenate columns of a and b
                                np.hstack((a,b)) or
                                np.column_stack((a,b)) or np.c_[a,b]
[a; b]                          np.concatenate((a,b)) or                    concatenate rows of a and b
                                np.vstack((a,b)) or np.r_[a,b]
max(max(a))                     a.max() or np.nanmax(a)                     maximum element of a (with ndims(a)<=2 for MATLAB, if
                                                                            there are NaN’s, nanmax will ignore these and return
                                                                            largest value)
max(a)                          a.max(0)                                    maximum element of each column of array a
max(a,[],2)                     a.max(1)                                    maximum element of each row of array a
max(a,b)                        np.maximum(a, b)                            compares a and b element-wise, and returns the maximum
                                                                            value from each pair
norm(v)                         np.sqrt(v @ v) or np.linalg.norm(v)         L2 norm of vector v
a & b                           np.logical_and(a,b)                         element-by-element AND operator (NumPy ufunc) See note
                                                                            LOGICOPS
a | b                           np.logical_or(a,b)                          element-by-element OR operator (NumPy ufunc) See note
                                                                            LOGICOPS
bitand(a,b)                     a & b                                       bitwise AND operator (Python native and NumPy ufunc)
bitor(a,b)                      a | b                                       bitwise OR operator (Python native and NumPy ufunc)
inv(a)                          linalg.inv(a)                               inverse of square 2D array a
pinv(a)                         linalg.pinv(a)                              pseudo-inverse of 2D array a
rank(a)                         linalg.matrix_rank(a)                       matrix rank of a 2D array a
a\b                             linalg.solve(a, b) if a is square;          solution of a x = b for x
                                linalg.lstsq(a, b) otherwise
b/a                             Solve a.T x.T = b.T instead                 solution of x a = b for x
[U,S,V]=svd(a)                  U, S, Vh = linalg.svd(a), V = Vh.T          singular value decomposition of a
c=chol(a) where a==c'*c         c = linalg.cholesky(a) where a == c@c.T     Cholesky factorization of a 2D array (chol(a) in MATLAB
                                                                            returns an upper triangular 2D array, but cholesky
                                                                            returns a lower triangular 2D array)
[V,D]=eig(a)                    D,V = linalg.eig(a)                         eigenvalues λ and eigenvectors v̄ of a, where λv̄ = av̄
[V,D]=eig(a,b)                  D,V = linalg.eig(a, b)                      eigenvalues λ and eigenvectors v̄ of a, b where λbv̄ = av̄
[V,D]=eigs(a,3)                 D,V = eigs(a, k = 3)                        find the k=3 largest eigenvalues and eigenvectors of 2D array, a
[Q,R,P]=qr(a,0)                 Q,R = linalg.qr(a)                          QR decomposition
[L,U,P]=lu(a) where a==P'*L*U   P,L,U = linalg.lu(a) where a == P@L@U       LU decomposition (note: P(MATLAB) == transpose(P(NumPy)))
conjgrad                        cg                                          Conjugate gradients solver
fft(a)                          np.fft.fft(a)                               Fourier transform of a
ifft(a)                         np.fft.ifft(a)                              inverse Fourier transform of a
sort(a)                         np.sort(a, axis=0) or a.sort(axis=0)        sort each column of a 2D array, a
sort(a, 2)                      np.sort(a, axis = 1) or a.sort(axis = 1)    sort each row of 2D array, a
[b,I]=sortrows(a,1)             I = np.argsort(a[:, 0]); b = a[I,:]         save the array a as array b with rows sorted by the
                                                                            first column
x = Z\y                         x = linalg.lstsq(Z, y)                      perform a linear regression of the form Zx = y
decimate(x, q)                  signal.resample(x, np.ceil(len(x)/q))       downsample with low-pass filtering
unique(a)                       np.unique(a)                                a vector of unique values in array a
squeeze(a)                      a.squeeze()                                 remove singleton dimensions of array a. Note that MATLAB
                                                                            will always return arrays of 2D or higher while NumPy
                                                                            will return arrays of 0D or higher

6.4 Notes

Submatrix: Assignment to a submatrix can be done with lists of indices using the ix_ command. E.g., for 2D array a,
one might do: ind=[1, 3]; a[np.ix_(ind, ind)] += 100.
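
The submatrix assignment above, written out as a runnable sketch (the 4x4 zero array is an assumption for
illustration):

```python
import numpy as np

a = np.zeros((4, 4))
ind = [1, 3]

# np.ix_ builds an open mesh, so this touches rows 1 and 3 crossed with columns 1 and 3
a[np.ix_(ind, ind)] += 100
print(a)
```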
HELP: There is no direct equivalent of MATLAB’s which command, but the commands help and numpy.source
will usually list the filename where the function is located. Python also has an inspect module (do
import inspect) which provides a getfile that often works.
INDEXING: MATLAB uses one based indexing, so the initial element of a sequence has index 1. Python uses zero based
indexing, so the initial element of a sequence has index 0. Confusion and flamewars arise because each has advantages
and disadvantages. One based indexing is consistent with common human language usage, where the “first” element of a
sequence has index 1. Zero based indexing simplifies indexing. See also a text by prof.dr. Edsger W. Dijkstra.
RANGES: In MATLAB, 0:5 can be used as both a range literal and a ‘slice’ index (inside parentheses); however, in
Python, constructs like 0:5 can only be used as a slice index (inside square brackets). Thus the somewhat quirky r_
object was created to allow NumPy to have a similarly terse range construction mechanism. Note that r_ is not called
like a function or a constructor, but rather indexed using square brackets, which allows the use of Python’s slice syntax in
the arguments.
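A short sketch of r_ in use; the particular ranges are arbitrary examples.

```python
import numpy as np

# Slice syntax inside square brackets produces a range, like MATLAB's 0:4.
r = np.r_[0:5]

# Several slices and scalars can be concatenated in one expression.
s = np.r_[0:5, 10, 20:23]

# An imaginary "step" asks for that many evenly spaced points, endpoints included.
t = np.r_[0:1:5j]

print(r, s, t)
```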
LOGICOPS: & or | in NumPy is bitwise AND/OR, while in MATLAB & and | are logical AND/OR. The two can
appear to work the same, but there are important differences. If you would have used MATLAB’s & or | operators,
you should use the NumPy ufuncs logical_and/logical_or. The notable differences between MATLAB’s and
NumPy’s & and | operators are:


• Non-logical {0,1} inputs: NumPy’s output is the bitwise AND of the inputs. MATLAB treats any non-zero value
as 1 and returns the logical AND. For example (3 & 4) in NumPy is 0, while in MATLAB both 3 and 4 are
considered logical true and (3 & 4) returns 1.
• Precedence: NumPy’s & operator is higher precedence than comparison operators like < and >; MATLAB’s is the
reverse.
If you know you have boolean arguments, you can get away with using NumPy’s bitwise operators, but be careful with
parentheses, like this: z = (x > 1) & (x < 2). The absence of NumPy operator forms of logical_and and
logical_or is an unfortunate consequence of Python’s design.
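The two differences described above can be seen in a few lines; the values are arbitrary.

```python
import numpy as np

# Bitwise versus logical AND on non-boolean inputs.
bitwise = 3 & 4                    # 0b011 & 0b100
logical = np.logical_and(3, 4)     # both operands are non-zero

# With boolean arrays the bitwise operator is fine, provided the
# comparisons are parenthesized. Without the parentheses the expression
# would be parsed as x > (1 & x) < 2.
x = np.array([0.5, 1.5, 2.5])
z = (x > 1) & (x < 2)
z_ufunc = np.logical_and(x > 1, x < 2)

print(bitwise, logical, z, z_ufunc)
```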
RESHAPE and LINEAR INDEXING: MATLAB always allows multi-dimensional arrays to be accessed using scalar
or linear indices; NumPy does not. Linear indices are common in MATLAB programs, e.g. find() on a matrix returns
them, whereas NumPy’s nonzero returns a tuple of per-dimension index arrays. When converting MATLAB code it might
be necessary to first reshape a matrix to a linear sequence, perform some indexing operations and then reshape back.
As reshape (usually) produces views onto the same storage, it should be possible to do this fairly efficiently. Note that
the scan order used by reshape in NumPy defaults to the ‘C’ order, whereas MATLAB uses the Fortran order. If you are
simply converting to a linear sequence and back this doesn’t matter. But if you are converting reshapes from MATLAB
code which relies on the scan order, then this MATLAB code: z = reshape(x,3,4); should become
z = x.reshape(3,4,order='F').copy() in NumPy.
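To make the scan-order difference concrete, here is a small sketch using a 12-element range as made-up input.

```python
import numpy as np

x = np.arange(12)

# NumPy's default 'C' order fills the last axis fastest (row by row).
c_order = x.reshape(3, 4)

# order='F' fills the first axis fastest (column by column), as MATLAB does.
f_order = x.reshape(3, 4, order='F')

# A MATLAB-style linear index k (zero-based here) into f_order can be
# emulated by flattening in the same order.
k = 4
element = f_order.ravel(order='F')[k]

print(c_order)
print(f_order)
print(element)
```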

6.5 ‘array’ or ‘matrix’? Which should I use?

Historically, NumPy has provided a special matrix type, np.matrix, which is a subclass of ndarray which makes binary
operations linear algebra operations. You may see it used in some existing code instead of np.array. So, which one to
use?

6.5.1 Short answer

Use arrays.
• They support multidimensional array algebra that is supported in MATLAB.
• They are the standard vector/matrix/tensor type of NumPy. Many NumPy functions return arrays, not matrices.
• There is a clear distinction between element-wise operations and linear algebra operations.
• You can have standard vectors or row/column vectors if you like.
Until Python 3.5 the only disadvantage of using the array type was that you had to use dot instead of * to multiply (reduce)
two tensors (scalar product, matrix vector multiplication etc.). Since Python 3.5 you can use the matrix multiplication @
operator.
Given the above, we intend to deprecate matrix eventually.


6.5.2 Long answer

NumPy contains both an array class and a matrix class. The array class is intended to be a general-purpose
n-dimensional array for many kinds of numerical computing, while matrix is intended to facilitate linear algebra
computations specifically. In practice there are only a handful of key differences between the two.
• Operators * and @, functions dot(), and multiply():
– For array, * means element-wise multiplication, while @ means matrix multiplication; they have
associated functions multiply() and dot(). (Before Python 3.5, @ did not exist and one had to use
dot() for matrix multiplication).
– For matrix, * means matrix multiplication, and for element-wise multiplication one has to use the
multiply() function.
• Handling of vectors (one-dimensional arrays)
– For array, the vector shapes 1xN, Nx1, and N are all different things. Operations like A[:,1] return a
one-dimensional array of shape N, not a two-dimensional array of shape Nx1. Transpose on a one-dimensional
array does nothing.
– For matrix, one-dimensional arrays are always upconverted to 1xN or Nx1 matrices (row or column
vectors). A[:,1] returns a two-dimensional matrix of shape Nx1.
• Handling of higher-dimensional arrays (ndim > 2)
– array objects can have number of dimensions > 2;
– matrix objects always have exactly two dimensions.
• Convenience attributes
– array has a .T attribute, which returns the transpose of the data.
– matrix also has .H, .I, and .A attributes, which return the conjugate transpose, inverse, and asarray()
of the matrix, respectively.
• Convenience constructor
– The array constructor takes (nested) Python sequences as initializers. As in, array([[1,2,3],
[4,5,6]]).
– The matrix constructor additionally takes a convenient string initializer. As in matrix("[1 2 3;
4 5 6]").
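The differences listed above can be checked side by side. This is only a sketch with small made-up inputs; the warnings filter is there because recent NumPy versions emit a PendingDeprecationWarning when a matrix is created.

```python
import warnings
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])

elementwise = A * A      # element-wise product
matprod = A @ A          # matrix product
col = A[:, 1]            # one-dimensional, shape (2,)
Av = A @ v               # v treated as a column vector
vA = v @ A               # v treated as a row vector

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    M = np.matrix("[1 2; 3 4]")   # string initializer
    m_prod = M * M                # * is the matrix product here
    m_elem = np.multiply(M, M)    # element-wise needs a function call
    m_col = M[:, 1]               # stays two-dimensional, shape (2, 1)

print(col.shape, m_col.shape)
```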
There are pros and cons to using both:
• array
– :) Element-wise multiplication is easy: A*B.
– :( You have to remember that matrix multiplication has its own operator, @.
– :) You can treat one-dimensional arrays as either row or column vectors. A @ v treats v as a column vector,
while v @ A treats v as a row vector. This can save you having to type a lot of transposes.
– :) array is the “default” NumPy type, so it gets the most testing, and is the type most likely to be returned
by 3rd party code that uses NumPy.
– :) Is quite at home handling data of any number of dimensions.
– :) Closer in semantics to tensor algebra, if you are familiar with that.
– :) All operations (*, /, +, - etc.) are element-wise.
– :( Sparse matrices from scipy.sparse do not interact as well with arrays.


• matrix
– :\\ Behavior is more like that of MATLAB matrices.
– <:( Maximum of two-dimensional. To hold three-dimensional data you need array or perhaps a Python
list of matrix.
– <:( Minimum of two-dimensional. You cannot have vectors. They must be cast as single-column or single-
row matrices.
– <:( Since array is the default in NumPy, some functions may return an array even if you give them a
matrix as an argument. This shouldn’t happen with NumPy functions (if it does it’s a bug), but 3rd party
code based on NumPy may not honor type preservation like NumPy does.
– :) A*B is matrix multiplication, so it looks just like you write it in linear algebra (For Python >= 3.5 plain
arrays have the same convenience with the @ operator).
– <:( Element-wise multiplication requires calling a function, multiply(A,B).
– <:( The use of operator overloading is a bit illogical: * does not work element-wise but / does.
– Interaction with scipy.sparse is a bit cleaner.
The array is thus much more advisable to use. Indeed, we intend to deprecate matrix eventually.

6.6 Customizing your environment

In MATLAB the main tool available to you for customizing the environment is to modify the search path with the locations
of your favorite functions. You can put such customizations into a startup script that MATLAB will run on startup.
NumPy, or rather Python, has similar facilities.
• To modify your Python search path to include the locations of your own modules, define the PYTHONPATH
environment variable.
• To have a particular script file executed when the interactive Python interpreter is started, define the
PYTHONSTARTUP environment variable to contain the name of your startup script.
Unlike MATLAB, where anything on your path can be called immediately, with Python you need to first do an ‘import’
statement to make functions in a particular file accessible.
For example you might make a startup script that looks like this (Note: this is just an example, not a statement of “best
practices”):

# Make all numpy available via shorter 'np' prefix
import numpy as np
#
# Make the SciPy linear algebra functions available as linalg.func()
# e.g. linalg.lu, linalg.eig (for general l*B@u==A@u solution)
from scipy import linalg
#
# Define a Hermitian function
def hermitian(A, **kwargs):
    return np.conj(A,**kwargs).T
# Make a shortcut for hermitian:
#    hermitian(A) --> H(A)
H = hermitian

To use the deprecated matrix and other matlib functions:


# Make all matlib functions accessible at the top level via M.func()
import numpy.matlib as M
# Make some matlib functions accessible directly at the top level via, e.g. rand(3,3)
from numpy.matlib import matrix,rand,zeros,ones,empty,eye

6.7 Links

Another somewhat outdated MATLAB/NumPy cross-reference can be found at [Link]


An extensive list of tools for scientific work with Python can be found in the topical software page.
See List of Python software: scripting for a list of software that uses Python as a scripting language.
MATLAB® and SimuLink® are registered trademarks of The MathWorks, Inc.



CHAPTER

SEVEN

BUILDING FROM SOURCE

There are two options for building NumPy: building with Gitpod or locally from source. Your choice depends on your
operating system and familiarity with the command line.

7.1 Gitpod

Gitpod is an open-source platform that automatically creates the correct development environment right in your browser,
reducing the need to install local development environments and deal with incompatible dependencies.
If you are a Windows user, unfamiliar with using the command line or building NumPy for the first time, it is often faster
to build with Gitpod. Here are the in-depth instructions for building NumPy with Gitpod.

7.2 Building locally

Building locally on your machine gives you more granular control. If you are a macOS or Linux user familiar with using
the command line, you can continue with building NumPy locally by following the instructions below.

7.3 Prerequisites

Building NumPy requires the following software installed:


1) Python 3.6.x or newer
Please note that the Python development headers also need to be installed, e.g., on Debian/Ubuntu one needs to
install both python3 and python3-dev. On Windows and macOS this is normally not an issue.
2) Compilers
Much of NumPy is written in C. You will need a C compiler that complies with the C99 standard.
While a FORTRAN 77 compiler is not necessary for building NumPy, it is needed to run the numpy.f2py tests.
These tests are skipped if the compiler is not auto-detected.
Note that NumPy is developed mainly using GNU compilers and tested on MSVC and Clang compilers.
Compilers from other vendors such as Intel, Absoft, Sun, NAG, Compaq, Vast, Portland, Lahey, HP, IBM are only
supported in the form of community feedback, and may not work out of the box. GCC 4.x (and later) compilers
are recommended. On ARM64 (aarch64) GCC 8.x (and later) are recommended.


3) Linear Algebra libraries


NumPy does not require any external linear algebra libraries to be installed. However, if these are available,
NumPy’s setup script can detect them and use them for building. A number of different LAPACK library setups
can be used, including optimized LAPACK libraries such as OpenBLAS or MKL. The choice and location of these
libraries as well as include paths and other such build options can be specified in a site.cfg file located in the
NumPy root repository or a .numpy-site.cfg file in your home directory. See the site.cfg.example
example file included in the NumPy repository or sdist for documentation, and below for specifying search priority
from environmental variables.
4) Cython
For building NumPy, you’ll need a recent version of Cython.

7.4 Basic Installation

To install NumPy, run:

pip install .

To perform an in-place build that can be run from the source folder run:

python setup.py build_ext --inplace

Note: for build instructions to do development work on NumPy itself, see development-environment.

7.5 Testing

Make sure to test your builds. To ensure everything stays in shape, see if all tests pass:

$ python runtests.py -v -m full

For detailed info on testing, see testing-builds.

7.5.1 Parallel builds

It’s possible to do a parallel build with:

python setup.py build -j 4 install --prefix $HOME/.local

This will compile numpy on 4 CPUs and install it into the specified prefix. To perform a parallel in-place build, run:

python setup.py build_ext --inplace -j 4

The number of build jobs can also be specified via the environment variable NPY_NUM_BUILD_JOBS.


7.5.2 Choosing the Fortran compiler

Compilers are auto-detected; building with a particular compiler can be done with --fcompiler. E.g. to select
gfortran:

python setup.py build --fcompiler=gnu95

For more information see:

python setup.py build --help-fcompiler

7.5.3 How to check the ABI of BLAS/LAPACK libraries

One relatively simple and reliable way to check for the compiler used to build a library is to use ldd on the library. If
libg2c.so is a dependency, this means that g77 has been used (note: g77 is no longer supported for building NumPy). If
libgfortran.so is a dependency, gfortran has been used. If both are dependencies, this means both have been used, which
is almost always a very bad idea.

7.6 Accelerated BLAS/LAPACK libraries

NumPy searches for optimized linear algebra libraries such as BLAS and LAPACK. There are specific orders for searching
these libraries, as described below and in the site.cfg.example file.
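Once a build is installed, one way to see which libraries were picked up is to ask NumPy itself. This is a minimal sketch; the exact layout of the printed report differs between NumPy versions, so no particular output is assumed.

```python
import numpy as np

# Prints the BLAS/LAPACK (and other build) information recorded
# when this copy of NumPy was built.
np.show_config()
```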

7.6.1 BLAS

Note that both BLAS and CBLAS interfaces are needed for a properly optimized build of NumPy.
The default order for the libraries is:
1. MKL
2. BLIS
3. OpenBLAS
4. ATLAS
5. BLAS (NetLIB)
The detection of BLAS libraries may be bypassed by defining the environment variable NPY_BLAS_LIBS, which should
contain the exact linker flags you want to use (interface is assumed to be Fortran 77). Also define NPY_CBLAS_LIBS
(even empty if CBLAS is contained in your BLAS library) to trigger use of CBLAS and avoid slow fallback code for
matrix calculations.
If you wish to build against OpenBLAS but you also have BLIS available, you may predefine the order of searching via
the environment variable NPY_BLAS_ORDER, which is a comma-separated list of the above names used to
determine what to search for, for instance:

NPY_BLAS_ORDER=ATLAS,blis,openblas,MKL python setup.py build

will prefer to use ATLAS, then BLIS, then OpenBLAS and as a last resort MKL. If none of these exists the build will
fail (names are compared lower case).
Alternatively one may use ! or ^ to negate all items:


NPY_BLAS_ORDER='^blas,atlas' python setup.py build

will allow using anything but NetLIB BLAS and ATLAS libraries; the order of the above list is retained.
One cannot mix negation and positives, nor have multiple negations; such cases will raise an error.
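The ordering and negation rules described above can be summarized in a few lines of Python. This is only an illustrative sketch of the documented behavior, not NumPy's own implementation: the helper name resolve_order and the lower-case spellings in DEFAULT_BLAS_ORDER are assumptions made for the example.

```python
# Default BLAS search order from the list above, spelled in lower case
# (the spellings are an assumption for this sketch).
DEFAULT_BLAS_ORDER = ["mkl", "blis", "openblas", "atlas", "blas"]


def resolve_order(value, default=DEFAULT_BLAS_ORDER):
    """Hypothetical helper mirroring the documented rules."""
    if value is None:                 # variable not set: default order
        return list(default)
    names = [n.strip().lower() for n in value.split(",") if n.strip()]
    if not names:                     # set but empty: search for nothing
        return []
    negate = names[0][0] in "!^"      # a leading ! or ^ negates all items
    if negate:
        names[0] = names[0][1:]
    if any(n[:1] in ("!", "^") for n in names):
        raise ValueError("cannot mix negation and positives or repeat negation")
    if negate:
        # Keep the default order, minus the excluded names.
        return [n for n in default if n not in names]
    return names


print(resolve_order("ATLAS,blis,openblas,MKL"))
print(resolve_order("^blas,atlas"))
```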

7.6.2 LAPACK

The default order for the libraries is:


1. MKL
2. OpenBLAS
3. libFLAME
4. ATLAS
5. LAPACK (NetLIB)
The detection of LAPACK libraries may be bypassed by defining the environment variable NPY_LAPACK_LIBS, which
should contain the exact linker flags you want to use (language is assumed to be Fortran 77).
If you wish to build against OpenBLAS but you also have MKL available, you may predefine the order of searching via
the environment variable NPY_LAPACK_ORDER, which is a comma-separated list of the above names, for instance:

NPY_LAPACK_ORDER=ATLAS,openblas,MKL python setup.py build

will prefer to use ATLAS, then OpenBLAS and as a last resort MKL. If none of these exists the build will fail (names
are compared lower case).
Alternatively one may use ! or ^ to negate all items:

NPY_LAPACK_ORDER='^lapack' python setup.py build

will allow using anything but the NetLIB LAPACK library; the order of the above list is retained.
One cannot mix negation and positives, nor have multiple negations; such cases will raise an error.
Deprecated since version 1.20: The native libraries on macOS, provided by Accelerate, are not fit for use in NumPy since
they have bugs that cause wrong output under easily reproducible conditions. If the vendor fixes those bugs, the library
could be reinstated, but until then users compiling for themselves should use another linear algebra library or use the
built-in (but slower) default, see the next section.

7.6.3 Disabling ATLAS and other accelerated libraries

Usage of ATLAS and other accelerated libraries in NumPy can be disabled via:

NPY_BLAS_ORDER= NPY_LAPACK_ORDER= python setup.py build

or:

BLAS=None LAPACK=None ATLAS=None python setup.py build


7.6.4 64-bit BLAS and LAPACK

You can tell NumPy to use 64-bit BLAS/LAPACK libraries by setting the environment variable:

NPY_USE_BLAS_ILP64=1

when building NumPy. The following 64-bit BLAS/LAPACK libraries are supported:
1. OpenBLAS ILP64 with 64_ symbol suffix (openblas64_)
2. OpenBLAS ILP64 without symbol suffix (openblas_ilp64)
The order in which they are preferred is determined by NPY_BLAS_ILP64_ORDER and
NPY_LAPACK_ILP64_ORDER environment variables. The default value is openblas64_,openblas_ilp64.

Note: Using non-symbol-suffixed 64-bit BLAS/LAPACK in a program that also uses 32-bit BLAS/LAPACK can cause
crashes under certain conditions (e.g. with embedded Python interpreters on Linux).
The 64-bit OpenBLAS with 64_ symbol suffix is obtained by compiling OpenBLAS with settings:

make INTERFACE64=1 SYMBOLSUFFIX=64_

The symbol suffix avoids the symbol name clashes between 32-bit and 64-bit BLAS/LAPACK libraries.

7.7 Supplying additional compiler flags

Additional compiler flags can be supplied by setting the OPT, FOPT (for Fortran), and CC environment variables. When
providing options that should improve the performance of the code ensure that you also set -DNDEBUG so that debugging
code is not executed.



CHAPTER

EIGHT

USING NUMPY C-API

8.1 How to extend NumPy

That which is static and repetitive is boring. That which is dynamic
and random is confusing. In between lies art.
— John A. Locke

Science is a differential equation. Religion is a boundary condition.
— Alan Turing

8.1.1 Writing an extension module

While the ndarray object is designed to allow rapid computation in Python, it is also designed to be general-purpose and
satisfy a wide variety of computational needs. As a result, if absolute speed is essential, there is no replacement for a
well-crafted, compiled loop specific to your application and hardware. This is one of the reasons that numpy includes
f2py, so that easy-to-use mechanisms for linking (simple) C/C++ and (arbitrary) Fortran code directly into Python are
available. You are encouraged to use and improve this mechanism. The purpose of this section is not to document this
tool but to document the more basic steps to writing an extension module that this tool depends on.
When an extension module is written, compiled, and installed to somewhere in the Python path (sys.path), the code can
then be imported into Python as if it were a standard python file. It will contain objects and methods that have been
defined and compiled in C code. The basic steps for doing this in Python are well-documented and you can find more
information in the documentation for Python itself available online at [Link].
In addition to the Python C-API, there is a full and rich C-API for NumPy allowing sophisticated manipulations on a
C-level. However, for most applications, only a few API calls will typically be used. For example, if you need to just
extract a pointer to memory along with some shape information to pass to another calculation routine, then you will use
very different calls than if you are trying to create a new array-like type or add a new data type for ndarrays. This chapter
documents the API calls and macros that are most commonly used.


8.1.2 Required subroutine

There is exactly one function that must be defined in your C-code in order for Python to use it as an extension module.
Under Python 3, which this release of NumPy requires, the function must be called PyInit_{name}, where {name} is the
name of the module from Python, and it must return the module object (the older Python 2 spelling was init{name},
built around Py_InitModule). This function must be declared so that it is visible to code outside of the routine.
Besides adding the methods and constants you desire, this subroutine must also contain calls like import_array()
and/or import_ufunc() depending on which C-API is needed. Forgetting to place these commands will show itself as an
ugly segmentation fault (crash) as soon as any C-API subroutine is actually called. It is actually possible to have
multiple PyInit_{name} functions in a single file, in which case multiple modules will be defined by that file. However,
there are some tricks to get that to work correctly and they are not covered here.
A minimal PyInit_{name} function looks like:

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT, "{name}", NULL, -1, mymethods
};
PyMODINIT_FUNC
PyInit_{name}(void)
{
    PyObject *module = PyModule_Create(&moduledef);
    if (module == NULL) return NULL;
    import_array();
    return module;
}

The mymethods must be an array (usually statically declared) of PyMethodDef structures which contain method names,
actual C-functions, a variable indicating whether the method uses keyword arguments or not, and docstrings. These
are explained in the next section. If you want to add constants to the module, then you store the module object
returned by the module-creation call (PyModule_Create under Python 3, Py_InitModule under Python 2). The most
general way to add items to the module is to get the module dictionary using PyModule_GetDict(module). With the
module dictionary, you can add whatever you like to the module manually. An easier way to add objects to the module
is to use one of three additional Python C-API calls that do not require a separate extraction of the module
dictionary. These are documented in the Python documentation, but repeated here for convenience:
int PyModule_AddObject(PyObject *module, char *name, PyObject *value)

int PyModule_AddIntConstant(PyObject *module, char *name, long value)

int PyModule_AddStringConstant(PyObject *module, char *name, char *value)


All three of these functions require the module object (the return value of the module-creation call,
PyModule_Create under Python 3 or Py_InitModule under Python 2). The name is a string
that labels the value in the module. Depending on which function is called, the value argument is either a general
object (PyModule_AddObject steals a reference to it), an integer constant, or a string constant.

8.1.3 Defining functions

The method table handed to the module-creation call (through the PyModuleDef structure under Python 3, or as the
second argument of Py_InitModule under Python 2) is a structure that makes it easy to define functions in the
module. In the example given above, the mymethods structure would have been defined earlier in the file (usually
right before the module initialization function) as:

static PyMethodDef mymethods[] = {
    {"nokeywordfunc", nokeyword_cfunc,
     METH_VARARGS,
     "Doc string"},
    {"keywordfunc", (PyCFunction)keyword_cfunc,
     METH_VARARGS | METH_KEYWORDS,
     "Doc string"},
    {NULL, NULL, 0, NULL} /* Sentinel */
};


Each entry in the mymethods array is a PyMethodDef structure containing 1) the Python name, 2) the C-function that
implements the function, 3) flags indicating whether or not keywords are accepted for this function, and 4) The docstring
for the function. Any number of functions may be defined for a single module by adding more entries to this table. The
last entry must be all NULL as shown to act as a sentinel. Python looks for this entry to know that all of the functions for
the module have been defined.
The last thing that must be done to finish the extension module is to actually write the code that performs the desired
functions. There are two kinds of functions: those that don’t accept keyword arguments, and those that do.

Functions without keyword arguments

Functions that don’t accept keyword arguments should be written as:

static PyObject*
nokeyword_cfunc (PyObject *dummy, PyObject *args)
{
    /* convert Python arguments */
    /* do function */
    /* return something */
}

The dummy argument is not used in this context and can be safely ignored. The args argument contains all of the
arguments passed in to the function as a tuple. You can do anything you want at this point, but usually the easiest
way to manage the input arguments is to call PyArg_ParseTuple (args, format_string, addresses_to_C_variables…)
or PyArg_UnpackTuple (tuple, “name”, min, max, …). A good description of how to use the first function is
contained in the Python C-API reference manual under section 5.5 (Parsing arguments and building values). You should
pay particular attention to the “O&” format which uses converter functions to go between the Python object and the C
object. All of the other format functions can be (mostly) thought of as special cases of this general rule. There are several
converter functions defined in the NumPy C-API that may be of use. In particular, the PyArray_DescrConverter
function is very useful to support arbitrary data-type specification. This function transforms any valid data-type Python
object into a PyArray_Descr* object. Remember to pass in the address of the C-variables that should be filled in.
There are lots of examples of how to use PyArg_ParseTuple throughout the NumPy source code. The standard
usage is like this:

PyObject *input;
PyArray_Descr *dtype;
if (!PyArg_ParseTuple(args, "OO&", &input,
                      PyArray_DescrConverter,
                      &dtype)) return NULL;

It is important to keep in mind that you get a borrowed reference to the object when using the “O” format string. However,
the converter functions usually require some form of memory handling. In this example, if the conversion is successful,
dtype will hold a new reference to a PyArray_Descr* object, while input will hold a borrowed reference. Therefore,
if this conversion were mixed with another conversion (say to an integer) and the data-type conversion was successful but
the integer conversion failed, then you would need to release the reference count to the data-type object before returning.
A typical way to do this is to set dtype to NULL before calling PyArg_ParseTuple and then use Py_XDECREF on
dtype before returning.
After the input arguments are processed, the code that actually does the work is written (likely calling other functions as
needed). The final step of the C-function is to return something. If an error is encountered then NULL should be returned
(making sure an error has actually been set). If nothing should be returned then increment the reference count of Py_None
and return it (the Py_RETURN_NONE macro does both). If a single object should be returned then it is returned (ensuring
that you own a reference to it first). If multiple objects should be returned then you need to return a tuple. The
Py_BuildValue (format_string, c_variables…) function makes it easy to build tuples of Python objects from C variables.
Pay special attention to the difference between ‘N’ and ‘O’ in the format string or you can easily create memory leaks.
The ‘O’ format string increments the reference count of the PyObject* C-variable it corresponds to, while the ‘N’ format
string steals a reference to the corresponding PyObject* C-variable. You should use ‘N’ if you have already created a
reference for the object and just want to give that reference to the tuple. You should use ‘O’ if you only have a
borrowed reference to an object and need to create one to provide for the tuple.

Functions with keyword arguments

These functions are very similar to functions without keyword arguments. The only difference is that the function signature
is:
static PyObject*
keyword_cfunc (PyObject *dummy, PyObject *args, PyObject *kwds)
{
    ...
}

The kwds argument holds a Python dictionary whose keys are the names of the keyword arguments and whose values
are the corresponding keyword-argument values. This dictionary can be processed however you see fit. The easiest way
to handle it, however, is to replace the PyArg_ParseTuple (args, format_string, addresses…) function with a call to
PyArg_ParseTupleAndKeywords (args, kwds, format_string, char *kwlist[], addresses…). The kwlist parameter
to this function is a NULL-terminated array of strings providing the expected keyword arguments. There should be one
string for each entry in the format_string. Using this function will raise a TypeError if invalid keyword arguments are
passed in.
For more help on this function please see section 1.8 (Keyword Parameters for Extension Functions) of the Extending
and Embedding tutorial in the Python documentation.

Reference counting

The biggest difficulty when writing extension modules is reference counting. It is an important reason for the popularity
of f2py, weave, Cython, ctypes, etc. If you mis-handle reference counts you can get problems from memory leaks to
segmentation faults. The only strategy I know of to handle reference counts correctly is blood, sweat, and tears. First,
you force it into your head that every Python variable has a reference count. Then, you understand exactly what each
function does to the reference count of your objects, so that you can properly use DECREF and INCREF when you need
them. Reference counting can really test the amount of patience and diligence you have towards your programming craft.
Despite the grim depiction, most cases of reference counting are quite straightforward, with the most common difficulty
being not using DECREF on objects before exiting early from a routine due to some error. In second place is the common
error of not owning the reference on an object that is passed to a function or macro that is going to steal the reference
(e.g. PyTuple_SET_ITEM, and most functions that take PyArray_Descr objects).
Typically you get a new reference to a variable when it is created or is the return value of some function (there are
some prominent exceptions, however, such as getting an item out of a tuple or a dictionary). When you own the
reference, you are responsible to make sure that Py_DECREF (var) is called when the variable is no longer necessary
(and no other function has “stolen” its reference). Also, if you are passing a Python object to a function that will “steal”
the reference, then you need to make sure you own it (or use Py_INCREF to get your own reference). You will also
encounter the notion of borrowing a reference. A function that borrows a reference does not alter the reference count of
the object and does not expect to “hold on” to the reference. It’s just going to use the object temporarily. When you use
PyArg_ParseTuple or PyArg_UnpackTuple you receive a borrowed reference to the objects in the tuple and
should not alter their reference count inside your function. With practice, you can learn to get reference counting right,
but it can be frustrating at first.
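Although the bookkeeping itself happens in C, the idea that every object carries a reference count can be observed from Python. The sketch below uses sys.getrefcount from the standard library; the absolute numbers it reports include a temporary reference held by the call itself and vary between interpreter versions, so only the differences are meaningful.

```python
import sys

obj = object()
before = sys.getrefcount(obj)

# Storing the object in a container creates one more owned reference,
# much as Py_INCREF would in C.
holder = [obj]
after = sys.getrefcount(obj)

# Dropping the container releases that reference again,
# the counterpart of Py_DECREF.
del holder
final = sys.getrefcount(obj)

print(after - before, final - before)
```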
One common source of reference-count errors is the Py_BuildValue function. Pay careful attention to the difference
between the ‘N’ format character and the ‘O’ format character. If you create a new object in your subroutine (such as
an output array), and you are passing it back in a tuple of return values, then you should most likely use the ‘N’ format
character in Py_BuildValue. The ‘O’ character will increase the reference count by one. This will leave the caller with
two reference counts for a brand-new array. When the variable is deleted and the reference count decremented by one,
there will still be that extra reference count, and the array will never be deallocated. You will have a reference-counting
induced memory leak. Using the ‘N’ character will avoid this situation as it will return to the caller an object (inside the
tuple) with a single reference count.

8.1.4 Dealing with array objects

Most extension modules for NumPy will need to access the memory for an ndarray object (or one of its sub-classes). The
easiest way to do this doesn’t require you to know much about the internals of NumPy. The method is to:
1. Ensure you are dealing with a well-behaved array (aligned, in machine byte-order and single-segment) of the correct
type and number of dimensions.
   1. By converting it from some Python object using PyArray_FromAny or a macro built on it.
   2. By constructing a new ndarray of your desired shape and type using PyArray_NewFromDescr or a simpler
   macro or function based on it.
2. Get the shape of the array and a pointer to its actual data.
3. Pass the data and shape information on to a subroutine or other section of code that actually performs the computation.
4. If you are writing the algorithm, then I recommend that you use the stride information contained in the array to
access the elements of the array (the PyArray_GETPTR{k} macros make this painless). Then, you can relax your
requirements so as not to force a single-segment array and the data-copying that might result.
Each of these sub-topics is covered in the following sub-sections.

Converting an arbitrary sequence object

The main routine for obtaining an array from any Python object that can be converted to an array is PyArray_FromAny.
This function is very flexible with many input arguments. Several macros make it easier to use the basic function.
PyArray_FROM_OTF is arguably the most useful of these macros for the most common uses. It allows you to convert
an arbitrary Python object to an array of a specific builtin data-type (e.g. float), while specifying a particular set of
requirements (e.g. contiguous, aligned, and writeable). The syntax is
PyObject *PyArray_FROM_OTF(PyObject *obj, int typenum, int requirements)
Return an ndarray from any Python object, obj, that can be converted to an array. The number of dimensions in
the returned array is determined by the object. The desired data-type of the returned array is provided in typenum
which should be one of the enumerated types. The requirements for the returned array can be any combination of
standard array flags. Each of these arguments is explained in more detail below. You receive a new reference to
the array on success. On failure, NULL is returned and an exception is set.
obj
The object can be any Python object convertible to an ndarray. If the object is already (a subclass of) the
ndarray that satisfies the requirements then a new reference is returned. Otherwise, a new array is con-
structed. The contents of obj are copied to the new array unless the array interface is used so that data does
not have to be copied. Objects that can be converted to an array include: 1) any nested sequence object, 2)
any object exposing the array interface, 3) any object with an __array__ method (which should return
an ndarray), and 4) any scalar object (becomes a zero-dimensional array). Sub-classes of the ndarray that
otherwise fit the requirements will be passed through. If you want to ensure a base-class ndarray, then use
NPY_ARRAY_ENSUREARRAY in the requirements flag. A copy is made only if necessary. If you want to
guarantee a copy, then pass in NPY_ARRAY_ENSURECOPY to the requirements flag.
typenum
One of the enumerated types or NPY_NOTYPE if the data-type should be determined from the object itself.
The C-based names can be used:
NPY_BOOL, NPY_BYTE, NPY_UBYTE, NPY_SHORT, NPY_USHORT, NPY_INT,
NPY_UINT, NPY_LONG, NPY_ULONG, NPY_LONGLONG, NPY_ULONGLONG, NPY_DOUBLE,
NPY_LONGDOUBLE, NPY_CFLOAT, NPY_CDOUBLE, NPY_CLONGDOUBLE, NPY_OBJECT.
Alternatively, the bit-width names can be used as supported on the platform. For example:
NPY_INT8, NPY_INT16, NPY_INT32, NPY_INT64, NPY_UINT8, NPY_UINT16,
NPY_UINT32, NPY_UINT64, NPY_FLOAT32, NPY_FLOAT64, NPY_COMPLEX64,
NPY_COMPLEX128.
The object will be converted to the desired type only if it can be done without losing precision. Otherwise
NULL will be returned and an error raised. Use NPY_ARRAY_FORCECAST in the requirements flag to
override this behavior.
requirements
The memory model for an ndarray admits arbitrary strides in each dimension to advance to the next element
of the array. Often, however, you need to interface with code that expects a C-contiguous or a Fortran-
contiguous memory layout. In addition, an ndarray can be misaligned (the address of an element is not at an
integral multiple of the size of the element) which can cause your program to crash (or at least work more
slowly) if you try and dereference a pointer into the array data. Both of these problems can be solved by
converting the Python object into an array that is more “well-behaved” for your specific usage.
The requirements flag allows specification of what kind of array is acceptable. If the object passed in does not
satisfy these requirements then a copy is made so that the returned object will satisfy them. Because an
ndarray can use a very generic pointer to memory, this flag lets you specify the desired properties of
the returned array object. All of the flags are explained in the detailed API chapter. The flags most commonly
needed are NPY_ARRAY_IN_ARRAY, NPY_ARRAY_OUT_ARRAY, and NPY_ARRAY_INOUT_ARRAY:
NPY_ARRAY_IN_ARRAY
This flag is useful for arrays that must be in C-contiguous order and aligned. These kinds of arrays are
usually input arrays for some algorithm.
NPY_ARRAY_OUT_ARRAY
This flag is useful to specify an array that is in C-contiguous order, is aligned, and can be written to as
well. Such an array is usually returned as output (although normally such output arrays are created from
scratch).
NPY_ARRAY_INOUT_ARRAY
This flag is useful to specify an array that will be used for both input and output.
PyArray_ResolveWritebackIfCopy must be called before Py_DECREF at the end of
the interface routine to write back the temporary data into the original array passed in. Use of the
NPY_ARRAY_WRITEBACKIFCOPY or NPY_ARRAY_UPDATEIFCOPY flags requires that the input
object is already an array (because other objects cannot be automatically updated in this fashion). If
an error occurs, use PyArray_DiscardWritebackIfCopy(obj) on an array with these flags set.
This will set the underlying base array writable without causing the contents to be copied back into the
original array.
Other useful flags that can be OR’d as additional requirements are:
NPY_ARRAY_FORCECAST
Cast to the desired type, even if it can’t be done without losing information.
NPY_ARRAY_ENSURECOPY
Make sure the resulting array is a copy of the original.
NPY_ARRAY_ENSUREARRAY
Make sure the resulting object is an actual ndarray and not a subclass.

Note: Whether or not an array is byte-swapped is determined by the data-type of the array. Native byte-order arrays
are always requested by PyArray_FROM_OTF and so there is no need for an NPY_ARRAY_NOTSWAPPED flag in the
requirements argument. There is also no way to get a byte-swapped array from this routine.
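The pass-through-or-copy behaviour described above can be tried out at the Python level with numpy.require, which is the closest Python-side counterpart. This is a rough sketch, not the C macro itself: the helper name from_otf_like is made up for illustration, and unlike PyArray_FROM_OTF without NPY_ARRAY_FORCECAST, numpy.require does not refuse casts that lose precision.

```python
import numpy as np


def from_otf_like(obj, dtype=np.float64, requirements=("C_CONTIGUOUS", "ALIGNED")):
    """Hypothetical Python-level analogue of PyArray_FROM_OTF."""
    return np.require(obj, dtype=dtype, requirements=list(requirements))


good = np.arange(6, dtype=np.float64)   # already satisfies the requirements
strided = good[::2]                     # right data-type, but not contiguous
nested = [[1, 2], [3, 4]]               # a nested sequence, not yet an array

same = from_otf_like(good)       # passed through, no data copied
copied = from_otf_like(strided)  # copied into a contiguous array
built = from_otf_like(nested)    # new 2-d array of doubles
scalar = from_otf_like(3)        # zero-dimensional array
```

Checking np.shares_memory(same, good) and np.shares_memory(copied, good) shows which calls had to copy.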

Creating a brand-new ndarray

Quite often, new arrays must be created from within extension-module code. Perhaps an output array is needed and you
don’t want the caller to have to supply it. Perhaps only a temporary array is needed to hold an intermediate calculation.
Whatever the need, there are simple ways to get an ndarray object of whatever data-type is needed. The most general
function for doing this is PyArray_NewFromDescr. All array creation functions go through this heavily re-used
code. Because of its flexibility, it can be somewhat confusing to use. As a result, simpler forms exist that are easier to
use. These forms are part of the PyArray_SimpleNew family of functions, which simplify the interface by providing
default values for common use cases.

Getting at ndarray memory and accessing elements of the ndarray

If obj is an ndarray (PyArrayObject*), then the data-area of the ndarray is pointed to by the void* pointer
PyArray_DATA(obj) or the char* pointer PyArray_BYTES(obj). Remember that (in general) this data-area may
not be aligned according to the data-type, it may represent byte-swapped data, and/or it may not be writeable. If the data
area is aligned and in native byte-order, then how to get at a specific element of the array is determined only by the array
of npy_intp variables, PyArray_STRIDES(obj). In particular, this C array of integers shows how many bytes must
be added to the current element pointer to get to the next element in each dimension. For arrays of up to 4 dimensions
there are PyArray_GETPTR{k}(obj, ...) macros, where {k} is the integer 1, 2, 3, or 4, that make using the array
strides easier. The arguments ... represent {k} non-negative integer indices into the array. For example, suppose E is a
3-dimensional ndarray. A (void*) pointer to the element E[i,j,k] is obtained as PyArray_GETPTR3(E, i, j, k).
As explained previously, C-style contiguous arrays and Fortran-style contiguous arrays have particular striding patterns.
Two array flags (NPY_ARRAY_C_CONTIGUOUS and NPY_ARRAY_F_CONTIGUOUS) indicate whether or not the
striding pattern of a particular array matches the C-style contiguous or Fortran-style contiguous or neither. Whether or
not the striding pattern matches a standard C or Fortran one can be tested using PyArray_IS_C_CONTIGUOUS(obj)
and PyArray_ISFORTRAN(obj) respectively. Most third-party libraries expect contiguous arrays. But, often it is not
difficult to support general-purpose striding. I encourage you to use the striding information in your own code whenever
possible, and reserve single-segment requirements for wrapping third-party code. Using the striding information provided
with the ndarray rather than requiring a contiguous striding reduces copying that otherwise must be made.
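The stride arithmetic can be checked from Python, where the same numbers are exposed as the strides attribute. The sketch below is illustrative only: element_offset is a made-up helper that mimics the byte-offset computation the GETPTR macros are described as performing, and the transposed array is chosen simply because it is neither C- nor Fortran-contiguous.

```python
import numpy as np

base = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
E = base.transpose(2, 0, 1)      # same memory, permuted strides; shape (4, 2, 3)


def element_offset(arr, *indices):
    """Byte offset of arr[indices] from the start of arr's data area."""
    return sum(i * s for i, s in zip(indices, arr.strides))


i, j, k = 3, 1, 2
offset = element_offset(E, i, j, k)

# E starts at the same address as base, so the offset can be checked against
# the flat, contiguous view of base.
flat = base.reshape(-1)
value = flat[offset // base.itemsize]

is_c = E.flags["C_CONTIGUOUS"]
is_f = E.flags["F_CONTIGUOUS"]
```

Because only the strides are used, the lookup works without first copying E into a contiguous array.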

8.1.5 Example

The following example shows how you might write a wrapper that accepts two input arguments (that will be converted to
an array) and an output argument (that must be an array). The function returns None and updates the output array. Note
the updated use of WRITEBACKIFCOPY semantics for NumPy v1.14 and above:
static PyObject *
example_wrapper(PyObject *dummy, PyObject *args)
{
    PyObject *arg1=NULL, *arg2=NULL, *out=NULL;
    PyObject *arr1=NULL, *arr2=NULL, *oarr=NULL;

    if (!PyArg_ParseTuple(args, "OOO!", &arg1, &arg2,
                          &PyArray_Type, &out)) return NULL;

    arr1 = PyArray_FROM_OTF(arg1, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY);
    if (arr1 == NULL) return NULL;
    arr2 = PyArray_FROM_OTF(arg2, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY);
    if (arr2 == NULL) goto fail;
#if NPY_API_VERSION >= 0x0000000c
    oarr = PyArray_FROM_OTF(out, NPY_DOUBLE, NPY_ARRAY_INOUT_ARRAY2);
#else
    oarr = PyArray_FROM_OTF(out, NPY_DOUBLE, NPY_ARRAY_INOUT_ARRAY);
#endif
    if (oarr == NULL) goto fail;

    /* code that makes use of arguments */
    /* You will probably need at least
       nd = PyArray_NDIM(<..>)    -- number of dimensions
       dims = PyArray_DIMS(<..>)  -- npy_intp array of length nd
                                     showing length in each dim.
       dptr = (double *)PyArray_DATA(<..>) -- pointer to data.

       If an error occurs goto fail.
     */

    Py_DECREF(arr1);
    Py_DECREF(arr2);
#if NPY_API_VERSION >= 0x0000000c
    /* the writeback functions take a PyArrayObject *, hence the cast */
    PyArray_ResolveWritebackIfCopy((PyArrayObject *)oarr);
#endif
    Py_DECREF(oarr);
    Py_INCREF(Py_None);
    return Py_None;

 fail:
    Py_XDECREF(arr1);
    Py_XDECREF(arr2);
#if NPY_API_VERSION >= 0x0000000c
    PyArray_DiscardWritebackIfCopy((PyArrayObject *)oarr);
#endif
    Py_XDECREF(oarr);
    return NULL;
}
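To see what the wrapper is meant to do without compiling anything, the same flow can be imitated in Python. This is a hypothetical rendering, not the C wrapper itself: the "computation" is assumed to be an element-wise sum, numpy.require stands in for PyArray_FROM_OTF, and the explicit assignment at the end plays the role of the write-back step.

```python
import numpy as np


def example_wrapper(arg1, arg2, out):
    """Hypothetical Python rendering of the C wrapper: out[...] = arg1 + arg2."""
    if not isinstance(out, np.ndarray):
        raise TypeError("out must be an ndarray")   # the "O!" check

    arr1 = np.require(arg1, dtype=np.float64, requirements=["C", "A"])
    arr2 = np.require(arg2, dtype=np.float64, requirements=["C", "A"])
    # In/out array: a well-behaved array is used directly, anything else gets
    # a temporary well-behaved copy that must be written back afterwards.
    oarr = np.require(out, dtype=np.float64, requirements=["C", "A", "W"])

    np.add(arr1, arr2, out=oarr)                    # the assumed computation

    if not np.shares_memory(oarr, out):
        out[...] = oarr                             # the write-back step
    return None


strided_out = np.zeros((2, 6))[:, ::2]              # not contiguous
example_wrapper([[1, 2, 3], [4, 5, 6]], np.ones((2, 3)), strided_out)
```

The caller's strided_out is updated even though the computation ran on a temporary contiguous copy.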

8.2 Using Python as glue

There is no conversation more boring than the one where everybody
agrees.
— Michel de Montaigne

Duct tape is like the force. It has a light side, and a dark side, and
it holds the universe together.
— Carl Zwanzig

Many people like to say that Python is a fantastic glue language. Hopefully, this chapter will convince you that this is
true. The first adopters of Python for science were typically people who used it to glue together large application codes
running on super-computers. Not only was it much nicer to code in Python than in a shell script or Perl, but the
ability to easily extend Python also made it relatively easy to create new classes and types specifically adapted to the problems
being solved. From the interactions of these early contributors, Numeric emerged as an array-like object that could be
used to pass data between these applications.
As Numeric has matured and developed into NumPy, people have been able to write more code directly in NumPy. Often
this code is fast enough for production use, but there are still times when compiled code is needed, either
to get that last bit of efficiency out of the algorithm or to make it easier to access widely-available codes written in C/C++
or Fortran.
This chapter will review many of the tools that are available for the purpose of accessing code written in other compiled
languages. There are many resources available for learning to call other compiled libraries from Python and the purpose
of this chapter is not to make you an expert. The main goal is to make you aware of some of the possibilities so that you
will know what to “Google” in order to learn more.

8.2.1 Calling other compiled libraries from Python

While Python is a great language and a pleasure to code in, its dynamic nature results in overhead that can cause some
code (e.g. raw computations inside of for loops) to be up to 10-100 times slower than equivalent code written in a static
compiled language. In addition, it can cause memory usage to be larger than necessary as temporary arrays are created
and destroyed during computation. For many types of computing needs, the extra slow-down and memory consumption
can often not be spared (at least for time- or memory-critical portions of your code). Therefore one of the most common
needs is to call out from Python code to a fast, machine-code routine (e.g. compiled using C/C++ or Fortran). The
fact that this is relatively easy to do is a big reason why Python is such an excellent high-level language for scientific and
engineering programming.
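A small timing experiment makes the overhead concrete. This is only a sketch: the sum-of-squares workload, the array size and the repeat count are arbitrary choices, and the measured ratio will differ from machine to machine.

```python
import timeit

import numpy as np

x = np.linspace(0.0, 1.0, 100_000)


def python_loop(values):
    """Sum of squares computed one element at a time in the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def compiled_loop(values):
    """The same reduction, executed inside NumPy's compiled code."""
    return float(np.dot(values, values))


slow = timeit.timeit(lambda: python_loop(x), number=3)
fast = timeit.timeit(lambda: compiled_loop(x), number=3)
print(f"interpreted: {slow:.4f} s, compiled: {fast:.4f} s")
```

Both functions return the same value up to floating-point rounding; only the place where the loop runs differs.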
There are two basic approaches to calling compiled code: writing an extension module that is then imported to Python
using the import command, or calling a shared-library subroutine directly from Python using the ctypes module. Writing
an extension module is the most common method.

Warning: Calling C-code from Python can result in Python crashes if you are not careful. None of the approaches
in this chapter are immune. You have to know something about the way data is handled by both NumPy and by the
third-party library being used.
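The kind of care this warning asks for can be sketched with ctypes.memmove, which is used here only as a stand-in for an arbitrary C routine that receives raw addresses and a byte count. The array names and sizes are illustrative assumptions.

```python
import ctypes

import numpy as np

src = np.arange(5, dtype=np.float64)
dst = np.zeros(5, dtype=np.float64)

# The C routine sees only addresses and a length, so every check has to
# happen on the Python side before the call is made.
assert src.flags["C_CONTIGUOUS"] and dst.flags["C_CONTIGUOUS"]
assert dst.flags["WRITEABLE"] and dst.nbytes >= src.nbytes

ctypes.memmove(dst.ctypes.data, src.ctypes.data, src.nbytes)
```

Passing a byte count larger than dst.nbytes would write past the end of the destination buffer, which is exactly the kind of mistake that crashes the interpreter.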

8.2.2 Hand-generated wrappers

Extension modules were discussed in Writing an extension module. The most basic way to interface with compiled code
is to write an extension module and construct a module method that calls the compiled code. For improved readability,
your method should take advantage of the PyArg_ParseTuple call to convert between Python objects and C data-
types. For standard C data-types there is probably already a built-in converter. For others you may need to write your
own converter and use the "O&" format string which allows you to specify a function that will be used to perform the
conversion from the Python object to whatever C-structures are needed.
Once the conversions to the appropriate C-structures and C data-types have been performed, the next step in the wrapper
is to call the underlying function. This is straightforward if the underlying function is in C or C++. However, in order
to call Fortran code you must be familiar with how Fortran subroutines are called from C/C++ using your compiler and
platform. This can vary somewhat across platforms and compilers (which is another reason f2py makes life much simpler for
interfacing Fortran code) but generally involves underscore mangling of the name and the fact that all variables are passed
by reference (i.e. all arguments are pointers).

The advantage of the hand-generated wrapper is that you have complete control over how the C-library gets used and
called, which can lead to a lean and tight interface with minimal overhead. The disadvantage is that you have to write,
debug, and maintain C-code, although most of it can be adapted using the time-honored technique of
“cutting-pasting-and-modifying” from other extension modules. Because the procedure of calling out to additional C-code is fairly regimented,
code-generation procedures have been developed to make this process easier. One of these code-generation techniques
is distributed with NumPy and allows easy integration with Fortran and (simple) C code. This package, f2py, will be
covered briefly in the next section.

8.2.3 f2py

F2py allows you to automatically construct an extension module that interfaces to routines in Fortran 77/90/95 code. It has
the ability to parse Fortran 77/90/95 code and automatically generate Python signatures for the subroutines it encounters,
or you can guide how the subroutine interfaces with Python by constructing an interface-definition-file (or modifying the
f2py-produced one).

Creating source for a basic extension module

Probably the easiest way to introduce f2py is to offer a simple example. Here is one of the subroutines contained in a file
named add.f

C
      SUBROUTINE ZADD(A,B,C,N)
C
      DOUBLE COMPLEX A(*)
      DOUBLE COMPLEX B(*)
      DOUBLE COMPLEX C(*)
      INTEGER N
      DO 20 J = 1, N
         C(J) = A(J)+B(J)
 20   CONTINUE
      END

This routine simply adds the elements in two contiguous arrays and places the result in a third. The memory for all three
arrays must be provided by the calling routine. A very basic interface to this routine can be automatically generated by
f2py:

f2py -m add add.f

You should be able to run this command assuming your search-path is set-up properly. This command will produce an
extension module named addmodule.c in the current directory. This extension module can now be compiled and used
from Python just like any other extension module.

Creating a compiled extension module

You can also get f2py to compile add.f together with the produced extension module, leaving only a shared-library
extension file that can be imported from Python:

f2py -c -m add add.f

This command leaves a file named add.{ext} in the current directory (where {ext} is the appropriate extension for a
Python extension module on your platform: so, pyd, etc.). This module may then be imported from Python. It will
contain a method for each subroutine in add (zadd, cadd, dadd, sadd). The docstring of each method contains information
about how the module method may be called:

>>> import add
>>> print(add.zadd.__doc__)
zadd(a,b,c,n)

Wrapper for ``zadd``.

Parameters
----------
a : input rank-1 array('D') with bounds (*)
b : input rank-1 array('D') with bounds (*)
c : input rank-1 array('D') with bounds (*)
n : input int

Improving the basic interface

The default interface is a very literal translation of the Fortran code into Python. The Fortran array arguments must now
be NumPy arrays and the integer argument should be an integer. The interface will attempt to convert all arguments
to their required types (and shapes) and issue an error if unsuccessful. However, because it knows nothing about the
semantics of the arguments (such as the fact that c is an output and n should really match the array sizes), it is possible to abuse this
function in ways that can cause Python to crash. For example:

>>> add.zadd([1, 2, 3], [1, 2], [3, 4], 1000)

will cause a program crash on most systems. Under the covers, the lists are being converted to proper arrays but then the
underlying add loop is told to cycle way beyond the borders of the allocated memory.
In order to improve the interface, directives should be provided. This is accomplished by constructing an interface defi-
nition file. It is usually best to start from the interface file that f2py can produce (where it gets its default behavior from).
To get f2py to generate the interface file use the -h option:

f2py -h add.pyf -m add add.f

This command leaves the file add.pyf in the current directory. The section of this file corresponding to zadd is:

subroutine zadd(a,b,c,n) ! in :add:add.f
   double complex dimension(*) :: a
   double complex dimension(*) :: b
   double complex dimension(*) :: c
   integer :: n
end subroutine zadd

By placing intent directives and checking code, the interface can be cleaned up quite a bit until the Python module method
is both easier to use and more robust.

subroutine zadd(a,b,c,n) ! in :add:add.f
   double complex dimension(n) :: a
   double complex dimension(n) :: b
   double complex intent(out),dimension(n) :: c
   integer intent(hide),depend(a) :: n=len(a)
end subroutine zadd

The intent directive, intent(out), is used to tell f2py that c is an output variable and should be created by the interface
before being passed to the underlying code. The intent(hide) directive tells f2py to not allow the user to specify the
variable, n, but instead to get it from the size of a. The depend(a) directive is necessary to tell f2py that the value of n
depends on the input a (so that it won’t try to create the variable n until the variable a is created).
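What these directives ask the generated wrapper to do can be spelled out in plain NumPy. The function below is an illustrative stand-in written for this purpose, not code produced by f2py; the exact checks and error messages of the real wrapper may differ.

```python
import numpy as np


def zadd(a, b):
    """Pure-NumPy stand-in for the interface described by the directives."""
    a = np.asarray(a, dtype=np.complex128)   # double complex input
    b = np.asarray(b, dtype=np.complex128)
    if a.ndim != 1 or b.ndim != 1:
        raise ValueError("a and b must be rank-1")
    n = a.shape[0]                           # intent(hide), depend(a): n = len(a)
    if b.shape[0] != n:
        raise ValueError("b must have the same length as a")
    c = np.empty(n, dtype=np.complex128)     # intent(out): created by the wrapper
    np.add(a, b, out=c)
    return c


result = zadd([1, 2, 3], [4, 5, 6])
```

Because n is derived from a and c is allocated internally, the caller can no longer pass a length that overruns the arrays.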

After modifying add.pyf, the new Python module file can be generated by compiling both add.f and add.pyf:

f2py -c add.pyf add.f

The new interface has docstring:

>>> import add
>>> print(add.zadd.__doc__)
c = zadd(a,b)

Wrapper for ``zadd``.

Parameters
----------
a : input rank-1 array('D') with bounds (n)
b : input rank-1 array('D') with bounds (n)

Returns
-------
c : rank-1 array('D') with bounds (n)

Now, the function can be called in a much more robust way:

>>> add.zadd([1, 2, 3], [4, 5, 6])
array([5.+0.j, 7.+0.j, 9.+0.j])

Notice the automatic conversion to the correct format that occurred.

Inserting directives in Fortran source

The nice interface can also be generated automatically by placing the variable directives as special comments in the original
Fortran code. Thus, if the source code is modified to contain:

C
      SUBROUTINE ZADD(A,B,C,N)
C
CF2PY INTENT(OUT) :: C
CF2PY INTENT(HIDE) :: N
CF2PY DOUBLE COMPLEX :: A(N)
CF2PY DOUBLE COMPLEX :: B(N)
CF2PY DOUBLE COMPLEX :: C(N)
      DOUBLE COMPLEX A(*)
      DOUBLE COMPLEX B(*)
      DOUBLE COMPLEX C(*)
      INTEGER N
      DO 20 J = 1, N
         C(J) = A(J) + B(J)
 20   CONTINUE
      END

Then, one can compile the extension module using:

f2py -c -m add add.f

The resulting signature for the function add.zadd is exactly the same one that was created previously. If the original source
code had contained A(N) instead of A(*) and so forth with B and C, then nearly the same interface can be obtained by
placing the INTENT(OUT) :: C comment line in the source code. The only difference is that N would be an optional
input that would default to the length of A.

A filtering example

For comparison with the other methods to be discussed, here is another example of a function that filters a two-dimensional
array of double precision floating-point numbers using a fixed averaging filter. The advantage of using
Fortran to index into multi-dimensional arrays should be clear from this example.

      SUBROUTINE DFILTER2D(A,B,M,N)
C
      DOUBLE PRECISION A(M,N)
      DOUBLE PRECISION B(M,N)
      INTEGER N, M
CF2PY INTENT(OUT) :: B
CF2PY INTENT(HIDE) :: N
CF2PY INTENT(HIDE) :: M
      DO 20 I = 2,M-1
         DO 40 J=2,N-1
            B(I,J) = A(I,J) +
     $           (A(I-1,J)+A(I+1,J) +
     $            A(I,J-1)+A(I,J+1) )*0.5D0 +
     $           (A(I-1,J-1) + A(I-1,J+1) +
     $            A(I+1,J-1) + A(I+1,J+1))*0.25D0
 40      CONTINUE
 20   CONTINUE
      END

This code can be compiled and linked into an extension module named filter using:

f2py -c -m filter filter.f

This will produce an extension module named filter.so in the current directory with a method named dfilter2d that returns
a filtered version of the input.
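When testing a compiled filter like this one, it helps to have a reference result to compare against. The following NumPy version applies the same stencil with slicing; it is a sketch for checking purposes only, the name dfilter2d_reference is made up, and it sets the border to zero whereas the Fortran routine leaves the border of B unassigned.

```python
import numpy as np


def dfilter2d_reference(a):
    """NumPy-slicing version of the DFILTER2D stencil (interior points only)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.zeros_like(a)   # assumption: border values are set to 0
    b[1:-1, 1:-1] = (
        a[1:-1, 1:-1]
        + 0.5 * (a[:-2, 1:-1] + a[2:, 1:-1] + a[1:-1, :-2] + a[1:-1, 2:])
        + 0.25 * (a[:-2, :-2] + a[:-2, 2:] + a[2:, :-2] + a[2:, 2:])
    )
    return b


filtered = dfilter2d_reference(np.ones((3, 3)))
```

For an input of all ones, each interior point becomes 1 + 4*0.5 + 4*0.25 = 4, which gives a quick sanity check.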

Calling f2py from Python

The f2py program is written in Python and can be run from inside your code to compile Fortran code at runtime, as
follows:

from numpy import f2py
with open("add.f") as sourcefile:
    sourcecode = sourcefile.read()
f2py.compile(sourcecode, modulename='add')
import add

The source string can be any valid Fortran code. If you want to save the extension-module source code then a suitable
file-name can be provided by the source_fn keyword to the compile function.

Automatic extension module generation

If you want to distribute your f2py extension module, then you only need to include the .pyf file and the Fortran code.
The distutils extensions in NumPy allow you to define an extension module entirely in terms of this interface file. A valid
setup.py file allowing distribution of the add.f module (as part of the package f2py_examples so that it would
be loaded as f2py_examples.add) is:

def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('f2py_examples',parent_package, top_path)
    config.add_extension('add', sources=['add.pyf','add.f'])
    return config

if __name__ == '__main__':
    from numpy.distutils.core import setup
    setup(**configuration(top_path='').todict())

Installation of the new package is easy using:

pip install .

assuming you have the proper permissions to write to the main site-packages directory for the version of Python you
are using. For the resulting package to work, you need to create a file named __init__.py (in the same directory as
setup.py). Notice the extension module is defined entirely in terms of the add.pyf and add.f files. The conversion
of the .pyf file to a .c file is handled by numpy.distutils.

Conclusion

The interface definition file (.pyf) is how you can fine-tune the interface between Python and Fortran. There is decent
documentation for f2py at F2PY user guide and reference manual. There is also more information on using f2py (including
how to use it to wrap C codes) at the “Interfacing With Other Languages” heading of the SciPy Cookbook.
The f2py method of linking compiled code is currently the most sophisticated and integrated approach. It allows clean
separation of Python with compiled code while still allowing for separate distribution of the extension module. The only
draw-back is that it requires the existence of a Fortran compiler in order for a user to install the code. However, with the
existence of the free-compilers g77, gfortran, and g95, as well as high-quality commercial compilers, this restriction is not
particularly onerous. In our opinion, Fortran is still the easiest way to write fast and clear code for scientific computing.
It handles complex numbers and multi-dimensional indexing in the most straightforward way. Be aware, however, that
some Fortran compilers will not be able to optimize code as well as good hand-written C-code.

8.2.4 Cython

Cython is a compiler for a Python dialect that adds (optional) static typing for speed, and allows mixing C or C++ code
into your modules. It produces C or C++ extensions that can be compiled and imported in Python code.
If you are writing an extension module that will include quite a bit of your own algorithmic code as well, then Cython is
a good match. Among its features is the ability to easily and quickly work with multidimensional arrays.
Notice that Cython is an extension-module generator only. Unlike f2py, it includes no automatic facility for compiling and
linking the extension module (which must be done in the usual fashion). It does provide a modified distutils class called
build_ext which lets you build an extension module from a .pyx source. Thus, you could write in a setup.py file:

from Cython.Distutils import build_ext
from distutils.extension import Extension
from distutils.core import setup
import numpy

setup(name='mine', description='Nothing',
      ext_modules=[Extension('filter', ['filter.pyx'],
                             include_dirs=[numpy.get_include()])],
      cmdclass = {'build_ext':build_ext})

Adding the NumPy include directory is, of course, only necessary if you are using NumPy arrays in the extension module
(which is what we assume you are using Cython for). The distutils extensions in NumPy also include support for automat-
ically producing the extension-module and linking it from a .pyx file. It works so that if the user does not have Cython
installed, then it looks for a file with the same file-name but a .c extension which it then uses instead of trying to produce
the .c file again.
If you just use Cython to compile a standard Python module, then you will get a C extension module that typically runs
a bit faster than the equivalent Python module. Further speed increases can be gained by using the cdef keyword to
statically define C variables.
Let’s look at two examples we’ve seen before to see how they might be implemented using Cython. These examples were
compiled into extension modules using Cython 0.21.1.

Complex addition in Cython

Here is part of a Cython module named add.pyx which implements the complex addition functions we previously
implemented using f2py:

cimport cython
cimport numpy as np
import numpy as np

# We need to initialize NumPy.
np.import_array()

#@cython.boundscheck(False)
def zadd(in1, in2):
    cdef double complex[:] a = in1.ravel()
    cdef double complex[:] b = in2.ravel()

    # complex128 matches the "double complex" element type of the view c
    out = np.empty(a.shape[0], np.complex128)
    cdef double complex[:] c = out.ravel()

    for i in range(c.shape[0]):
        c[i].real = a[i].real + b[i].real
        c[i].imag = a[i].imag + b[i].imag

    return out

This module shows use of the cimport statement to load the definitions from the numpy.pxd header that ships with
Cython. It looks like NumPy is imported twice; cimport only makes the NumPy C-API available, while the regular
import causes a Python-style import at runtime and makes it possible to call into the familiar NumPy Python API.
The example also demonstrates Cython’s “typed memoryviews”, which are like NumPy arrays at the C level, in the sense
that they are shaped and strided arrays that know their own extent (unlike a C array addressed through a bare pointer).
The syntax double complex[:] denotes a one-dimensional array (vector) of double-precision complex numbers, with arbitrary strides. A
contiguous array of ints would be int[::1], while a matrix of floats would be float[:, :].
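The shape, stride and contiguity information that a view carries can be inspected from plain Python, because NumPy arrays export the buffer protocol. Note the assumption: the sketch uses Python's built-in memoryview, which is a different object from a Cython typed memoryview, purely to show what "shaped and strided" means.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)

full = memoryview(a)            # shaped, strided view of the whole array
cols = memoryview(a[:, ::2])    # every other column: same memory, larger stride

full_info = (full.shape, full.strides, full.c_contiguous)
cols_info = (cols.shape, cols.strides, cols.c_contiguous)
```

Only the first view is contiguous; the second one has a step of two elements between neighbouring columns, so it would not match a declaration that demands contiguous rows.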

Shown commented is the cython.boundscheck decorator, which turns bounds-checking for memory view accesses
on or off on a per-function basis. We can use this to further speed up our code, at the expense of safety (or a manual
check prior to entering the loop).
Other than the view syntax, the function is immediately readable to a Python programmer. Static typing of the variable i
is implicit. Instead of the view syntax, we could also have used Cython’s special NumPy array syntax, but the view syntax
is preferred.

Image filter in Cython

The two-dimensional example we created using Fortran is just as easy to write in Cython:

cimport numpy as np
import numpy as np

np.import_array()

def filter(img):
    cdef double[:, :] a = np.asarray(img, dtype=np.double)
    out = np.zeros(img.shape, dtype=np.double)
    cdef double[:, ::1] b = out

    cdef np.npy_intp i, j

    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            b[i, j] = (a[i, j]
                       + .5 * (  a[i-1, j] + a[i+1, j]
                               + a[i, j-1] + a[i, j+1])
                       + .25 * (  a[i-1, j-1] + a[i-1, j+1]
                                + a[i+1, j-1] + a[i+1, j+1]))

    return out

This 2-d averaging filter runs quickly because the loop is in C and the pointer computations are done only as needed. If
the code above is compiled as a module image, then a 2-d image, img, can be filtered very quickly using:

import image
out = image.filter(img)

Regarding the code, two things are of note: firstly, it is impossible to return a memory view to Python. Instead, a NumPy
array out is first created, and then a view b onto this array is used for the computation. Secondly, the view b is typed
double[:, ::1]. This means 2-d array with contiguous rows, i.e., C matrix order. Specifying the order explicitly
can speed up some algorithms since they can skip stride computations.
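When testing a compiled filter it helps to have an independent result to compare against. The sketch below is a pure-NumPy version of the same weighted 3x3 stencil, written with array slices instead of loops; the name filter_reference is hypothetical and the function is only meant as a cross-check, not as a replacement for the Cython code:

```python
import numpy as np

def filter_reference(img):
    """Pure-NumPy version of the 3x3 weighted stencil; border pixels stay 0."""
    a = np.asarray(img, dtype=np.double)
    out = np.zeros(a.shape, dtype=np.double)
    out[1:-1, 1:-1] = (
        a[1:-1, 1:-1]
        # the four edge neighbours, weighted by 0.5
        + 0.5 * (a[:-2, 1:-1] + a[2:, 1:-1] + a[1:-1, :-2] + a[1:-1, 2:])
        # the four corner neighbours, weighted by 0.25
        + 0.25 * (a[:-2, :-2] + a[:-2, 2:] + a[2:, :-2] + a[2:, 2:])
    )
    return out

result = filter_reference(np.ones((4, 5)))
print(result)
```

For an image of ones every interior pixel evaluates to 1 + 0.5*4 + 0.25*4, which makes the weights easy to verify by hand.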

Conclusion

Cython is the extension mechanism of choice for several scientific Python libraries, including Scipy, Pandas, SAGE, scikit-
image and scikit-learn, as well as the XML processing library LXML. The language and compiler are well-maintained.
There are several disadvantages of using Cython:
1. When coding custom algorithms, and sometimes when wrapping existing C libraries, some familiarity with C is
required. In particular, when using C memory management (malloc and friends), it’s easy to introduce memory
leaks. However, just compiling a Python module renamed to .pyx can already speed it up, and adding a few type
declarations can give dramatic speedups in some code.


2. It is easy to lose a clean separation between Python and C which makes re-using your C-code for other non-Python-
related projects more difficult.
3. The C-code generated by Cython is hard to read and modify (and typically compiles with annoying but harmless
warnings).
One big advantage of Cython-generated extension modules is that they are easy to distribute. In summary, Cython is a
very capable tool for either gluing C code or generating an extension module quickly and should not be overlooked. It is
especially useful for people who can’t or won’t write C or Fortran code.

8.2.5 ctypes

Ctypes is a Python extension module, included in the stdlib, that allows you to call an arbitrary function in a shared
library directly from Python. This approach allows you to interface with C-code directly from Python. This opens up
an enormous number of libraries for use from Python. The drawback, however, is that coding mistakes can lead to
ugly program crashes very easily (just as can happen in C) because there is little type or bounds checking done on the
parameters. This is especially true when array data is passed in as a pointer to a raw memory location. It is then your
responsibility to ensure that the subroutine does not access memory outside the actual array area. But if you don’t mind
living a little dangerously, ctypes can be an effective tool for quickly taking advantage of a large shared library (or writing
extended functionality in your own shared library).
Because the ctypes approach exposes a raw interface to the compiled code it is not always tolerant of user mistakes.
Robust use of the ctypes module typically involves an additional layer of Python code in order to check the data types
and array bounds of objects passed to the underlying subroutine. This additional layer of checking (not to mention the
conversion from ctypes objects to C-data-types that ctypes itself performs), will make the interface slower than a hand-
written extension-module interface. However, this overhead should be negligible if the C-routine being called is doing
any significant amount of work. If you are a great Python programmer with weak C skills, ctypes is an easy way to write
a useful interface to a (shared) library of compiled code.
To use ctypes you must
1. Have a shared library.
2. Load the shared library.
3. Convert the Python objects to ctypes-understood arguments.
4. Call the function from the library with the ctypes arguments.
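The four steps can be sketched with a library that is already present on most systems. This example assumes a POSIX-like platform and uses the C runtime's labs function as a stand-in for "your" shared library; the fallback to ctypes.CDLL(None), which exposes the symbols of the running process, is an assumption that does not hold on Windows:

```python
import ctypes
import ctypes.util

# Steps 1 and 2: locate and load a shared library.
_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(_path) if _path else ctypes.CDLL(None)

# Step 3: describe the C signature `long labs(long)` so that ctypes
# converts Python ints to C longs and back.
labs = libc.labs
labs.argtypes = [ctypes.c_long]
labs.restype = ctypes.c_long

# Step 4: call the function with ordinary Python objects.
result = labs(-42)
print(result)
```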

Having a shared library

There are several requirements for a shared library that can be used with ctypes that are platform specific. This guide
assumes you have some familiarity with making a shared library on your system (or simply have a shared library available
to you). Items to remember are:
• A shared library must be compiled in a special way (e.g. using the -shared flag with gcc).
• On some platforms (e.g. Windows), a shared library requires a .def file that specifies the functions to be exported.
For example a mylib.def file might contain:

LIBRARY mylib.dll
EXPORTS
cool_function1
cool_function2

Alternatively, you may be able to use the storage-class specifier __declspec(dllexport) in the C-definition
of the function to avoid the need for this .def file.


There is no standard way in Python distutils to create a standard shared library (an extension module is a “special” shared
library Python understands) in a cross-platform manner. Thus, a big disadvantage of ctypes at the time of writing this
book is that it is difficult to distribute in a cross-platform manner a Python extension that uses ctypes and includes your
own code which should be compiled as a shared library on the user’s system.

Loading the shared library

A simple, but robust way to load the shared library is to get the absolute path name and load it using the cdll object of
ctypes:

lib = ctypes.cdll[<full_path_name>]

However, on Windows accessing an attribute of the cdll object will load the first DLL by that name found in the
current directory or on the PATH. Loading the absolute path name requires a little finesse for cross-platform work since
the extension of shared libraries varies. There is a ctypes.util.find_library utility available that can simplify
the process of finding the library to load, but it is not foolproof. Complicating matters, different platforms have different
default extensions used by shared libraries (e.g. .dll – Windows, .so – Linux, .dylib – Mac OS X). This must also be taken
into account if you are using ctypes to wrap code that needs to work on several platforms.
NumPy provides a convenience function called ctypeslib.load_library(name, path). This function takes the
name of the shared library (including any prefix like ‘lib’ but excluding the extension) and a path where the shared
library can be located. It returns a ctypes library object, raises an OSError if the library cannot be found, or raises
an ImportError if the ctypes module is not available. (Windows users: the ctypes library object loaded using
load_library is always loaded assuming cdecl calling convention. See the ctypes documentation under
ctypes.windll and/or ctypes.oledll for ways to load libraries under other calling conventions.)
The functions in the shared library are available as attributes of the ctypes library object (returned from
ctypeslib.load_library) or as items using lib['func_name'] syntax. The latter method for retrieving a function
is particularly useful if the function name contains characters that are not allowable in Python variable names.
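Because a missing library is reported as an OSError, a wrapper module may want to catch it and give a clearer message. A minimal sketch, in which 'no_such_library' is a made-up name used only to trigger the failure path:

```python
import numpy as np

try:
    # Look for no_such_library.{so,dll,dylib,...} in the current directory.
    lib = np.ctypeslib.load_library("no_such_library", ".")
except OSError as exc:
    lib = None
    message = str(exc)
    print("could not load the library:", message)
```

In a real interface module the path argument would typically point at the directory of the module itself rather than the current directory.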

Converting arguments

Python integers, bytes objects, and (unicode) strings are automatically converted as needed to equivalent ctypes arguments.
The None object is also converted automatically to a NULL pointer. All other Python objects must be converted to
ctypes-specific types. There are two ways around this restriction that allow ctypes to integrate with other objects.
1. Don’t set the argtypes attribute of the function object and define an _as_parameter_ attribute for the object
you want to pass in. The _as_parameter_ attribute must provide a Python int which will be passed directly to
the function.
2. Set the argtypes attribute to a list whose entries contain objects with a classmethod named from_param that knows
how to convert your object to an object that ctypes can understand (an int, bytes object, string, or object with the
_as_parameter_ attribute).
NumPy uses both methods with a preference for the second method because it can be safer. The ctypes attribute of the
ndarray returns an object that has an _as_parameter_ attribute which returns an integer representing the address of
the ndarray to which it is associated. As a result, one can pass this ctypes attribute object directly to a function expecting
a pointer to the data in your ndarray. The caller must be sure that the ndarray object is of the correct type, shape, and has
the correct flags set; otherwise, passing the data pointer of an inappropriate array risks nasty crashes.
To implement the second method, NumPy provides the class-factory function ndpointer in the numpy.ctypeslib
module. This class-factory function produces an appropriate class that can be placed in an argtypes attribute entry of a
ctypes function. The class will contain a from_param method which ctypes will use to convert any ndarray passed in to
the function to a ctypes-recognized object. In the process, the conversion will perform checking on any properties of the
ndarray that were specified by the user in the call to ndpointer. Aspects of the ndarray that can be checked include
the data-type, the number-of-dimensions, the shape, and/or the state of the flags on any array passed. The return value of
the from_param method is the ctypes attribute of the array which (because it contains the _as_parameter_ attribute
pointing to the array data area) can be used by ctypes directly.
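The checking performed by an ndpointer class can be observed without any compiled code by calling from_param by hand, which is what ctypes does for each argument. In this sketch the class name DoubleVector and the test arrays are illustrative:

```python
import numpy as np

# A class describing "1-d, C-contiguous array of float64".
DoubleVector = np.ctypeslib.ndpointer(dtype=np.float64, ndim=1,
                                      flags="C_CONTIGUOUS")

good = np.arange(4, dtype=np.float64)
wrong_dtype = np.arange(4, dtype=np.int32)
wrong_layout = np.arange(8, dtype=np.float64)[::2]

# A conforming array is converted to its ctypes attribute object.
converted = DoubleVector.from_param(good)

# Non-conforming arrays are rejected with a TypeError before any C code runs.
rejected = []
for candidate in (wrong_dtype, wrong_layout):
    try:
        DoubleVector.from_param(candidate)
    except TypeError as exc:
        rejected.append(str(exc))

print(rejected)
```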
The ctypes attribute of an ndarray is also endowed with additional attributes that may be convenient when passing
additional information about the array into a ctypes function. The attributes data, shape, and strides can provide ctypes
compatible types corresponding to the data-area, the shape, and the strides of the array. The data attribute returns a
Python integer holding the address of the data area (use data_as(ctypes.c_void_p) to obtain a c_void_p). The shape and
strides attributes each return an array of ctypes integers (or None representing a NULL pointer, if a 0-d array). The base
ctype of the array is a ctype integer of the same size as a pointer on the platform. There are also methods
data_as(<ctype>), shape_as(<base ctype>), and strides_as(<base ctype>). These return the data as a ctype object of
your choice and the shape/strides arrays using an underlying base type of your choice. For convenience, the ctypeslib
module also contains c_intp as a ctypes integer data-type whose size is the same as the size of c_void_p on the platform
(its value is None if ctypes is not installed).
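These attributes can be explored interactively. The following sketch uses a small C-contiguous float64 array; the variable names are illustrative:

```python
import ctypes
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

address = a.ctypes.data               # integer address of the first element
shape = list(a.ctypes.shape)          # (c_intp * ndim) array -> Python list
strides = list(a.ctypes.strides)      # strides in bytes

# data_as returns the data pointer cast to the requested ctypes pointer type.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
fifth = ptr[4]                        # a is C-contiguous, so this is a[1, 1]

# shape_as uses a base type of your choice instead of c_intp.
shape_as_int = list(a.ctypes.shape_as(ctypes.c_int))

print(shape, strides, fifth, shape_as_int)
```

Indexing through ptr bypasses all bounds checking, so it is only safe while a is alive and the index stays inside the buffer.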

Calling the function

The function is accessed as an attribute of or an item from the loaded shared-library. Thus, if ./mylib.so has a
function named cool_function1, it may be accessed either as:

lib = np.ctypeslib.load_library('mylib','.')
func1 = lib.cool_function1  # or equivalently
func1 = lib['cool_function1']

In ctypes, the return-value of a function is set to be ‘int’ by default. This behavior can be changed by setting the restype
attribute of the function. Use None for the restype if the function has no return value (‘void’):

func1.restype = None
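The effect of restype is easiest to see with a function that returns a double. This sketch assumes a POSIX-like system where the C math library can be located (with the symbols of the running process as a fallback); cos is a standard C function, the variable names are illustrative:

```python
import ctypes
import ctypes.util

# Assumption: find_library can locate the math library on this platform.
_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(_path) if _path else ctypes.CDLL(None)

cos = libm.cos
cos.argtypes = [ctypes.c_double]   # double cos(double)
cos.restype = ctypes.c_double      # without this, the result is read as an int

value = cos(0.0)
print(value)
```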

As previously discussed, you can also set the argtypes attribute of the function in order to have ctypes check the types of
the input arguments when the function is called. Use the ndpointer factory function to generate a ready-made class
for data-type, shape, and flags checking on your new function. The ndpointer function has the signature
ndpointer(dtype=None, ndim=None, shape=None, flags=None)
Keyword arguments with the value None are not checked. Specifying a keyword enforces checking of that aspect
of the ndarray on conversion to a ctypes-compatible object. The dtype keyword can be any object understood as a
data-type object. The ndim keyword should be an integer, and the shape keyword should be an integer or a sequence
of integers. The flags keyword specifies the minimal flags that are required on any array passed in. This can be
specified as a string of comma separated requirements, an integer indicating the requirement bits OR’d together,
or a flags object returned from the flags attribute of an array with the necessary requirements.
Using an ndpointer class in the argtypes attribute can make it significantly safer to call a C function using ctypes and the
data-area of an ndarray. You may still want to wrap the function in an additional Python wrapper to make it user-friendly
(hiding some obvious arguments and making some arguments output arguments). In this process, the require function
in NumPy may be useful to return the right kind of array from a given input.
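As an illustration of that last step, numpy.require converts an arbitrary input into an array with a given dtype and set of flags, copying only when necessary. A minimal sketch with illustrative variable names:

```python
import numpy as np

# An integer array viewed with a step: neither float64 nor C-contiguous.
raw = np.arange(12).reshape(3, 4)[:, ::2]

ready = np.require(raw, dtype=np.float64,
                   requirements=["C_CONTIGUOUS", "ALIGNED", "WRITEABLE"])

# An array that already satisfies the request is not copied again.
again = np.require(ready, dtype=np.float64, requirements=["C_CONTIGUOUS"])

print(raw.flags["C_CONTIGUOUS"], ready.flags["C_CONTIGUOUS"])
```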


Complete example

In this example, we will demonstrate how the addition function and the filter function implemented previously using the
other approaches can be implemented using ctypes. First, the C code which implements the algorithms contains the
functions zadd, dadd, sadd, cadd, and dfilter2d. The zadd function is:
/* Add arrays of contiguous data */
typedef struct {double real; double imag;} cdouble;
typedef struct {float real; float imag;} cfloat;
void zadd(cdouble *a, cdouble *b, cdouble *c, long n)
{
    while (n--) {
        c->real = a->real + b->real;
        c->imag = a->imag + b->imag;
        a++; b++; c++;
    }
}

with similar code for cadd, dadd, and sadd that handles complex float, double, and float data-types, respectively:
void cadd(cfloat *a, cfloat *b, cfloat *c, long n)
{
    while (n--) {
        c->real = a->real + b->real;
        c->imag = a->imag + b->imag;
        a++; b++; c++;
    }
}
void dadd(double *a, double *b, double *c, long n)
{
    while (n--) {
        *c++ = *a++ + *b++;
    }
}
void sadd(float *a, float *b, float *c, long n)
{
    while (n--) {
        *c++ = *a++ + *b++;
    }
}

The code.c file also contains the function dfilter2d:


/*
 * Assumes b is contiguous and a has strides that are multiples of
 * sizeof(double)
 */
void
dfilter2d(double *a, double *b, ssize_t *astrides, ssize_t *dims)
{
    ssize_t i, j, M, N, S0, S1;
    ssize_t r, c, rm1, rp1, cp1, cm1;

    M = dims[0]; N = dims[1];
    /* Convert the byte strides of a into element strides. */
    S0 = astrides[0]/sizeof(double);
    S1 = astrides[1]/sizeof(double);
    for (i = 1; i < M - 1; i++) {
        r = i*S0;
        rp1 = r + S0;
        rm1 = r - S0;
        for (j = 1; j < N - 1; j++) {
            c = j*S1;
            /* Offsets of the right and left neighbours of column j. */
            cp1 = c + S1;
            cm1 = c - S1;
            b[i*N + j] = a[r + c] +
                (a[rp1 + c] + a[rm1 + c] +
                 a[r + cp1] + a[r + cm1])*0.5 +
                (a[rp1 + cp1] + a[rp1 + cm1] +
                 a[rm1 + cp1] + a[rm1 + cm1])*0.25;
        }
    }
}
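The offset arithmetic that dfilter2d relies on (byte strides divided by the item size, then r + c as a flat offset) can be checked from Python on a non-contiguous array. This is only a model of the indexing, not of the filter itself; the helper name element is hypothetical:

```python
import numpy as np

base = np.arange(12, dtype=np.float64).reshape(3, 4)
a = base.T                      # shape (4, 3), not C-contiguous
buf = base.ravel()              # the underlying data, element by element

# Byte strides converted to element strides, as the C code does.
S0 = a.strides[0] // a.itemsize
S1 = a.strides[1] // a.itemsize

def element(i, j):
    """Read a[i, j] through a flat offset into the shared buffer."""
    r = i * S0
    c = j * S1
    return buf[r + c]

matches = all(element(i, j) == a[i, j]
              for i in range(a.shape[0])
              for j in range(a.shape[1]))
print(S0, S1, matches)
```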

A possible advantage this code has over the Fortran-equivalent code is that it takes arbitrarily strided (i.e. non-contiguous)
arrays and may also run faster depending on the optimization capability of your compiler. But it is obviously more
complicated than the simple code in filter.f. This code must be compiled into a shared library. On my Linux system
this is accomplished using:

gcc -o code.so -shared code.c

which creates a shared library named code.so in the current directory. On Windows don’t forget to either add
__declspec(dllexport) in front of void on the line preceding each function definition, or write a code.def
file that lists the names of the functions to be exported.
A suitable Python interface to this shared library should be constructed. To do this create a file named interface.py with
the following lines at the top:

__all__ = ['add', 'filter2d']

import numpy as np
import os

# Directory containing this file; note that __file__ must not be quoted.
_path = os.path.dirname(os.path.abspath(__file__))
lib = np.ctypeslib.load_library('code', _path)
_typedict = {'zadd' : complex, 'sadd' : np.single,
             'cadd' : np.csingle, 'dadd' : float}
for name in _typedict.keys():
    val = getattr(lib, name)
    val.restype = None
    _type = _typedict[name]
    val.argtypes = [np.ctypeslib.ndpointer(_type,
                      flags='aligned, contiguous'),
                    np.ctypeslib.ndpointer(_type,
                      flags='aligned, contiguous'),
                    np.ctypeslib.ndpointer(_type,
                      flags='aligned, contiguous,'\
                            'writeable'),
                    np.ctypeslib.c_intp]

This code loads the shared library named code.{ext} located in the same path as this file. It then adds a return type
of void to the functions contained in the library. It also adds argument checking to the functions in the library so that
ndarrays can be passed as the first three arguments along with an integer (large enough to hold a pointer on the platform)
as the fourth argument.
Setting up the filtering function is similar and allows the filtering function to be called with ndarray arguments as the first
two arguments and with pointers to integers (large enough to handle the strides and shape of an ndarray) as the last two
arguments:

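import ctypes  # needed for ctypes.POINTER below

lib.dfilter2d.restype = None
lib.dfilter2d.argtypes = [np.ctypeslib.ndpointer(float, ndim=2,
                                                 flags='aligned'),
                          np.ctypeslib.ndpointer(float, ndim=2,
                                                 flags='aligned, contiguous,'\
                                                       'writeable'),
                          ctypes.POINTER(np.ctypeslib.c_intp),
                          ctypes.POINTER(np.ctypeslib.c_intp)]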

Next, define a simple selection function that chooses which addition function to call in the shared library based on the
data-type:

def select(dtype):
    # dtype.char is a single character, so test membership in a string.
    if dtype.char in '?bBhHf':
        return lib.sadd, np.single
    elif dtype.char in 'F':
        return lib.cadd, np.csingle
    elif dtype.char in 'DG':
        return lib.zadd, complex
    else:
        return lib.dadd, float
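The dispatch depends on dtype.char, the one-character type code of a dtype. The sketch below reproduces the same decision table without the shared library, returning the routine name instead of the function object; kernel_name is a hypothetical helper used only for illustration:

```python
import numpy as np

def kernel_name(dtype):
    """Return the name of the C routine suited to a dtype (names as in code.c)."""
    char = np.dtype(dtype).char
    # Membership is tested against a string of codes, so each character is a
    # candidate (a list holding that string would only match the whole string).
    if char in '?bBhHf':      # bool, small integers, float32
        return 'sadd'
    elif char in 'F':         # complex64
        return 'cadd'
    elif char in 'DG':        # complex128, complex long double
        return 'zadd'
    return 'dadd'             # everything else is handled as double

names = {name: kernel_name(name)
         for name in ('bool', 'int16', 'float32', 'complex64',
                      'complex128', 'int64', 'float64')}
print(names)
```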

Finally, the two functions to be exported by the interface can be written simply as:

def add(a, b):
    requires = ['CONTIGUOUS', 'ALIGNED']
    a = np.asanyarray(a)
    func, dtype = select(a.dtype)
    a = np.require(a, dtype, requires)
    b = np.require(b, dtype, requires)
    c = np.empty_like(a)
    func(a,b,c,a.size)
    return c

and:

def filter2d(a):
    a = np.require(a, float, ['ALIGNED'])
    b = np.zeros_like(a)
    lib.dfilter2d(a, b, a.ctypes.strides, a.ctypes.shape)
    return b

Conclusion

Using ctypes is a powerful way to connect Python with arbitrary C-code. Its advantages for extending Python include
• clean separation of C code from Python code
– no need to learn a new syntax except Python and C
– allows re-use of C code
– functionality in shared libraries written for other purposes can be obtained with a simple Python wrapper and
search for the library.


• easy integration with NumPy through the ctypes attribute


• full argument checking with the ndpointer class factory
Its disadvantages include
• It is difficult to distribute an extension module made using ctypes because of a lack of support for building shared
libraries in distutils.
• You must have shared-libraries of your code (no static libraries).
• Very little support for C++ code and its different library-calling conventions. You will probably need a C wrapper
around C++ code to use with ctypes (or just use Boost.Python instead).
Because of the difficulty in distributing an extension module made using ctypes, f2py and Cython are still the easiest ways
to extend Python for package creation. However, ctypes is in some cases a useful alternative. Wider use should bring more
features to ctypes, which may eventually reduce the difficulty of extending Python and distributing the extension using ctypes.

8.2.6 Additional tools you may find useful

These tools have been found useful by others using Python and so are included here. They are discussed separately
because they are either older ways to do things now handled by f2py, Cython, or ctypes (SWIG, PyFort) or because of a
lack of reasonable documentation (SIP, Boost). Links to these methods are not included since the most relevant can be
found using Google or some other search engine, and any links provided here would be quickly dated. Do not assume
that inclusion in this list means that the package deserves attention. Information about these packages is collected here
because many people have found them useful and we’d like to give you as many options as possible for tackling the
problem of easily integrating your code.

SWIG

Simplified Wrapper and Interface Generator (SWIG) is an old and fairly stable method for wrapping C/C++-libraries
to a large variety of other languages. It does not specifically understand NumPy arrays but can be made usable with
NumPy through the use of typemaps. There are some sample typemaps in the numpy/tools/swig directory under numpy.i
together with an example module that makes use of them. SWIG excels at wrapping large C/C++ libraries because it can
(almost) parse their headers and auto-produce an interface. Technically, you need to generate a .i file that defines the
interface. Often, however, this .i file can be parts of the header itself. The interface usually needs a bit of tweaking to
be very useful. This ability to parse C/C++ headers and auto-generate the interface still makes SWIG a useful approach
to adding functionality from C/C++ into Python, despite the other methods that have emerged that are more targeted
to Python. SWIG can actually target extensions for several languages, but the typemaps usually have to be language-
specific. Nonetheless, with modifications to the Python-specific typemaps, SWIG can be used to interface a library with
other languages such as Perl, Tcl, and Ruby.
My experience with SWIG has been generally positive in that it is relatively easy to use and quite powerful. I used it
often before becoming more proficient at writing C-extensions. However, writing custom interfaces with SWIG is
often troublesome because it must be done using the concept of typemaps which are not Python specific and are written
in a C-like syntax. Therefore, other gluing strategies are preferred and SWIG would probably be considered only to wrap
a very large C/C++ library. Nonetheless, there are others who use SWIG quite happily.


SIP

SIP is another tool for wrapping C/C++ libraries that is Python specific and appears to have very good support for C++.
Riverbank Computing developed SIP in order to create Python bindings to the Qt library. An interface file must be
written to generate the binding, but the interface file looks a lot like a C/C++ header file. While SIP is not a full C++
parser, it understands quite a bit of C++ syntax as well as its own special directives that allow modification of how the
Python binding is accomplished. It also allows the user to define mappings between Python types and C/C++ structures
and classes.

Boost Python

Boost is a repository of C++ libraries and Boost.Python is one of those libraries which provides a concise interface for
binding C++ classes and functions to Python. The amazing part of the Boost.Python approach is that it works entirely in
pure C++ without introducing a new syntax. Many users of C++ report that Boost.Python makes it possible to combine
the best of both worlds in a seamless fashion. Using Boost to wrap simple C-subroutines is usually overkill. Its primary
purpose is to make C++ classes available in Python. So, if you have a set of C++ classes that need to be integrated cleanly
into Python, consider learning about and using Boost.Python.

PyFort

PyFort is a nice tool for wrapping Fortran and Fortran-like C-code into Python with support for Numeric arrays. It was
written by Paul Dubois, a distinguished computer scientist and the very first maintainer of Numeric (now retired). It is
worth mentioning in the hopes that somebody will update PyFort to work with NumPy arrays as well which now support
either Fortran or C-style contiguous arrays.

8.3 Writing your own ufunc

I have the Power!


— He-Man

8.3.1 Creating a new universal function

Before reading this, it may help to familiarize yourself with the basics of C extensions for Python by reading/skimming
the tutorials in Section 1 of Extending and Embedding the Python Interpreter and in How to extend NumPy.
The umath module is a computer-generated C-module that creates many ufuncs. It provides a great many examples of
how to create a universal function. Creating your own ufunc that will make use of the ufunc machinery is not difficult
either. Suppose you have a function that you want to operate element-by-element over its inputs. By creating a new ufunc
you will obtain a function that handles
• broadcasting
• N-dimensional looping
• automatic type-conversions with minimal memory usage
• optional output arrays
It is not difficult to create your own ufunc. All that is required is a 1-d loop for each data-type you want to support. Each
1-d loop must have a specific signature, and only ufuncs for fixed-size data-types can be used. The function call used to
create a new ufunc to work on built-in data-types is given below. A different mechanism is used to register ufuncs for
user-defined data-types.
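The features listed above can be observed on any existing ufunc before writing a new one. The following sketch uses the built-in np.add; the array names are illustrative:

```python
import numpy as np

col = np.arange(3, dtype=np.int32).reshape(3, 1)
row = np.arange(4, dtype=np.float64)

# Broadcasting and N-dimensional looping: (3, 1) with (4,) gives (3, 4).
# Type conversion: the int32 input is cast so the float64 loop can be used.
total = np.add(col, row)

# Optional output array: the result is written into existing memory.
out = np.empty((3, 4), dtype=np.float64)
returned = np.add(col, row, out=out)

# Each registered 1-d loop appears as a type signature such as 'dd->d'.
has_double_loop = 'dd->d' in np.add.types

print(total.shape, total.dtype, has_double_loop)
```

A ufunc built with the C machinery described here gains the same behaviour once its 1-d loops are registered.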


In the next several sections we give example code that can be easily modified to create your own ufuncs. The examples
are successively more complete or complicated versions of the logit function, a common function in statistical modeling.
Logit is also interesting because, due to the magic of IEEE standards (specifically IEEE 754), all of the logit functions
created below automatically have the following behavior.

>>> logit(0)
-inf
>>> logit(1)
inf
>>> logit(2)
nan
>>> logit(-2)
nan

This is wonderful because the function writer doesn’t have to manually propagate infs or nans.
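The same IEEE 754 behaviour can be reproduced with existing NumPy ufuncs, which gives a reference to test a C implementation against. This is a sketch, not the C code developed in the following sections; the function below is a plain Python wrapper around np.log:

```python
import numpy as np

def logit(p):
    """Reference logit, log(p / (1 - p)), evaluated with NumPy ufuncs."""
    p = np.asarray(p, dtype=np.float64)
    # Silence the divide-by-zero and invalid-value warnings so that the
    # IEEE 754 special values simply flow through the computation.
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.log(p / (1 - p))

values = logit([0, 1, 2, -2, 0.5])
print(values)
```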

8.3.2 Example Non-ufunc extension

For comparison and general edification of the reader we provide a simple implementation of a C extension of logit that
uses no numpy.
To do this we need two files. The first is the C file which contains the actual code, and the second is the setup.py file used
to create the module.

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <math.h>

/*
 * spammodule.c
 * This is the C code for a non-numpy Python extension to
 * define the logit function, where logit(p) = log(p/(1-p)).
 * This function will not work on numpy arrays automatically.
 * numpy.vectorize must be called in python to generate
 * a numpy-friendly function.
 *
 * Details explaining the Python-C API can be found under
 * 'Extending and Embedding' and 'Python/C API' at
 * docs.python.org .
 */

/* This declares the logit function */
static PyObject* spam_logit(PyObject *self, PyObject *args);

/*
 * This tells Python what methods this module has.
 * See the Python-C API for more information.
 */
static PyMethodDef SpamMethods[] = {
    {"logit",
        spam_logit,
        METH_VARARGS, "compute logit"},
    {NULL, NULL, 0, NULL}
};

/*
 * This actually defines the logit function for
 * input args from Python.
 */

static PyObject* spam_logit(PyObject *self, PyObject *args)
{
    double p;

    /* This parses the Python argument into a double */
    if(!PyArg_ParseTuple(args, "d", &p)) {
        return NULL;
    }

    /* THE ACTUAL LOGIT FUNCTION */
    p = p/(1-p);
    p = log(p);

    /* This builds the answer back into a python object */
    return Py_BuildValue("d", p);
}

/* This initiates the module using the above definitions. */
static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    "spam",
    NULL,
    -1,
    SpamMethods,
    NULL,
    NULL,
    NULL,
    NULL
};

PyMODINIT_FUNC PyInit_spam(void)
{
    PyObject *m;
    m = PyModule_Create(&moduledef);
    if (!m) {
        return NULL;
    }
    return m;
}

To use the setup.py file, place setup.py and spammodule.c in the same folder. Then python setup.py build will build the
module to import, or python setup.py install will install the module to your site-packages directory.

'''
    setup.py file for spammodule.c

    Calling
    $python setup.py build_ext --inplace
    will build the extension library in the current directory.

    Calling
    $python setup.py build
    will build a directory that looks like ./build/lib*, where
    lib* is a directory whose name begins with lib. The library
    will be in this directory and end with a C library extension,
    such as .so

    Calling
    $python setup.py install
    will install the module in your site-packages directory.

    See the distutils section of
    'Extending and Embedding the Python Interpreter'
    at docs.python.org for more information.
'''

from distutils.core import setup, Extension

module1 = Extension('spam', sources=['spammodule.c'],
                    include_dirs=['/usr/local/lib'])

setup(name = 'spam',
      version='1.0',
      description='This is my spam package',
      ext_modules = [module1])

Once the spam module is imported into python, you can call logit via spam.logit. Note that the function used above cannot
be applied as-is to numpy arrays. To do so we must call numpy.vectorize on it. For example, if a python interpreter is
opened in the directory containing the spam library or spam has been installed, one can perform the following commands:

>>> import numpy as np
>>> import spam
>>> spam.logit(0)
-inf
>>> spam.logit(1)
inf
>>> spam.logit(0.5)
0.0
>>> x = np.linspace(0,1,10)
>>> spam.logit(x)
TypeError: only length-1 arrays can be converted to Python scalars
>>> f = np.vectorize(spam.logit)
>>> f(x)
array([       -inf, -2.07944154, -1.25276297, -0.69314718, -0.22314355,
        0.22314355,  0.69314718,  1.25276297,  2.07944154,         inf])

THE RESULTING LOGIT FUNCTION IS NOT FAST! numpy.vectorize simply loops over spam.logit. The loop is
done at the C level, but the numpy array is constantly being parsed and built back up. This is expensive. When the author
compared numpy.vectorize(spam.logit) against the logit ufuncs constructed below, the logit ufuncs were almost exactly 4
times faster. Larger or smaller speedups are, of course, possible depending on the nature of the function.
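The per-element calling pattern can be tried without compiling anything. In this sketch scalar_logit is a pure-Python stand-in for a scalar-only logit such as the C function above, so absolute timings will differ from the figures quoted here; only interior points are used because math.log raises an exception at 0 instead of returning -inf:

```python
import math
import numpy as np

def scalar_logit(p):
    """Scalar-only logit; accepts one Python float at a time."""
    return math.log(p / (1.0 - p))

x = np.linspace(0.1, 0.9, 9)

# np.vectorize wraps the scalar function: one Python-level call per element.
vectorized = np.vectorize(scalar_logit, otypes=[float])
looped = vectorized(x)

# The same values from whole-array ufunc calls, with the loop in compiled code.
direct = np.log(x / (1.0 - x))

print(np.allclose(looped, direct))
```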


8.3.3 Example NumPy ufunc for one dtype

For simplicity we give a ufunc for a single dtype, the ‘f8’ double. As in the previous section, we first give the .c file and
then the setup.py file used to create the module containing the ufunc.
The place in the code corresponding to the actual computations for the ufunc is marked with /*BEGIN main ufunc
computation*/ and /*END main ufunc computation*/. The code in between those lines is the primary thing that must be
changed to create your own ufunc.
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "numpy/ndarraytypes.h"
#include "numpy/ufuncobject.h"
#include "numpy/npy_3kcompat.h"
#include <math.h>

/*
 * single_type_logit.c
 * This is the C code for creating your own
 * NumPy ufunc for a logit function.
 *
 * In this code we only define the ufunc for
 * a single dtype. The computations that must
 * be replaced to create a ufunc for
 * a different function are marked with BEGIN
 * and END.
 *
 * Details explaining the Python-C API can be found under
 * 'Extending and Embedding' and 'Python/C API' at
 * docs.python.org .
 */

static PyMethodDef LogitMethods[] = {
    {NULL, NULL, 0, NULL}
};

/*
 * The loop definition must precede the PyMODINIT_FUNC.
 * dimensions and steps are const-qualified to match the
 * PyUFuncGenericFunction type.
 */

static void double_logit(char **args, const npy_intp *dimensions,
                         const npy_intp *steps, void *data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in = args[0], *out = args[1];
    npy_intp in_step = steps[0], out_step = steps[1];

    double tmp;

    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        tmp = *(double *)in;
        tmp /= 1-tmp;
        *((double *)out) = log(tmp);
        /*END main ufunc computation*/

        in += in_step;
        out += out_step;
    }
}

/* This is a pointer to the above function */
PyUFuncGenericFunction funcs[1] = {&double_logit};

/* These are the input and return dtypes of logit. */
static char types[2] = {NPY_DOUBLE, NPY_DOUBLE};

static void *data[1] = {NULL};

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    "npufunc",
    NULL,
    -1,
    LogitMethods,
    NULL,
    NULL,
    NULL,
    NULL
};

PyMODINIT_FUNC PyInit_npufunc(void)
{
    PyObject *m, *logit, *d;
    m = PyModule_Create(&moduledef);
    if (!m) {
        return NULL;
    }

    import_array();
    import_umath();

    logit = PyUFunc_FromFuncAndData(funcs, data, types, 1, 1, 1,
                                    PyUFunc_None, "logit",
                                    "logit_docstring", 0);

    d = PyModule_GetDict(m);

    PyDict_SetItemString(d, "logit", logit);
    Py_DECREF(logit);

    return m;
}

This is a setup.py file for the above code. As before, the module can be built via calling python setup.py build at the
command prompt, or installed to site-packages via python setup.py install.

'''
setup.py file for single_type_logit.c
Note that since this is a numpy extension
we use numpy.distutils instead of
distutils from the python standard library.

Calling
$python setup.py build_ext --inplace
will build the extension library in the current directory.

Calling
$python setup.py build
will build a directory that looks like ./build/lib*, where
lib* is a directory that begins with lib. The library will
be in this directory and end with a C library extension,
such as .so

Calling
$python setup.py install
will install the module in your site-packages directory.

See the distutils section of
'Extending and Embedding the Python Interpreter'
at docs.python.org and the documentation
on numpy.distutils for more information.
'''

def configuration(parent_package='', top_path=None):
    import numpy
    from numpy.distutils.misc_util import Configuration

    config = Configuration('npufunc_directory',
                           parent_package,
                           top_path)
    config.add_extension('npufunc', ['single_type_logit.c'])

    return config

if __name__ == "__main__":
    from numpy.distutils.core import setup
    setup(configuration=configuration)

After the above has been installed, it can be imported and used as follows.

>>> import numpy as np
>>> import npufunc
>>> npufunc.logit(0.5)
0.0
>>> a = np.linspace(0,1,5)
>>> npufunc.logit(a)
array([ -inf, -1.09861229, 0. , 1.09861229, inf])
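
The values printed above can be cross-checked without building the extension. The following is a minimal sketch in plain Python (standard library only); the function name logit and the explicit handling of the end points are assumptions made for this illustration, since math.log raises an error where the C loop produces -inf or inf.

```python
import math

def logit(p):
    """Scalar logit, log(p / (1 - p)), with the same limits as the ufunc output above."""
    if p == 0.0:
        return -math.inf   # the C loop computes log(0), which yields -inf
    if p == 1.0:
        return math.inf    # the C loop divides by zero first, which yields inf
    return math.log(p / (1.0 - p))

# The same five points as np.linspace(0, 1, 5)
values = [logit(i / 4) for i in range(5)]
```

The list values should agree with the array printed by npufunc.logit(a) to the displayed precision.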

8.3.4 Example NumPy ufunc with multiple dtypes

We now give an example of a full ufunc, with inner loops for half-floats, floats, doubles, and long doubles. As in the
previous sections we first give the .c file and then the corresponding setup.py file.
The places in the code corresponding to the actual computations for the ufunc are marked with /*BEGIN main ufunc
computation*/ and /*END main ufunc computation*/. The code in between those lines is the primary thing that must be
changed to create your own ufunc.
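
The half-float loop differs from the others in that it widens each element before computing and narrows the result again on store. The sketch below models that round trip in plain Python, using the "e" (IEEE 754 half precision) format of the standard struct module. The helper names to_half and half_logit are hypothetical, and the arithmetic is done in Python's double precision, whereas the C loop in this section works in single precision.

```python
import math
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_logit(h):
    """Model of the half-float loop body: widen, compute, narrow."""
    tmp = to_half(h)                    # value as stored in a half-float array
    tmp = math.log(tmp / (1.0 - tmp))   # computed in a wider type
    return to_half(tmp)                 # rounded back to half precision on store

result = half_logit(0.25)
```

Because the result is rounded to half precision, it matches the double-precision logit only to about three decimal digits.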


#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "numpy/ndarraytypes.h"
#include "numpy/ufuncobject.h"
#include "numpy/halffloat.h"
#include <math.h>

/*
* multi_type_logit.c
* This is the C code for creating your own
* NumPy ufunc for a logit function.
*
* Each function of the form type_logit defines the
* logit function for a different numpy dtype. Each
* of these functions must be modified when you
* create your own ufunc. The computations that must
* be replaced to create a ufunc for
* a different function are marked with BEGIN
* and END.
*
* Details explaining the Python-C API can be found under
* 'Extending and Embedding' and 'Python/C API' at
* docs.python.org .
*
*/

static PyMethodDef LogitMethods[] = {
    {NULL, NULL, 0, NULL}
};

/* The loop definitions must precede the PyMODINIT_FUNC. */

static void long_double_logit(char **args, npy_intp *dimensions,
                              npy_intp* steps, void* data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in = args[0], *out = args[1];
    npy_intp in_step = steps[0], out_step = steps[1];

    long double tmp;

    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        tmp = *(long double *)in;
        tmp /= 1-tmp;
        *((long double *)out) = logl(tmp);
        /*END main ufunc computation*/

        in += in_step;
        out += out_step;
    }
}

static void double_logit(char **args, npy_intp *dimensions,
                         npy_intp* steps, void* data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in = args[0], *out = args[1];
    npy_intp in_step = steps[0], out_step = steps[1];

    double tmp;

    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        tmp = *(double *)in;
        tmp /= 1-tmp;
        *((double *)out) = log(tmp);
        /*END main ufunc computation*/

        in += in_step;
        out += out_step;
    }
}

static void float_logit(char **args, npy_intp *dimensions,
                        npy_intp* steps, void* data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in = args[0], *out = args[1];
    npy_intp in_step = steps[0], out_step = steps[1];

    float tmp;

    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        tmp = *(float *)in;
        tmp /= 1-tmp;
        *((float *)out) = logf(tmp);
        /*END main ufunc computation*/

        in += in_step;
        out += out_step;
    }
}

static void half_float_logit(char **args, npy_intp *dimensions,
                             npy_intp* steps, void* data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in = args[0], *out = args[1];
    npy_intp in_step = steps[0], out_step = steps[1];

    float tmp;

    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        /* Widen the stored half-float bits to a float before computing. */
        tmp = npy_half_to_float(*(npy_half *)in);
        tmp /= 1-tmp;
        tmp = logf(tmp);
        *((npy_half *)out) = npy_float_to_half(tmp);
        /*END main ufunc computation*/

        in += in_step;
        out += out_step;
    }
}

/* This gives pointers to the above functions */
PyUFuncGenericFunction funcs[4] = {&half_float_logit,
                                   &float_logit,
                                   &double_logit,
                                   &long_double_logit};

static char types[8] = {NPY_HALF, NPY_HALF,
                        NPY_FLOAT, NPY_FLOAT,
                        NPY_DOUBLE, NPY_DOUBLE,
                        NPY_LONGDOUBLE, NPY_LONGDOUBLE};

static void *data[4] = {NULL, NULL, NULL, NULL};

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    "npufunc",
    NULL,
    -1,
    LogitMethods,
    NULL,
    NULL,
    NULL,
    NULL
};

PyMODINIT_FUNC PyInit_npufunc(void)
{
    PyObject *m, *logit, *d;
    m = PyModule_Create(&moduledef);
    if (!m) {
        return NULL;
    }

    import_array();
    import_umath();

    logit = PyUFunc_FromFuncAndData(funcs, data, types, 4, 1, 1,
                                    PyUFunc_None, "logit",
                                    "logit_docstring", 0);

    d = PyModule_GetDict(m);

    PyDict_SetItemString(d, "logit", logit);
    Py_DECREF(logit);

    return m;
}

This is a setup.py file for the above code. As before, the module can be built by calling python setup.py build at the
command prompt, or installed to site-packages via python setup.py install.

'''
setup.py file for multi_type_logit.c
Note that since this is a numpy extension
we use numpy.distutils instead of
distutils from the python standard library.

Calling
$python setup.py build_ext --inplace
will build the extension library in the current directory.

Calling
$python setup.py build
will build a directory that looks like ./build/lib*, where
lib* is a directory that begins with lib. The library will
be in this directory and end with a C library extension,
such as .so

Calling
$python setup.py install
will install the module in your site-packages directory.

See the distutils section of
'Extending and Embedding the Python Interpreter'
at docs.python.org and the documentation
on numpy.distutils for more information.
'''

def configuration(parent_package='', top_path=None):
    import numpy
    from numpy.distutils.misc_util import Configuration
    from numpy.distutils.misc_util import get_info

    #Necessary for the half-float d-type.
    info = get_info('npymath')

    config = Configuration('npufunc_directory',
                           parent_package,
                           top_path)
    config.add_extension('npufunc',
                         ['multi_type_logit.c'],
                         extra_info=info)

    return config

if __name__ == "__main__":
    from numpy.distutils.core import setup
    setup(configuration=configuration)

After the above has been installed, it can be imported and used as follows.


>>> import numpy as np
>>> import npufunc
>>> npufunc.logit(0.5)
0.0
>>> a = np.linspace(0,1,5)
>>> npufunc.logit(a)
array([ -inf, -1.09861229, 0. , 1.09861229, inf])

8.3.5 Example NumPy ufunc with multiple arguments/return values

Our next example is a ufunc with multiple arguments. It is a modification of the code for a logit ufunc for data with a
single dtype. We compute (A*B, logit(A*B)).
We only give the C code as the setup.py file is exactly the same as the setup.py file in Example NumPy ufunc for one dtype,
except that the line

config.add_extension('npufunc', ['single_type_logit.c'])

is replaced with

config.add_extension('npufunc', ['multi_arg_logit.c'])

The C file is given below. The ufunc generated takes two arguments A and B. It returns a tuple whose first element is A*B
and whose second element is logit(A*B). Note that it automatically supports broadcasting, as well as all other properties
of a ufunc.
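
To make the loop contract concrete before reading the C file, here is a plain-Python model of a 1-d loop with two inputs and two outputs (standard library only, not NumPy API). Byte buffers stand in for the char pointers in args and a tuple of byte strides stands in for steps. The example passes a stride of 0 for the second input so that a single value is reused for every element, which is one way to picture broadcasting a scalar; treat that as an illustration under these assumptions rather than a description of NumPy internals.

```python
import math
import struct

def double_logitprod(args, n, steps):
    """Plain-Python model of the 1-d loop: args are byte buffers, steps are byte strides."""
    in1, in2, out1, out2 = args
    offsets = [0, 0, 0, 0]
    for _ in range(n):
        a = struct.unpack_from("d", in1, offsets[0])[0]
        b = struct.unpack_from("d", in2, offsets[1])[0]
        prod = a * b
        struct.pack_into("d", out1, offsets[2], prod)
        struct.pack_into("d", out2, offsets[3], math.log(prod / (1.0 - prod)))
        # Advance every pointer by its own stride, as the C loop does.
        offsets = [off + step for off, step in zip(offsets, steps)]

a = struct.pack("3d", 0.5, 0.25, 0.75)   # three contiguous doubles
b = struct.pack("d", 1.0)                # one double, reused via a stride of 0
out1 = bytearray(3 * 8)
out2 = bytearray(3 * 8)
double_logitprod((a, b, out1, out2), 3, (8, 0, 8, 8))
products = struct.unpack("3d", out1)
logits = struct.unpack("3d", out2)
```

The inputs are chosen so that every product lies strictly between 0 and 1; outside that range math.log raises an exception where the C code would return inf or nan.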

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "numpy/ndarraytypes.h"
#include "numpy/ufuncobject.h"
#include "numpy/halffloat.h"
#include <math.h>

/*
* multi_arg_logit.c
* This is the C code for creating your own
* NumPy ufunc for a multiple argument, multiple
* return value ufunc. The places where the
* ufunc computation is carried out are marked
* with comments.
*
* Details explaining the Python-C API can be found under
* 'Extending and Embedding' and 'Python/C API' at
* docs.python.org .
*
*/

static PyMethodDef LogitMethods[] = {
    {NULL, NULL, 0, NULL}
};

/* The loop definition must precede the PyMODINIT_FUNC. */

static void double_logitprod(char **args, npy_intp *dimensions,
                             npy_intp* steps, void* data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in1 = args[0], *in2 = args[1];
    char *out1 = args[2], *out2 = args[3];
    npy_intp in1_step = steps[0], in2_step = steps[1];
    npy_intp out1_step = steps[2], out2_step = steps[3];

    double tmp;

    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        tmp = *(double *)in1;
        tmp *= *(double *)in2;
        *((double *)out1) = tmp;
        *((double *)out2) = log(tmp/(1-tmp));
        /*END main ufunc computation*/

        in1 += in1_step;
        in2 += in2_step;
        out1 += out1_step;
        out2 += out2_step;
    }
}

/* This is a pointer to the above function */
PyUFuncGenericFunction funcs[1] = {&double_logitprod};

/* These are the input and return dtypes of logit. */
static char types[4] = {NPY_DOUBLE, NPY_DOUBLE,
                        NPY_DOUBLE, NPY_DOUBLE};

static void *data[1] = {NULL};

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    "npufunc",
    NULL,
    -1,
    LogitMethods,
    NULL,
    NULL,
    NULL,
    NULL
};

PyMODINIT_FUNC PyInit_npufunc(void)
{
    PyObject *m, *logit, *d;
    m = PyModule_Create(&moduledef);
    if (!m) {
        return NULL;
    }

    import_array();
    import_umath();

    logit = PyUFunc_FromFuncAndData(funcs, data, types, 1, 2, 2,
                                    PyUFunc_None, "logit",
                                    "logit_docstring", 0);

    d = PyModule_GetDict(m);

    PyDict_SetItemString(d, "logit", logit);
    Py_DECREF(logit);

    return m;
}

8.3.6 Example NumPy ufunc with structured array dtype arguments

This example shows how to create a ufunc for a structured array dtype. For the example we show a trivial ufunc
for adding two arrays with dtype ‘u8,u8,u8’. The process is a bit different from the other examples since a call to
PyUFunc_FromFuncAndData doesn’t fully register ufuncs for custom dtypes and structured array dtypes. We need
to also call PyUFunc_RegisterLoopForDescr to finish setting up the ufunc.
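
As a plain-Python picture of what such a loop computes (standard library only, not NumPy API), each element of a 'u8,u8,u8' array can be treated as a 24-byte record of three unsigned 64-bit integers. The names RECORD, MASK and add_triplets are hypothetical, little-endian byte order is an assumption made so the example is portable, and the mask reproduces the wrap-around of C unsigned arithmetic.

```python
import struct

RECORD = struct.Struct("<3Q")   # three unsigned 64-bit fields, 24 bytes per record
MASK = (1 << 64) - 1            # C uint64_t arithmetic wraps modulo 2**64

def add_triplets(x, y, n):
    """Add n packed three-field records field by field."""
    out = bytearray(n * RECORD.size)
    for i in range(n):
        offset = i * RECORD.size   # contiguous records: stride equals the record size
        xs = RECORD.unpack_from(x, offset)
        ys = RECORD.unpack_from(y, offset)
        RECORD.pack_into(out, offset, *((p + q) & MASK for p, q in zip(xs, ys)))
    return out

x = RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6)
y = RECORD.pack(10, 20, 30) + RECORD.pack(MASK, 0, 1)
summed = add_triplets(x, y, 2)
result = [RECORD.unpack_from(summed, i * RECORD.size) for i in range(2)]
```

The second record shows the wrap-around: adding the largest 64-bit value to 4 leaves 3.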
We only give the C code as the setup.py file is exactly the same as the setup.py file in Example NumPy ufunc for one dtype,
except that the line
config.add_extension('npufunc', ['single_type_logit.c'])

is replaced with
config.add_extension('npufunc', ['add_triplet.c'])

The C file is given below.


#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "numpy/ndarraytypes.h"
#include "numpy/ufuncobject.h"
#include "numpy/npy_3kcompat.h"
#include <math.h>

/*
* add_triplet.c
* This is the C code for creating your own
* NumPy ufunc for a structured array dtype.
*
* Details explaining the Python-C API can be found under
* 'Extending and Embedding' and 'Python/C API' at
* docs.python.org .
*/

static PyMethodDef StructUfuncTestMethods[] = {
    {NULL, NULL, 0, NULL}
};

/* The loop definition must precede the PyMODINIT_FUNC. */

static void add_uint64_triplet(char **args, npy_intp *dimensions,
                               npy_intp* steps, void* data)
{
    npy_intp i;
    npy_intp is1 = steps[0];
    npy_intp is2 = steps[1];
    npy_intp os = steps[2];
    npy_intp n = dimensions[0];
    uint64_t *x, *y, *z;

    char *i1 = args[0];
    char *i2 = args[1];
    char *op = args[2];

    for (i = 0; i < n; i++) {

        x = (uint64_t*)i1;
        y = (uint64_t*)i2;
        z = (uint64_t*)op;

        z[0] = x[0] + y[0];
        z[1] = x[1] + y[1];
        z[2] = x[2] + y[2];

        i1 += is1;
        i2 += is2;
        op += os;
    }
}

/* This is a pointer to the above function */
PyUFuncGenericFunction funcs[1] = {&add_uint64_triplet};

/* These are the input and return dtypes of add_uint64_triplet. */
static char types[3] = {NPY_UINT64, NPY_UINT64, NPY_UINT64};

static void *data[1] = {NULL};

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    "struct_ufunc_test",
    NULL,
    -1,
    StructUfuncTestMethods,
    NULL,
    NULL,
    NULL,
    NULL
};

PyMODINIT_FUNC PyInit_struct_ufunc_test(void)
{
    PyObject *m, *add_triplet, *d;
    PyObject *dtype_dict;
    PyArray_Descr *dtype;
    PyArray_Descr *dtypes[3];

    m = PyModule_Create(&moduledef);

    if (m == NULL) {
        return NULL;
    }

    import_array();
    import_umath();

    /* Create a new ufunc object */
    add_triplet = PyUFunc_FromFuncAndData(NULL, NULL, NULL, 0, 2, 1,
                                          PyUFunc_None, "add_triplet",
                                          "add_triplet_docstring", 0);

    dtype_dict = Py_BuildValue("[(s, s), (s, s), (s, s)]",
                               "f0", "u8", "f1", "u8", "f2", "u8");
    PyArray_DescrConverter(dtype_dict, &dtype);
    Py_DECREF(dtype_dict);

    dtypes[0] = dtype;
    dtypes[1] = dtype;
    dtypes[2] = dtype;

    /* Register ufunc for structured dtype */
    PyUFunc_RegisterLoopForDescr(add_triplet,
                                 dtype,
                                 &add_uint64_triplet,
                                 dtypes,
                                 NULL);

    d = PyModule_GetDict(m);

    PyDict_SetItemString(d, "add_triplet", add_triplet);
    Py_DECREF(add_triplet);
    return m;
}

The returned ufunc object is a callable Python object. It should be placed in a (module) dictionary under the same name as
was used in the name argument to the ufunc-creation routine. The following example is adapted from the umath module:

static PyUFuncGenericFunction atan2_functions[] = {
    PyUFunc_ff_f, PyUFunc_dd_d,
    PyUFunc_gg_g, PyUFunc_OO_O_method};
static void* atan2_data[] = {
    (void *)atan2f, (void *)atan2,
    (void *)atan2l, (void *)"arctan2"};
static char atan2_signatures[] = {
    NPY_FLOAT, NPY_FLOAT, NPY_FLOAT,
    NPY_DOUBLE, NPY_DOUBLE, NPY_DOUBLE,
    NPY_LONGDOUBLE, NPY_LONGDOUBLE, NPY_LONGDOUBLE,
    NPY_OBJECT, NPY_OBJECT, NPY_OBJECT};
...
/* in the module initialization code */
PyObject *f, *dict, *module;
...
dict = PyModule_GetDict(module);
...
f = PyUFunc_FromFuncAndData(atan2_functions,
    atan2_data, atan2_signatures, 4, 2, 1,
    PyUFunc_None, "arctan2",
    "a safe and correct arctan(x1/x2)", 0);
PyDict_SetItemString(dict, "arctan2", f);
Py_DECREF(f);
...
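
The flat signatures array in the listing above holds nin + nout entries for each registered loop, one loop after another. A small plain-Python sketch of that grouping may help when filling in such an array by hand. The function name split_signatures is hypothetical, and the strings merely stand in for the NPY_* constants.

```python
def split_signatures(types, ntypes, nin, nout):
    """Group a flat type list into one (inputs, outputs) pair per registered loop."""
    width = nin + nout
    if len(types) != ntypes * width:
        raise ValueError("expected ntypes * (nin + nout) type entries")
    rows = [types[k * width:(k + 1) * width] for k in range(ntypes)]
    return [(tuple(row[:nin]), tuple(row[nin:])) for row in rows]

# Type names stand in for the NPY_* constants used in the C array.
atan2_types = ["float", "float", "float",
               "double", "double", "double",
               "longdouble", "longdouble", "longdouble",
               "object", "object", "object"]
loops = split_signatures(atan2_types, ntypes=4, nin=2, nout=1)
```

With ntypes=4, nin=2 and nout=1 the array must contain exactly twelve entries, which is why a missing comma or a missing entry in the C initializer matters.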

8.4 Beyond the Basics

The voyage of discovery is not in seeking new landscapes but in having
new eyes.
— Marcel Proust

Discovery is seeing what everyone else has seen and thinking what no
one else has thought.
— Albert Szent-Gyorgi

8.4.1 Iterating over elements in the array

Basic Iteration

One common algorithmic requirement is to be able to walk over all elements in a multidimensional array. The array
iterator object makes this easy to do in a generic way that works for arrays of any dimension. Naturally, if you know
the number of dimensions you will be using, then you can always write nested for loops to accomplish the iteration. If,
however, you want to write code that works with any number of dimensions, then you can make use of the array iterator.
An array iterator object is returned when accessing the .flat attribute of an array.
Basic usage is to call PyArray_IterNew(array) where array is an ndarray object (or one of its sub-classes).
The returned object is an array-iterator object (the same object returned by the .flat attribute of the ndarray). This
object is usually cast to PyArrayIterObject* so that its members can be accessed. The only members that are needed are
iter->size, which contains the total size of the array, iter->index, which contains the current 1-d index into the
array, and iter->dataptr, which is a pointer to the data for the current element of the array. Sometimes it is also
useful to access iter->ao, which is a pointer to the underlying ndarray object.
After processing data at the current element of the array, the next element of the array can be obtained using the macro
PyArray_ITER_NEXT(iter). The iteration always proceeds in a C-style contiguous fashion (last index varying the
fastest). The macro PyArray_ITER_GOTO(iter, destination) can be used to jump to a particular point in the array,
where destination is an array of npy_intp data-type with space to handle at least the number of dimensions in the
underlying array. Occasionally it is useful to use PyArray_ITER_GOTO1D(iter, index), which will jump to the
1-d index given by the value of index. The most common usage, however, is given in the following example.

PyObject *obj; /* assumed to be some ndarray object */
PyArrayIterObject *iter;
...
iter = (PyArrayIterObject *)PyArray_IterNew(obj);
if (iter == NULL) goto fail;   /* Assume fail has clean-up code */
while (iter->index < iter->size) {
    /* do something with the data at iter->dataptr */
    PyArray_ITER_NEXT(iter);
}
...

You can also use PyArrayIter_Check(obj) to ensure you have an iterator object and PyArray_ITER_RESET(iter)
to reset an iterator object back to the beginning of the array.
It should be emphasized at this point that you may not need the array iterator if your array is already contiguous (using an
array iterator will work but will be slower than the fastest code you could write). The major purpose of array iterators is to
encapsulate iteration over N-dimensional arrays with arbitrary strides. They are used in many, many places in the NumPy
source code itself. If you already know your array is contiguous (Fortran or C), then simply adding the element-size to
a running pointer variable will step you through the array very efficiently. In other words, code like this will probably be
faster for you in the contiguous case (assuming doubles).

npy_intp size;
double *dptr;  /* could make this any variable type */
size = PyArray_SIZE(obj);
dptr = PyArray_DATA(obj);
while(size--) {
    /* do something with the data at dptr */
    dptr++;
}
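
The difference between the two traversals can be seen in a small plain-Python model (standard library only, not NumPy API). The hypothetical helper iter_offsets yields byte offsets in C order for any shape and strides, which is the job the array iterator performs, while unpacking the buffer directly corresponds to stepping a running pointer through memory. The buffer holds a 2x3 array of doubles in Fortran order, an assumption chosen so that the two orders differ.

```python
import itertools
import struct

def iter_offsets(shape, strides):
    """Yield byte offsets in C order (last index varying fastest)."""
    for index in itertools.product(*(range(dim) for dim in shape)):
        yield sum(i * s for i, s in zip(index, strides))

# The logical array [[0, 1, 2], [3, 4, 5]] stored column by column.
buf = struct.pack("6d", 0.0, 3.0, 1.0, 4.0, 2.0, 5.0)
shape, strides = (2, 3), (8, 16)

# Iterator-style walk: logical C order, whatever the memory layout is.
strided = [struct.unpack_from("d", buf, off)[0] for off in iter_offsets(shape, strides)]

# Running-pointer walk: memory order, with no index arithmetic at all.
raw = list(struct.unpack("6d", buf))
```

Both walks visit every element exactly once; only the order differs, so the running-pointer form is appropriate whenever the order of visits does not matter.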

Iterating over all but one axis

A common algorithm is to loop over all elements of an array and perform some function with each element by issuing a
function call. As function calls can be time consuming, one way to speed up this kind of algorithm is to write the function
so it takes a vector of data and then write the iteration so the function call is performed for an entire dimension of data at a
time. This increases the amount of work done per function call, thereby reducing the function-call overhead to a small(er)
fraction of the total time. Even if the interior of the loop is performed without a function call it can be advantageous to
perform the inner loop over the dimension with the highest number of elements to take advantage of speed enhancements
available on microprocessors that use pipelining to enhance fundamental operations.
The PyArray_IterAllButAxis(array, &dim) function constructs an iterator object that is modified so that it
will not iterate over the dimension indicated by dim. The only restriction on this iterator object is that the
PyArray_ITER_GOTO1D(it, ind) macro cannot be used (thus flat indexing won’t work either if you pass this
object back to Python, so you shouldn’t do this). Note that the returned object from this routine is still usually cast
to PyArrayIterObject *. All that’s been done is to modify the strides and dimensions of the returned iterator to simulate
iterating over array[…,0,…] where 0 is placed on the dim-th dimension. If dim is negative, then the dimension with the
largest axis is found and used.
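
The pattern can be sketched in plain Python (standard library only, not NumPy API): an outer iteration visits the start of every 1-d slice, and an inner routine processes the whole slice using the length and stride of the skipped axis. The helper names are hypothetical, and only the case of an explicitly given, non-negative axis is modelled.

```python
import itertools
import struct

def iter_all_but_axis(shape, strides, axis):
    """Yield the starting byte offset of every 1-d slice along `axis`."""
    outer = [range(dim) if k != axis else range(1) for k, dim in enumerate(shape)]
    for index in itertools.product(*outer):
        yield sum(i * s for i, s in zip(index, strides))

def sum_along_axis(buf, shape, strides, axis):
    """Process one whole slice per outer step instead of one element per step."""
    totals = []
    for start in iter_all_but_axis(shape, strides, axis):
        # Inner loop: walk the skipped axis using its own length and stride.
        totals.append(sum(struct.unpack_from("d", buf, start + k * strides[axis])[0]
                          for k in range(shape[axis])))
    return totals

buf = struct.pack("6d", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)   # C-contiguous 2x3 doubles
row_sums = sum_along_axis(buf, (2, 3), (24, 8), axis=1)
col_sums = sum_along_axis(buf, (2, 3), (24, 8), axis=0)
```

Skipping axis 1 gives two outer steps of three elements each, while skipping axis 0 gives three outer steps of two elements each, so the longer axis is the better choice for the inner loop.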


Iterating over multiple arrays

Very often, it is desirable to iterate over several arrays at the same time. The universal functions are an example of this
kind of behavior. If all you want to do is iterate over arrays with the same shape, then simply creating several iterator
objects is the standard procedure. For example, the following code iterates over two arrays assumed to be the same shape
and size (actually obj1 just has to have at least as many total elements as does obj2):

/* It is already assumed that obj1 and obj2
   are ndarrays of the same shape and size.
*/
iter1 = (PyArrayIterObject *)PyArray_IterNew(obj1);
if (iter1 == NULL) goto fail;
iter2 = (PyArrayIterObject *)PyArray_IterNew(obj2);
if (iter2 == NULL) goto fail;  /* assume iter1 is DECREF'd at fail */
while (iter2->index < iter2->size) {
    /* process with iter1->dataptr and iter2->dataptr */
    PyArray_ITER_NEXT(iter1);
    PyArray_ITER_NEXT(iter2);
}

Broadcasting over multiple arrays

When multiple arrays are involved in an operation, you may want to use the same broadcasting rules that the math
operations (i.e. the ufuncs) use. This can be done easily using the PyArrayMultiIterObject. This is the
object returned from the Python command numpy.broadcast and it is almost as easy to use from C. The function
PyArray_MultiIterNew(n, ...) is used (with n input objects in place of ...). The input objects can be
arrays or anything that can be converted into an array. A pointer to a PyArrayMultiIterObject is returned.
Broadcasting has already been accomplished, which adjusts the iterators so that all that needs to be done to advance to the
next element in each array is for PyArray_ITER_NEXT to be called for each of the inputs. This incrementing is
automatically performed by the PyArray_MultiIter_NEXT(obj) macro (which can handle a multi-iterator obj
as either a PyArrayMultiIterObject* or a PyObject*). The data from input number i is available using
PyArray_MultiIter_DATA(obj, i). An example of using this feature follows.

mobj = PyArray_MultiIterNew(2, obj1, obj2);
size = mobj->size;
while(size--) {
    ptr1 = PyArray_MultiIter_DATA(mobj, 0);
    ptr2 = PyArray_MultiIter_DATA(mobj, 1);
    /* code using contents of ptr1 and ptr2 */
    PyArray_MultiIter_NEXT(mobj);
}

The function PyArray_RemoveSmallest(multi) can be used to take a multi-iterator object and adjust all the
iterators so that iteration does not take place over the largest dimension (it makes that dimension of size 1). The code
being looped over that makes use of the pointers will very likely also need the strides data for each of the iterators. This
information is stored in multi->iters[i]->strides.
There are several examples of using the multi-iterator in the NumPy source code as it makes N-dimensional broadcasting
code very simple to write. Browse the source for more examples.
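
For readers who want to see what the multi-iterator arranges, the following plain-Python model (standard library only, not NumPy API) applies the usual broadcasting rule to shapes and then walks the inputs with adjusted strides. All three function names are hypothetical. Giving a stretched axis a stride of 0 is the modelling assumption that lets one element be revisited for every position along that axis.

```python
import itertools

def broadcast_shape(*shapes):
    """Combine shapes using the usual rule: align right, length-1 axes stretch."""
    ndim = max(len(s) for s in shapes)
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError("shapes cannot be broadcast together")
        result.append(sizes.pop() if sizes else 1)
    return tuple(result)

def broadcast_strides(shape, strides, out_shape):
    """Give stretched (length-1 or missing) axes a stride of 0."""
    pad = len(out_shape) - len(shape)
    full_shape = (1,) * pad + tuple(shape)
    full_strides = (0,) * pad + tuple(strides)
    return tuple(0 if dim == 1 else st for dim, st in zip(full_shape, full_strides))

def multi_iter(out_shape, *stride_sets):
    """Yield one tuple of byte offsets per broadcast element, in C order."""
    for index in itertools.product(*(range(d) for d in out_shape)):
        yield tuple(sum(i * s for i, s in zip(index, st)) for st in stride_sets)

# A (3, 1) array and a (2,) array of doubles broadcast to shape (3, 2).
out_shape = broadcast_shape((3, 1), (2,))
s1 = broadcast_strides((3, 1), (8, 8), out_shape)
s2 = broadcast_strides((2,), (8,), out_shape)
offsets = list(multi_iter(out_shape, s1, s2))
```

Each tuple in offsets plays the role of the pair of data pointers obtained for one step of the loop, expressed as byte offsets from the start of each input.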


8.4.2 User-defined data-types

NumPy comes with 24 builtin data-types. While this covers a large majority of possible use cases, it is conceivable that
a user may have a need for an additional data-type. There is some support for adding an additional data-type into the
NumPy system. This additional data-type will behave much like a regular data-type except ufuncs must have 1-d loops
registered to handle it separately. Also, checking whether other data-types can be cast “safely” to and from this
new type will always return “can cast” unless you also register which types your new data-type can be cast to and
from.
The NumPy source code includes an example of a custom data-type as part of its test suite. The file
_rational_tests.c.src in the source code directory numpy/numpy/core/src/umath/ contains an
implementation of a data-type that represents a rational number as the ratio of two 32-bit integers.

Adding the new data-type

To begin to make use of the new data-type, you need to first define a new Python type to hold the scalars of your new
data-type. It should be acceptable to inherit from one of the array scalars if your new type has a binary compatible layout.
This will allow your new data type to have the methods and attributes of array scalars. New data-types must have a fixed
memory size (if you want to define a data-type that needs a flexible representation, like a variable-precision number, then
use a pointer to the object as the data-type). The memory layout of the object structure for the new Python type must be
PyObject_HEAD followed by the fixed-size memory needed for the data-type. For example, a suitable structure for the
new Python type is:

typedef struct {
    PyObject_HEAD
    some_data_type obval;
    /* the name can be whatever you want */
} PySomeDataTypeObject;

After you have defined a new Python type object, you must then define a new PyArray_Descr structure whose
typeobject member will contain a pointer to the data-type you’ve just defined. In addition, the required functions in the
“.f” member must be defined: nonzero, copyswap, copyswapn, setitem, getitem, and cast. The more functions in the “.f”
member you define, however, the more useful the new data-type will be. It is very important to initialize unused functions
to NULL. This can be achieved using PyArray_InitArrFuncs(f).
Once a new PyArray_Descr structure is created and filled with the needed information and useful functions you
call PyArray_RegisterDataType(new_descr). The return value from this call is an integer providing you
with a unique type_number that specifies your data-type. This type number should be stored and made available
by your module so that other modules can use it to recognize your data-type (the other mechanism for finding a
user-defined data-type number is to search based on the name of the type-object associated with the data-type using
PyArray_TypeNumFromName).

Registering a casting function

You may want to allow builtin (and other user-defined) data-types to be cast automatically to your data-type. In order to
make this possible, you must register a casting function with the data-type you want to be able to cast from. This requires
writing low-level casting functions for each conversion you want to support and then registering these functions with the
data-type descriptor. A low-level casting function has the signature:
void castfunc(void *from, void *to, npy_intp n, void *fromarr, void *toarr)
Cast n elements from one type to another. The data to cast from is in a contiguous, correctly-swapped and aligned
chunk of memory pointed to by from. The buffer to cast to is also contiguous, correctly-swapped and aligned. The
fromarr and toarr arguments should only be used for flexible-element-sized arrays (string, unicode, void).
An example castfunc is:


static void
double_to_float(double *from, float* to, npy_intp n,
                void* ignore1, void* ignore2) {
    while (n--) {
        /* narrow each double to the destination type, float */
        (*to++) = (float) *(from++);
    }
}

This could then be registered to convert doubles to floats using the code:

doub = PyArray_DescrFromType(NPY_DOUBLE);
PyArray_RegisterCastFunc(doub, NPY_FLOAT,
(PyArray_VectorUnaryFunc *)double_to_float);
Py_DECREF(doub);
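
The effect of such a cast function on the data can be modelled in plain Python (standard library only, not NumPy API): n contiguous doubles are read from one buffer and n single-precision floats are written to another. The Python function below only mirrors the name of the C example and is not a registered cast function. The struct format "f" performs the rounding to single precision.

```python
import struct

def double_to_float(src, n):
    """Cast n contiguous doubles to n contiguous single-precision floats."""
    dst = bytearray(4 * n)
    for i in range(n):
        value = struct.unpack_from("d", src, 8 * i)[0]
        struct.pack_into("f", dst, 4 * i, value)   # rounds to single precision
    return dst

src = struct.pack("3d", 1.5, 0.1, -2.0)
floats = struct.unpack("3f", double_to_float(src, 3))
```

Values such as 1.5 and -2.0 are exactly representable in single precision and survive unchanged, while 0.1 comes back slightly altered, which is the kind of information loss that makes this cast unsafe in the coercion sense.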

Registering coercion rules

By default, all user-defined data-types are not presumed to be safely castable to any builtin data-types. In addition builtin
data-types are not presumed to be safely castable to user-defined data-types. This situation limits the ability of user-
defined data-types to participate in the coercion system used by ufuncs and other times when automatic coercion takes
place in NumPy. This can be changed by registering data-types as safely castable from a particular data-type object. The
function PyArray_RegisterCanCast (from_descr, totype_number, scalarkind) should be used to specify that the
data-type object from_descr can be cast to the data-type with type number totype_number. If you are not trying to alter
scalar coercion rules, then use NPY_NOSCALAR for the scalarkind argument.
If you want to allow your new data-type to also be able to share in the scalar coercion rules, then you need to specify the
scalarkind function in the data-type object’s “.f” member to return the kind of scalar the new data-type should be seen as
(the value of the scalar is available to that function). Then, you can register data-types that can be cast to separately for
each scalar kind that may be returned from your user-defined data-type. If you don’t register scalar coercion handling,
then all of your user-defined data-types will be seen as NPY_NOSCALAR.

Registering a ufunc loop

You may also want to register low-level ufunc loops for your data-type so that an ndarray of your data-type can have math
applied to it seamlessly. Registering a new loop with exactly the same arg_types signature silently replaces any previously
registered loops for that data-type.
Before you can register a 1-d loop for a ufunc, the ufunc must be previously created. Then you call
PyUFunc_RegisterLoopForType (…) with the information needed for the loop. The return value of this function
is 0 if the process was successful and -1 with an error condition set if it was not successful.

8.4.3 Subtyping the ndarray in C

One of the lesser-used features that has been lurking in Python since 2.2 is the ability to sub-class types in C. This facility
is one of the important reasons for basing NumPy off of the Numeric code-base which was already in C. A sub-type in C
allows much more flexibility with regard to memory management. Sub-typing in C is not difficult even if you have only
a rudimentary understanding of how to create new types for Python. While it is easiest to sub-type from a single parent
type, sub-typing from multiple parent types is also possible. Multiple inheritance in C is generally less useful than it is in
Python because a restriction on Python sub-types is that they have a binary compatible memory layout. Perhaps for this
reason, it is somewhat easier to sub-type from a single parent type.
All C-structures corresponding to Python objects must begin with PyObject_HEAD (or PyObject_VAR_HEAD). In
the same way, any sub-type must have a C-structure that begins with exactly the same memory layout as the parent type
(or all of the parent types in the case of multiple-inheritance). The reason for this is that Python may attempt to access
a member of the sub-type structure as if it had the parent structure (i.e. it will cast a given pointer to a pointer to the
parent structure and then dereference one of its members). If the memory layouts are not compatible, then this attempt
will cause unpredictable behavior (eventually leading to a memory violation and program crash).
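
The layout requirement can be demonstrated with the standard ctypes module, which lays out structures the way a C compiler does. The classes Parent and Child and their fields are hypothetical stand-ins, not the real PyObject or PyArrayObject layouts. Because the parent structure is the first member of the child, a pointer to the child can be reinterpreted as a pointer to the parent and its members read correctly.

```python
import ctypes

class Parent(ctypes.Structure):
    """Stands in for the parent type's C structure (hypothetical fields)."""
    _fields_ = [("refcount", ctypes.c_ssize_t), ("ndim", ctypes.c_int)]

class Child(ctypes.Structure):
    """Sub-type structure: the full parent comes first, new members follow."""
    _fields_ = [("base", Parent), ("extra", ctypes.c_double)]

child = Child(Parent(1, 3), 2.5)

# Reinterpret the child's address as a pointer to the parent structure,
# which is what code written against the parent type does.
as_parent = ctypes.cast(ctypes.pointer(child), ctypes.POINTER(Parent)).contents
```

If the new members were placed before the parent, the same cast would read the wrong bytes, which is the unpredictable behavior described above.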
One of the elements in PyObject_HEAD is a pointer to a type-object structure. A new Python type is created by creating
a new type-object structure and populating it with functions and pointers to describe the desired behavior of the type.
Typically, a new C-structure is also created to contain the instance-specific information needed for each object of the type
as well. For example, &PyArray_Type is a pointer to the type-object table for the ndarray while a PyArrayObject*
variable is a pointer to a particular instance of an ndarray (one of the members of the ndarray structure is, in turn, a pointer
to the type-object table &PyArray_Type). Finally PyType_Ready(<pointer_to_type_object>) must be called for
every new Python type.

Creating sub-types

To create a sub-type, a similar procedure must be followed except only behaviors that are different require new entries
in the type-object structure. All other entries can be NULL and will be filled in by PyType_Ready with appropriate
functions from the parent type(s). In particular, to create a sub-type in C follow these steps:
1. If needed create a new C-structure to handle each instance of your type. A typical C-structure would be:

typedef struct _new_struct {
    PyArrayObject base;
    /* new things here */
} NewArrayObject;

Notice that the full PyArrayObject is used as the first entry in order to ensure that the binary layout of instances of
the new type is identical to the PyArrayObject.
2. Fill in a new Python type-object structure with pointers to new functions that will over-ride the default behavior while
leaving any function that should remain the same unfilled (or NULL). The tp_name element should be different.
3. Fill in the tp_base member of the new type-object structure with a pointer to the (main) parent type object. For
multiple-inheritance, also fill in the tp_bases member with a tuple containing all of the parent objects in the order
they should be used to define inheritance. Remember, all parent-types must have the same C-structure for multiple
inheritance to work properly.
4. Call PyType_Ready(<pointer_to_new_type>). If this function returns a negative number, a failure occurred
and the type is not initialized. Otherwise, the type is ready to be used. It is generally important to place a reference
to the new type into the module dictionary so it can be accessed from Python.
More information on creating sub-types in C can be learned by reading PEP 253 (available at [Link]
dev/peps/pep-0253).

Specific features of ndarray sub-typing

Some special methods and attributes are used by arrays in order to facilitate the interoperation of sub-types with the base
ndarray type.


The __array_finalize__ method


ndarray.__array_finalize__
Several array-creation functions of the ndarray allow specification of a particular sub-type to be created. This allows
sub-types to be handled seamlessly in many routines. When a sub-type is created in such a fashion, however,
neither the __new__ method nor the __init__ method gets called. Instead, the sub-type is allocated and the appropriate
instance-structure members are filled in. Finally, the __array_finalize__ attribute is looked up
in the object dictionary. If it is present and not None, then it can be either a CObject containing a pointer to a
PyArray_FinalizeFunc or a method taking a single argument (which could be None).
If the __array_finalize__ attribute is a CObject, then the pointer must be a pointer to a function with the
signature:

(int) (PyArrayObject *, PyObject *)

The first argument is the newly created sub-type. The second argument (if not NULL) is the “parent” array (if
the array was created using slicing or some other operation where a clearly-distinguishable parent is present). This
routine can do anything it wants to. It should return a -1 on error and 0 otherwise.
If the __array_finalize__ attribute is neither None nor a CObject, then it must be a Python method that takes
the parent array as an argument (which could be None if there is no parent), and returns nothing. Errors in this
method will be caught and handled.
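
The Python-method form of __array_finalize__ can be illustrated with a small sketch. The class name InfoArray and its info attribute are hypothetical, chosen only for this illustration; the point is that slicing creates a new instance without calling __new__ or __init__, so any extra attribute has to be carried over in __array_finalize__.

```python
import numpy as np


class InfoArray(np.ndarray):
    """Hypothetical ndarray sub-type that carries an extra `info` attribute."""

    def __new__(cls, input_array, info=None):
        # View-casting an existing array triggers __array_finalize__ as well.
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # obj is the "parent" array, or None when there is no parent.
        if obj is None:
            return
        # Copy the attribute from the parent if it has one.
        self.info = getattr(obj, "info", None)


a = InfoArray([1, 2, 3], info="metres")
b = a[1:]                            # slicing: __new__ is not called for b
c = np.arange(3).view(InfoArray)     # view-casting from a plain ndarray
```

Here b keeps the info value of its parent a, while c, whose parent is a plain ndarray, ends up with info set to None.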

The __array_priority__ attribute


ndarray.__array_priority__
This attribute allows simple but flexible determination of which sub-type should be considered “primary” when an
operation involving two or more sub-types arises. In operations where different sub-types are being used, the
sub-type with the largest __array_priority__ attribute will determine the sub-type of the output(s). If two
sub-types have the same __array_priority__ then the sub-type of the first argument determines the output. The
default __array_priority__ attribute returns a value of 0.0 for the base ndarray type and 1.0 for a sub-type.
This attribute can also be defined by objects that are not sub-types of the ndarray and can be used to determine
which __array_wrap__ method should be called for the return output.
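
As a rough illustration of the priority rule, the sketch below defines two hypothetical sub-types (LowPriority and HighPriority are names invented here) and adds instances of them in both orders. It assumes the behaviour described above, namely that the ufunc output is wrapped by the sub-type with the largest __array_priority__.

```python
import numpy as np


class LowPriority(np.ndarray):
    # Hypothetical sub-type with a small priority value.
    __array_priority__ = 5.0


class HighPriority(np.ndarray):
    # Hypothetical sub-type with a larger priority value.
    __array_priority__ = 10.0


low = np.arange(3).view(LowPriority)
high = np.arange(3).view(HighPriority)

# The operand order is swapped between the two sums; the sub-type with the
# larger priority is expected to determine the output type in both cases.
sum_low_first = low + high
sum_high_first = high + low
```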

The __array_wrap__ method


ndarray.__array_wrap__
Any class or type can define this method which should take an ndarray argument and return an instance of the
type. It can be seen as the opposite of the __array__ method. This method is used by the ufuncs (and other
NumPy functions) to allow other objects to pass through. For Python >2.4, it can also be used to write a decorator
that converts a function that works only with ndarrays to one that works with any type with __array__ and
__array_wrap__ methods.
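
The decorator idea mentioned above might look like the following sketch. The decorator name array_function, the Tagged class and the scale_to_unit function are all hypothetical; the extra optional parameters on __array__ and __array_wrap__ are an assumption made so that the class works across NumPy versions that pass different arguments.

```python
import functools

import numpy as np


def array_function(func):
    """Let a function written for ndarrays accept any object that
    provides __array__ and __array_wrap__ (hypothetical helper)."""

    @functools.wraps(func)
    def wrapper(obj, *args, **kwargs):
        result = func(np.asarray(obj), *args, **kwargs)
        wrap = getattr(obj, "__array_wrap__", None)
        # Hand the ndarray result back to the input's own type if it can wrap.
        return wrap(result) if wrap is not None else result

    return wrapper


class Tagged:
    """Hypothetical container that is not an ndarray sub-type."""

    def __init__(self, values, tag):
        self.values = np.asarray(values, dtype=float)
        self.tag = tag

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self.values, dtype=dtype)

    def __array_wrap__(self, array, context=None, return_scalar=False):
        return Tagged(array, self.tag)


@array_function
def scale_to_unit(a):
    # Works only with ndarrays: divide by the largest element.
    return a / a.max()


scaled = scale_to_unit(Tagged([1.0, 2.0, 4.0], tag="volts"))
plain = scale_to_unit([1.0, 2.0, 4.0])   # a list has no __array_wrap__
```

The wrapped function returns a Tagged instance for Tagged input and a plain ndarray for inputs that cannot wrap.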



CHAPTER

NINE

NUMPY HOW TOS

These documents are intended as recipes for common tasks using NumPy. For detailed reference documentation of the
functions and classes contained in the package, see the API reference.

9.1 How to write a NumPy how-to

How-tos get straight to the point – they


• answer a focused question, or
• narrow a broad question into focused questions that the user can choose among.

9.1.1 A stranger has asked for directions…

“I need to refuel my car.”

9.1.2 Give a brief but explicit answer

• “Three kilometers/miles, take a right at Hayseed Road, it’s on your left.”


Add helpful details for newcomers (“Hayseed Road”, even though it’s the only turnoff at three km/mi). But not irrelevant
ones:
• Don’t also give directions from Route 7.
• Don’t explain why the town has only one filling station.
If there’s related background (tutorial, explanation, reference, alternative approach), bring it to the user’s attention with a
link (“Directions from Route 7,” “Why so few filling stations?”).

9.1.3 Delegate

• “Three km/mi, take a right at Hayseed Road, follow the signs.”


If the information is already documented and succinct enough for a how-to, just link to it, possibly after an introduction
(“Three km/mi, take a right”).


9.1.4 If the question is broad, narrow and redirect it

“I want to see the sights.”


The See the sights how-to should link to a set of narrower how-tos:
• Find historic buildings
• Find scenic lookouts
• Find the town center
and these might in turn link to still narrower how-tos – so the town center page might link to
• Find the court house
• Find city hall
By organizing how-tos this way, you not only display the options for people who need to narrow their question, you also
have provided answers for users who start with narrower questions (“I want to see historic buildings,” “Which way to city
hall?”).

9.1.5 If there are many steps, break them up

If a how-to has many steps:


• Consider breaking a step out into an individual how-to and linking to it.
• Include subheadings. They help readers grasp what’s coming and return where they left off.

9.1.6 Why write how-tos when there’s Stack Overflow, Reddit, Gitter…?

• We have authoritative answers.


• How-tos make the site less forbidding to non-experts.
• How-tos bring people into the site and help them discover other information that’s here.
• Creating how-tos helps us see NumPy usability through new eyes.

9.1.7 Aren’t how-tos and tutorials the same thing?

People use the terms “how-to” and “tutorial” interchangeably, but we draw a distinction, following Daniele Procida’s
taxonomy of documentation.

Documentation needs to meet users where they are. How-tos offer get-it-done information; the user wants steps to copy
and doesn’t necessarily want to understand NumPy. Tutorials are warm-fuzzy information; the user wants a feel for some
aspect of NumPy (and again, may or may not care about deeper knowledge).
We distinguish both tutorials and how-tos from Explanations, which are deep dives intended to give understanding rather
than immediate assistance, and References, which give complete, authoritative data on some concrete part of NumPy (like
its API) but aren’t obligated to paint a broader picture.
For more on tutorials, see Learn to write a NumPy tutorial.


9.1.8 Is this page an example of a how-to?

Yes – until the sections with question-mark headings; they explain rather than giving directions. In a how-to, those would
be links.

9.2 Reading and writing files

This page tackles common applications; for the full collection of I/O routines, see [Link].

9.2.1 Reading text and CSV files

With no missing values

Use numpy.loadtxt.
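
A minimal sketch for the no-missing-values case, assuming numpy.loadtxt is the routine meant here. The sample text is made up for illustration and is read from an in-memory buffer instead of a file on disk.

```python
import io

import numpy as np

# Whitespace-delimited text with no missing values (made-up sample data).
text = io.StringIO("1 2 3\n4 5 6\n7 8 9\n")
data = np.loadtxt(text)

# The same routine accepts another delimiter and an explicit dtype.
csv = io.StringIO("1,2,3\n4,5,6\n")
ints = np.loadtxt(csv, delimiter=",", dtype=int)
```

By default the values are parsed as floats, so data has dtype float64 while ints holds integers.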

With missing values

Use numpy.genfromtxt.
numpy.genfromtxt will either
• return a masked array masking out missing values (if usemask=True), or
• fill in the missing value with the value specified in filling_values (default is np.nan for float, -1 for int).

With non-whitespace delimiters

>>> print(open("[Link]").read())
1, 2, 3
4,, 6
7, 8, 9

Masked-array output

>>> np.genfromtxt("[Link]", delimiter=",", usemask=True)
masked_array(
  data=[[1.0, 2.0, 3.0],
        [4.0, --, 6.0],
        [7.0, 8.0, 9.0]],
  mask=[[False, False, False],
        [False,  True, False],
        [False, False, False]],
  fill_value=1e+20)


Array output

>>> np.genfromtxt("[Link]", delimiter=",")
array([[ 1.,  2.,  3.],
       [ 4., nan,  6.],
       [ 7.,  8.,  9.]])

Array output, specified fill-in value

>>> np.genfromtxt("[Link]", delimiter=",", dtype=np.int8, filling_values=99)
array([[ 1,  2,  3],
       [ 4, 99,  6],
       [ 7,  8,  9]], dtype=int8)
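
The three comma-delimited variants above can be reproduced without a file on disk. This is a sketch that assumes the routine is numpy.genfromtxt (suggested by the usemask and filling_values parameters) and feeds it the same sample text through an in-memory buffer.

```python
import io

import numpy as np

# Same content as the sample file above; the middle field of row 2 is empty.
CSV_TEXT = "1, 2, 3\n4,, 6\n7, 8, 9\n"

# Masked-array output: the missing field is masked.
masked = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", usemask=True)

# Plain array output: the missing field becomes nan.
plain = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",")

# Integer output with an explicit fill-in value for the missing field.
filled = np.genfromtxt(
    io.StringIO(CSV_TEXT), delimiter=",", dtype=np.int8, filling_values=99
)
```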

Whitespace-delimited
numpy.genfromtxt can also parse whitespace-delimited data files that have missing values if
• Each field has a fixed width: Use the width as the delimiter argument.

# File with width=4. The data does not have to be justified (for example,
# the 2 in row 1), the last column can be less than width (for example, the 6
# in row 2), and no delimiting character is required (for instance 8888 and 9
# in row 3)

>>> f = open("[Link]").read()
>>> print(f)
1   2      3
44      6
7   88889

# Showing spaces as ^
>>> print(f.replace(" ","^"))
1^^^2^^^^^^3
44^^^^^^6
7^^^88889

>>> np.genfromtxt("[Link]", delimiter=4)
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])

• A special value (e.g. “x”) indicates a missing field: Use it as the missing_values argument.

>>> print(open("[Link]").read())
1 2 3
44 x 6
7 8888 9

>>> np.genfromtxt("[Link]", missing_values="x")
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])

• You want to skip the rows with missing values: Set invalid_raise=False.


>>> print(open("[Link]").read())
1 2 3
44 6
7 888 9

>>> np.genfromtxt("[Link]", invalid_raise=False)
__main__:1: ConversionWarning: Some errors were detected !
    Line #2 (got 2 columns instead of 3)
array([[  1.,   2.,   3.],
       [  7., 888.,   9.]])

• The delimiter whitespace character is different from the whitespace that indicates missing data. For instance,
if columns are delimited by \t, then missing data will be recognized if it consists of one or more spaces.

>>> f = open("[Link]").read()
>>> print(f)
1 2 3
44 6
7 888 9

# Tabs vs. spaces
>>> print(f.replace("\t","^"))
1^2^3
44^ ^6
7^888^9

>>> np.genfromtxt("[Link]", delimiter="\t", missing_values=" +")
array([[  1.,   2.,   3.],
       [ 44.,  nan,   6.],
       [  7., 888.,   9.]])

9.2.2 Read a file in .npy or .npz format

Choices:
• Use numpy.load. It can read files generated by any of numpy.save, numpy.savez, or numpy.savez_compressed.
• Use memory mapping. See numpy.lib.format.open_memmap.
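
Both choices can be sketched as follows, assuming the routines meant are numpy.load and numpy.lib.format.open_memmap. The file names and the temporary directory are made up for the illustration; the files are written first so that the example is self-contained.

```python
import os
import tempfile

import numpy as np

# Create sample .npy and .npz files in a scratch directory (hypothetical names).
tmpdir = tempfile.mkdtemp()
npy_path = os.path.join(tmpdir, "single.npy")
npz_path = os.path.join(tmpdir, "bundle.npz")

a = np.arange(12).reshape(3, 4)
np.save(npy_path, a)
np.savez(npz_path, first=a, second=a * 2)

# Choice 1: read the whole file into memory.
loaded = np.load(npy_path)
with np.load(npz_path) as bundle:
    second = bundle["second"]        # arrays in an .npz are fetched by name

# Choice 2: memory-map the .npy file and touch only the parts that are needed.
mapped = np.load(npy_path, mmap_mode="r")
row = np.array(mapped[1])            # copies a single row out of the map
also_mapped = np.lib.format.open_memmap(npy_path, mode="r")
```

Memory mapping applies to .npy files; the arrays inside an .npz archive are read into memory when they are accessed.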

9.2.3 Write to a file to be read back by NumPy

Binary

Use numpy.save, or to store multiple arrays numpy.savez or numpy.savez_compressed.


For security and portability, set allow_pickle=False unless the dtype contains Python objects, which requires
pickling.
Masked arrays can't currently be saved, nor can other arbitrary array subclasses.
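
A short sketch of the allow_pickle advice, assuming numpy.save and numpy.load are the routines in question. An in-memory buffer stands in for a file, and the sample arrays are made up.

```python
import io

import numpy as np

numbers = np.linspace(0.0, 1.0, 5)

# A numeric array round-trips without pickling.
buffer = io.BytesIO()
np.save(buffer, numbers, allow_pickle=False)
buffer.seek(0)
restored = np.load(buffer, allow_pickle=False)

# An object array needs pickling, so saving it with allow_pickle=False fails.
objects = np.array([{"a": 1}, None], dtype=object)
try:
    np.save(io.BytesIO(), objects, allow_pickle=False)
    refused = False
except ValueError:
    refused = True
```

With allow_pickle=False the attempt to save the object array is refused instead of silently writing a pickle.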


Human-readable

numpy.save and numpy.savez create binary files. To write a human-readable file, use numpy.savetxt. The
array can only be 1- or 2-dimensional, and there’s no ‘savetxtz’ for multiple files.
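
A minimal sketch of the human-readable route, assuming numpy.savetxt is the routine meant. The array, the format string and the header text are made up; an in-memory buffer replaces a file on disk.

```python
import io

import numpy as np

table = np.array([[1.5, 2.0], [3.25, 4.0]])

# Write a 2-D array as comma-separated text with a commented header line.
buffer = io.StringIO()
np.savetxt(buffer, table, fmt="%.2f", delimiter=",", header="x,y")
text = buffer.getvalue()

# The text can be read back; lines starting with "#" are treated as comments.
round_trip = np.loadtxt(io.StringIO(text), delimiter=",")

# Arrays with more than two dimensions are rejected.
try:
    np.savetxt(io.StringIO(), np.zeros((2, 2, 2)))
    rejected_3d = False
except ValueError:
    rejected_3d = True
```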

Large arrays

See Write or read large arrays.

9.2.4 Read an arbitrarily formatted binary file (“binary blob”)

Use a structured array.


Example:
The .wav file header is a 44-byte block preceding data_size bytes of the actual sound data:

chunk_id         "RIFF"
chunk_size       4-byte unsigned little-endian integer
format           "WAVE"
fmt_id           "fmt "
fmt_size         4-byte unsigned little-endian integer
audio_fmt        2-byte unsigned little-endian integer
num_channels     2-byte unsigned little-endian integer
sample_rate      4-byte unsigned little-endian integer
byte_rate        4-byte unsigned little-endian integer
block_align      2-byte unsigned little-endian integer
bits_per_sample  2-byte unsigned little-endian integer
data_id          "data"
data_size        4-byte unsigned little-endian integer

The .wav file header as a NumPy structured dtype:

wav_header_dtype = [Lin