IDAT7215
Computer Programming for
Product Development and
Applications
Lecture 2-1: Python Libraries:
NumPy
Dr. Zulfiqar Ali
Outline
▪ NumPy Introduction
▪ Creation of Arrays
▪ Data Retrieval and Restructuring
▪ Fast Computation with NumPy
NumPy
▪ NumPy stands for Numerical Python and it is the fundamental package for
scientific computing with Python.
▪ NumPy is a Python library for handling multi-dimensional arrays.
▪ It contains both the data structures needed for storing and accessing
arrays, and the operations and functions for computing with these arrays.
▪ Unlike lists, arrays must have the same data type for all their elements.
▪ The homogeneity of arrays allows highly optimized functions that use arrays
as their inputs and outputs.
Usages of high-dimensional arrays in data analysis
▪ Store matrices, solve systems of linear equations, compute
eigenvalues/eigenvectors, matrix decompositions, …
▪ Images and videos can be represented as NumPy arrays
▪ A 2-dimensional table might store an input data matrix in data analysis,
where each row represents a sample and each column represents a feature
(commonly used in scikit-learn).
Creation of arrays
▪ Import the NumPy library
– Suggested to use the standard
abbreviation np
▪ Give a (nested) list as a
parameter to the array
constructor
– One dimensional array: list
– Two dimensional array: list of
lists
– Three dimensional array: list of
lists of lists
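The constructor calls described above can be sketched as follows. The variable names and values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# One-dimensional array from a flat list
a1 = np.array([1, 2, 3])

# Two-dimensional array from a list of lists (2 rows, 3 columns)
a2 = np.array([[1, 2, 3], [4, 5, 6]])

# Three-dimensional array from a list of lists of lists
a3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(a1.ndim, a2.ndim, a3.ndim)  # 1 2 3
```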
One dimensional array, Two dimensional array, Three
dimensional array
▪ In a two-dimensional array, you have rows and columns. The rows are indexed
along “axis 0” and the columns along “axis 1”.
▪ Each additional dimension adds one more axis, numbered in order (axis 2, axis 3, …).
Creation of arrays
▪ Useful functions to create common types of
arrays
– np.zeros(): all elements are 0s
– np.ones(): all elements are 1s
– np.full(): all elements are set to a specified value
– np.empty(): all elements are uninitialized
– np.eye(): identity matrix: a matrix whose
diagonal elements are 1s and all others are 0s
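A short sketch of these creation functions; the shapes and the fill value 7 are arbitrary choices made for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))      # 2x3 array of 0.0
ones = np.ones(4)             # length-4 array of 1.0
sevens = np.full((2, 2), 7)   # 2x2 array where every element is 7
scratch = np.empty((2, 2))    # contents are arbitrary until you assign them
identity = np.eye(3)          # 3x3 identity matrix
```

Note that np.empty() only allocates memory, so its contents should never be read before being overwritten.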
Creation of arrays
▪ Generate evenly spaced values within a given interval.
▪ np.arange(): It works like Python's built-in range() function
▪ For non-integer ranges it is better to use np.linspace().
▪ With np.linspace() one does not have to compute the length of the step, but
instead one specifies the wanted number of elements. By default, the endpoint is
included in the result, unlike with arange.
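The difference between the two functions can be sketched as below; the intervals and counts are assumptions chosen for illustration.

```python
import numpy as np

# arange takes start, stop, step; the stop value 10 is excluded
ints = np.arange(0, 10, 2)        # values 0, 2, 4, 6, 8

# linspace takes start, stop, number of elements; the endpoint 1.0 is included
fracs = np.linspace(0.0, 1.0, 5)  # values 0.0, 0.25, 0.5, 0.75, 1.0
```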
Creation of arrays with random elements
▪ We may need some randomly generated data to test our program
▪ NumPy can easily produce arrays of wanted shape with random numbers.
▪ np.random.random(): uniformly distributed from [0.0, 1.0)
▪ np.random.normal(): normally distributed
▪ np.random.randint(): uniformly distributed integers
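A possible use of the three functions, assuming a 2x3 target shape and illustrative distribution parameters (mean 0, standard deviation 1, integers below 10):

```python
import numpy as np

u = np.random.random((2, 3))            # uniform floats in [0.0, 1.0)
n = np.random.normal(0.0, 1.0, (2, 3))  # normal distribution: mean 0.0, std 1.0
i = np.random.randint(0, 10, (2, 3))    # integers from 0 (inclusive) to 10 (exclusive)
```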
Creation of arrays with random elements
▪ To debug our code, sometimes it is useful to re-create exactly the same
random data in every run of our program.
▪ We can create random numbers deterministically by setting a seed.
If you run the code multiple times,
it will always give the same
numbers.
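A minimal sketch of seeding, using np.random.seed(); the seed value 0 and the array size are arbitrary assumptions.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(0, 100, 5)

np.random.seed(0)                      # resetting the seed restarts the sequence
second = np.random.randint(0, 100, 5)

print(np.array_equal(first, second))   # True
```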
Array types and attributes
▪ An array has several attributes:
– ndim: the number of dimensions
– shape: size in each dimension
– size: the number of elements
– dtype: the type of element
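These attributes can be inspected as follows; the array b is a hypothetical example.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

print(b.ndim)   # 2
print(b.shape)  # (2, 3)
print(b.size)   # 6
print(b.dtype)  # an integer dtype; the exact width depends on the platform
```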
Indexing
▪ A one-dimensional array is indexed
like a list.
▪ For a multi-dimensional array, the
index is a comma-separated tuple
instead of a single integer
▪ Note that if you give only a single
index to a multi-dimensional array,
it indexes the first dimension of
the array.
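The indexing rules above, sketched on two small arrays whose values are assumed for illustration:

```python
import numpy as np

v = np.array([10, 20, 30])
m = np.array([[1, 2, 3], [4, 5, 6]])

first = v[0]     # 10, same as list indexing
last = v[-1]     # 30, negative indices count from the end
item = m[1, 2]   # 6, row 1 and column 2
row = m[0]       # a single index selects along the first dimension: row 0
```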
Slicing
▪ Slicing works similarly to lists, but now
we can have slices in different
dimensions.
▪ We can even assign to a slice
▪ Extract rows or columns from an array
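A sketch of slicing in two dimensions. The array values are assumptions; the assignment is done on a copy so that the extracted slices above it stay unchanged.

```python
import numpy as np

m = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

row1 = m[1, :]       # second row: 4, 5, 6, 7
col2 = m[:, 2]       # third column: 2, 6, 10
block = m[:2, 1:3]   # first two rows, columns 1 and 2

z = m.copy()
z[:, 0] = -1         # assign to a slice: the whole first column becomes -1
```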
Reshaping
▪ When an array is reshaped, its number of elements stays the same, but
they are reinterpreted into a different shape.
▪ E.g., a one-dimensional array into a two-dimensional array
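For instance, assuming a six-element array, a reshape into 2 rows and 3 columns might look like this:

```python
import numpy as np

flat = np.arange(6)         # 0, 1, 2, 3, 4, 5
grid = flat.reshape(2, 3)   # rows: [0, 1, 2] and [3, 4, 5]

print(flat.size, grid.size)  # 6 6
```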
Combining Arrays
▪ Combining several arrays into one bigger array:
concatenate and stack
▪ Concatenate: it takes n-dimensional arrays
and returns an n-dimensional array.
▪ Stack: it takes n-dimensional arrays and returns
an (n+1)-dimensional array
Concatenate
▪ By default, concatenate joins the arrays
along axis 0.
▪ To join arrays horizontally, add the parameter
axis = 1
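Both directions can be sketched with two hypothetical 2x2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

vertical = np.concatenate((a, b))            # default axis 0: shape (4, 2)
horizontal = np.concatenate((a, b), axis=1)  # axis 1: shape (2, 4)
```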
Concatenate different dimensions
▪ If you want to concatenate arrays with different dimensions, you must first
reshape the arrays to have the same number of dimensions.
– E.g., add a new row (column) to a 2D array
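One way to do this, assuming a 2x2 array and a 1D array of two new values, is to reshape the 1D array into a single row or a single column first:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
new = np.array([5, 6])

# Reshape to 1x2 to append as a row, or to 2x1 to append as a column
with_row = np.concatenate((a, new.reshape(1, 2)))
with_col = np.concatenate((a, new.reshape(2, 1)), axis=1)
```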
Stack
▪ Use stack to create higher dimensional arrays from lower dimensional
arrays:
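A sketch with two assumed 1D arrays, stacked into a 2D array along either axis:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

s0 = np.stack((x, y))          # new axis 0: shape (2, 3), x and y become rows
s1 = np.stack((x, y), axis=1)  # new axis 1: shape (3, 2), x and y become columns
```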
Split
▪ split is the inverse operation of
concatenate.
▪ The input argument to split can be
the number of equal parts the array is
divided into.
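Splitting into equal parts might look like this; the six-element array d is an assumption.

```python
import numpy as np

d = np.arange(6)         # 0, 1, 2, 3, 4, 5
parts = np.split(d, 3)   # list of three arrays: [0, 1], [2, 3], [4, 5]
```

np.split() raises an error if the array cannot be divided into equal parts.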
Split
▪ The input argument to split can also
be a sequence of indices that explicitly specify the
break points.
▪ The entries indicate where along the axis
(default axis = 0) the array is split.
▪ E.g., np.split(d, (2, 3, 5)) splits
array d into
– d[:2]
– d[2:3]
– d[3:5]
– d[5:]
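The call above can be checked on a concrete array; the eight-element d used here is an assumption.

```python
import numpy as np

d = np.arange(8)                  # 0, 1, ..., 7
pieces = np.split(d, (2, 3, 5))   # break points before indices 2, 3 and 5

# pieces holds d[:2], d[2:3], d[3:5] and d[5:]
for piece in pieces:
    print(piece)
```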
Less Memory in NumPy
▪ A NumPy array occupies
less memory than a list
holding the same elements.
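A rough way to compare the two is sketched below. The measurement approach is an assumption, not necessarily what the slide showed: sys.getsizeof() is used for the list and its int objects, and the nbytes attribute for the array's data buffer.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n)

# A list stores references to separate int objects, so count both
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# nbytes is the size of the array's data buffer (elements * bytes per element)
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```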
Fast computation on arrays
▪ In addition to providing a way to store and access multi-dimensional arrays,
NumPy also provides several routines to perform computations on them.
▪ One of the reasons for the popularity of NumPy is that these computations
can be very efficient, much more efficient than what Python can normally do.
▪ The biggest bottlenecks in efficiency are the loops, which can be iterated
millions, billions, or even more times.
▪ What slows down loops in Python is the fact that Python is a dynamically typed
language: at each expression Python has to find out the types of the
arguments of the operations.
Fast computation examples
▪ Let us consider multiplying a collection of numbers by 2.
▪ At each iteration of the loop, Python has to find out the
type of the variable x, which in this example can be an
int, a float or a string, and depending on this type call a
different function to perform the “multiplication” by two.
▪ What makes NumPy efficient is the requirement that
each element in an array must be of the same type.
This homogeneity of arrays makes it possible to
create vectorized operations, which don’t operate on
single elements, but on arrays (or subarrays).
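The contrast between the loop and the vectorized form can be sketched as follows; the input numbers are assumed for illustration.

```python
import numpy as np

numbers = [1, 2.5, 3]

# Pure Python: the type of x is looked up at every iteration
doubled_list = []
for x in numbers:
    doubled_list.append(x * 2)

# NumPy: one vectorized operation applied to the whole array
doubled_array = np.array(numbers) * 2
```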
Fast computation examples
▪ Because each iteration in NumPy uses an identical operation and only the data
differs, the loop can run in precompiled machine code and be performed in
one go, hence avoiding Python’s dynamic typing.
▪ The name vector operation comes from linear algebra
– addition of two vectors a = [𝑎1, 𝑎2], b = [𝑏1, 𝑏2] is element-wise addition
a + b = [𝑎1 + 𝑏1, 𝑎2 + 𝑏2]
Arithmetic Operations in NumPy
▪ The basic arithmetic operations in
NumPy are defined in the vector form.
– +: addition
– -: subtraction
– *: multiplication
– /: division
– //: floor division
– **: power
– %: remainder
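Each operator acts element by element. A sketch with two assumed integer arrays:

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 3, 4])

added = a + b          # 9, 11, 13
subtracted = a - b     # 5, 5, 5
multiplied = a * b     # 14, 24, 36
divided = a / b        # true division, result is a float array
floored = a // b       # 3, 2, 2
powered = a ** b       # 49, 512, 6561
remainder = a % b      # 1, 2, 1
```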
Aggregations: max, min, sum, mean, standard deviations…
▪ Aggregations allow us to summarize the information in an array using a few
numbers
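A sketch of the common aggregations on a hypothetical 2x3 array:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

largest = data.max()    # 6
smallest = data.min()   # 1
total = data.sum()      # 21
average = data.mean()   # 3.5
spread = data.std()     # standard deviation of all six values
```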
Aggregation over certain axes
▪ Instead of aggregating over the whole array,
we can aggregate over certain axes only as
well.
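The axis parameter names the axis that is collapsed. A sketch, again with an assumed 2x3 array:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

column_sums = data.sum(axis=0)  # collapse axis 0 (rows): one sum per column
row_sums = data.sum(axis=1)     # collapse axis 1 (columns): one sum per row
```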
Python function, NumPy function, NumPy Method
▪ Most of the aggregation functions in
NumPy have corresponding methods.
▪ The Python language has built-in functions
sum, min, max, etc.
▪ Do not accidentally use Python built-in
functions for arrays, since they will be
significantly slower than NumPy’s
functions and methods.
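The three variants give the same value here, as the sketch below shows on an assumed small array. The speed difference only becomes noticeable on large arrays and is not measured in this example.

```python
import numpy as np

arr = np.arange(1000)

total_function = np.sum(arr)  # NumPy function
total_method = arr.sum()      # equivalent NumPy method
total_builtin = sum(arr)      # Python built-in: iterates element by element
```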
Efficiency of NumPy functions
▪ The speed of NumPy partly comes from the fact that its arrays must have the same
type for all the elements. This requirement allows some efficient optimizations.
Broadcasting
▪ We have seen that NumPy allows array operations that are performed
element-wise.
▪ NumPy also allows binary operations that do not require the two arrays to
have the same shape.
▪ E.g., add 4 to all elements of an array
Broadcasting
▪ How is the binary operation performed?
▪ NumPy tries to stretch the arrays to have the same shape, then perform the
element-wise operation.
▪ In NumPy, this stretching is called broadcasting.
NumPy first stretched the scalar 4 to the array
np.array([4,4,4]) and then performed the
element-wise addition.
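The stretching can be made explicit in a short sketch; the three-element array a is an assumption consistent with the np.array([4,4,4]) mentioned above.

```python
import numpy as np

a = np.array([1, 2, 3])

implicit = a + 4                     # the scalar 4 is broadcast
explicit = a + np.array([4, 4, 4])   # the stretched form written out by hand

print(np.array_equal(implicit, explicit))  # True
```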
Broadcasting
▪ In this example the second argument b was first
broadcast to an array matching the shape of the first argument
▪ Then the addition was performed.
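The original example is not reproduced in the text, so the sketch below uses assumed values: a 3x3 array a and a 1D array b. np.broadcast_to() shows what b looks like after stretching.

```python
import numpy as np

a = np.full((3, 3), 10)
b = np.array([0, 1, 2])

# b is stretched to three identical rows before the element-wise addition
stretched = np.broadcast_to(b, a.shape)
result = a + b
```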
Comparisons and Masking
▪ Just like NumPy allows element-wise arithmetic operations
between arrays, it is also possible to compare two arrays
element-wise.
▪ We can also count the number of comparisons that were
True. This solution relies on the interpretation that True
corresponds to 1 and False corresponds to 0.
▪ Broadcasting rules also apply to comparison
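A sketch of element-wise comparison, counting, and comparison against a scalar; the arrays are assumed for illustration.

```python
import numpy as np

a = np.array([1, 3, 5])
b = np.array([2, 3, 4])

equal = a == b           # False, True, False
greater = a > b          # False, False, True
count = np.sum(a >= b)   # True counts as 1, so this is 2
above_two = a > 2        # the scalar 2 is broadcast: False, True, True
```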
Masking
▪ Another use of Boolean arrays is that they
can be used to select a subset of
elements. It is called masking.
▪ It can also be used to assign a new value.
For example, the following zeroes out the
negative numbers.
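Selection and assignment through a mask might look like this; the array c is a hypothetical example.

```python
import numpy as np

c = np.array([4, -2, 7, -5, 0])

positives = c[c > 0]   # selects 4 and 7
c[c < 0] = 0           # zero out the negative numbers in place
```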
Fancy Indexing
▪ Using indexing we can get a single element from an
array. If we wanted multiple (not necessarily
contiguous) elements, we would have to index
several times.
▪ That’s quite verbose. Fancy indexing provides a
concise syntax for accessing multiple elements.
▪ We can also assign to multiple elements through
fancy indexing.
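The verbose and the concise forms side by side, on an assumed five-element array:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

verbose = np.array([a[0], a[2], a[4]])  # indexing several times
picked = a[[0, 2, 4]]                   # fancy indexing with a list of indices

a[[1, 3]] = 0                           # assign to multiple elements at once
```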
Fancy Indexing
▪ Fancy indexing also works for higher dimensional
arrays.
▪ We can also combine normal indexing, slicing and
fancy indexing.
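A sketch on a hypothetical 3x4 array, showing paired index lists, row selection, and a slice combined with a fancy index:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # rows: 0..3, 4..7, 8..11

pairs = m[[0, 2], [1, 3]]   # elements at (0, 1) and (2, 3)
rows = m[[2, 0]]            # row 2 followed by row 0
mixed = m[1:, [0, 3]]       # rows 1 onward, columns 0 and 3
```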
Sorting arrays
▪ Sorting a one-dimensional array is similar
to sorting a list.
▪ We can also sort a higher dimensional
array along different axes.
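A sketch using np.sort(), which returns a sorted copy; the input arrays are assumptions.

```python
import numpy as np

a = np.array([3, 1, 2])
sorted_copy = np.sort(a)        # 1, 2, 3; a itself is unchanged

m = np.array([[3, 1], [0, 2]])
by_column = np.sort(m, axis=0)  # sort along axis 0: each column is sorted
by_row = np.sort(m, axis=1)     # sort along axis 1: each row is sorted
```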
Sorting arrays
▪ A related operation is the argsort function, which doesn’t sort the elements
but returns the indices of the sorted elements.
These indices [3, 0, 4, 1, 2] say that the smallest element of the array is in position 3 of a,
the second smallest element is in position 0 of a, the third smallest is in position 4, and so on.
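The original array a is not shown in the text. The sketch below uses a hypothetical array chosen so that argsort returns the same indices [3, 0, 4, 1, 2].

```python
import numpy as np

a = np.array([2, 8, 9, 1, 5])   # assumed values, not from the slides

order = np.argsort(a)           # indices that would sort a
print(order)                    # [3 0 4 1 2]

in_order = a[order]             # fancy indexing with these indices sorts a
```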
Matrix operations
▪ NumPy supports a wide variety of matrix operations, such as matrix
multiplication, solving systems of linear equations, computing
eigenvalues/eigenvectors, matrix decompositions and other linear algebra
related operations.
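A few of these operations, sketched with the @ operator and the np.linalg module on an assumed 2x2 matrix A and right-hand side b:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

product = A @ A                               # matrix multiplication
x = np.linalg.solve(A, b)                     # solve the linear system A x = b
eigenvalues, eigenvectors = np.linalg.eig(A)  # eigenvectors are the columns

print(np.allclose(A @ x, b))  # True
```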
Applications of NumPy
▪ Mathematics (MATLAB Replacement)
▪ Plotting (Matplotlib)
▪ Backend (Pandas, Digital Photography)
▪ Machine Learning
▪ Signal Processing