
Business Analytics

NumPy Basics

Phan Xuân Hiếu • VNU-UET • [email protected]


Lecture objectives

§ Introduce the NumPy library and explain why NumPy is important to data analytics.
§ Introduce and describe NumPy multidimensional arrays (ndarray), including
– Different ways to create arrays,
– The standard data types,
– Different array indexing and slicing methods,
– Various operations on arrays, including arithmetic, linear algebra, and aggregation, and
– Ways to reshape, concatenate, and split arrays.
§ Introduce various universal functions that work on NumPy arrays.
§ Introduce how to save NumPy arrays to files and load them back.

20 March 2025 NumPy Basics • Phan Xuân Hiếu • VNU-UET • [email protected]


Contents

1. What Is NumPy?
2. How Python and NumPy Store Data Objects
3. The NumPy ndarray: A Multidimensional Array Object
4. Pseudorandom Number Generation
5. Universal Functions: Fast Element-Wise Array Functions
6. Array-Oriented Programming with Arrays
7. File Input and Output with Arrays

What Is NumPy?
– What Is NumPy?
– Why NumPy?

What is NumPy?
§ Created in 2005 by Travis Oliphant.
– It is an open source project and you can use it freely.

§ NumPy (short for Numerical Python):
– One of the most important foundational packages for numerical computing in Python.

§ Many important libraries are built on top of or with NumPy:
– SciPy, pandas, matplotlib, seaborn, scikit-learn, TensorFlow, PyTorch ...

§ The main element of NumPy is the (multidimensional) array.
– Fixed-type (homogeneous) arrays.
– Efficient memory usage and fast operations on arrays.

§ It also provides functions for linear algebra, Fourier transforms, random number generation ...

Technically, ... things you will find in NumPy

§ ndarray, an efficient multidimensional array providing
– fast array-oriented arithmetic operations and flexible broadcasting capabilities

§ Mathematical functions for fast operations on entire arrays of data
– without having to write loops

§ Tools for reading and writing array data to disk
– and working with memory-mapped files.

§ Linear algebra, random number generation, and Fourier transform capabilities

§ C APIs for connecting NumPy with libraries written in C, C++, and Fortran
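As a minimal sketch of "fast operations on entire arrays without loops", the example below multiplies two arrays element-wise and broadcasts a scalar over an array. The product prices and quantities are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data: unit prices and quantities for four products.
prices = np.array([2.5, 4.0, 1.2, 9.9])
quantities = np.array([10, 3, 25, 1])

# Element-wise arithmetic on whole arrays, with no explicit Python loop.
revenue = prices * quantities

# Broadcasting: the scalar 0.9 is applied to every element.
discounted = prices * 0.9

# The equivalent pure-Python loop, shown only for comparison.
revenue_loop = [p * q for p, q in zip(prices.tolist(), quantities.tolist())]

print(revenue)
print(discounted)
```

The array expression and the loop give the same numbers; the array form is shorter and runs in compiled code.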

Why NumPy?
§ Understanding NumPy helps you:
– work with arrays and with libraries like pandas and scikit-learn more effectively
– apply the many NumPy operations that are useful in data analysis applications.

§ We will focus on:
– Fast array-based operations for data munging, cleaning, transformation ...
– Common array algorithms like sorting, unique, and set operations ...
– Efficient descriptive statistics and aggregation / summarizing data
– Data alignment and relational data manipulations for merging, joining data ...
– Expressing conditional logic as array expressions instead of loops
– Group-wise data manipulations ...

How Python and NumPy Store Data Objects
– A Python Integer is More Than Just an Integer
– A Python List Is More Than Just a List
– Fixed-Type Arrays in Python
– NumPy Arrays versus Python Lists

Why do we need to understand data types in Python and NumPy?

§ Data science requires understanding how data is stored and manipulated.

§ This section contrasts:
– How arrays of data are handled in the Python language itself, and
– How NumPy improves on this.

§ Python is a dynamically typed language
– (vs. statically typed languages like C or Java)
– Python’s type flexibility → requires more memory to store variables ...

§ How does NumPy organize and store data to be fast and efficient?

A Python integer is more than just an integer

A single integer in Python 3.10 (C structure)


§ ob_refcnt: a reference count that helps Python silently handle memory allocation and deallocation
§ ob_type: a pointer to the Python type object that defines behavior (here, <class 'int'>)
§ ob_size: the number of digits stored in ob_digit
§ ob_digit: the actual integer value, which can grow as needed. A Python integer is not limited by the CPU word size (unlike a C int); it has arbitrary precision.
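The overhead and the arbitrary precision described above can be observed from Python itself. This is a small sketch that assumes the CPython interpreter; the exact byte counts depend on the Python version and platform, so none are shown.

```python
import sys

small = 1
large = 2 ** 100  # far beyond the range of a 64-bit C integer

# getsizeof reports the size of the whole Python object in bytes,
# including the header fields, not just the numeric payload.
print(sys.getsizeof(small))
print(sys.getsizeof(large))

# Arbitrary precision: the value is exact, with no overflow.
print(large + 1)
```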

C integer versus Python integer

§ Storing an integer in Python involves some overhead.
§ PyObject_HEAD: the extra data (e.g., reference count, object type ...)
§ A Python integer is a pointer to a position in memory containing all the Python object information, as explained above.
§ Dynamic typing in Python comes at a cost!

A Python list is more than just a list

§ A Python list can contain objects of different types (i.e., it is heterogeneous).
§ Each item in the list must carry its own type, reference count ...
§ Each item is a complete Python object.
§ If all elements of the list are of the same type → much of this information is redundant!
§ It is much more efficient to store such data in a fixed-type array.
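A rough way to see this redundancy is to compare the memory used by a list of integers with a fixed-type NumPy array holding the same values. The size n = 1000 is an arbitrary choice, and the list figure is only an estimate for CPython.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores one pointer per element, and each element is a full int object.
# This is a rough estimate: CPython shares some small int objects between uses.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A NumPy array stores the raw 8-byte values in one contiguous buffer.
arr_bytes = np_arr.nbytes

print(list_bytes, arr_bytes)
```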

Fixed-type arrays in Python

§ The built-in array module.
§ Array of a uniform type.
§ 'i' is a type code → integer.
§ However:
• NumPy arrays are more efficient (both for storage and for operations)
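The original slide's code is not available in this text, so the following is only a sketch of how the built-in array module might be used with the 'i' type code, followed by a conversion to a NumPy array to show the kind of operation the array module itself does not provide.

```python
import array
import numpy as np

# 'i' is the type code for a signed C int.
a = array.array('i', range(5))
print(a)            # array('i', [0, 1, 2, 3, 4])

# The array module only stores the data compactly.
# NumPy adds fast element-wise operations on top of such storage.
arr = np.array(a)
print(arr * 2)
```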

NumPy arrays vs. Python lists
§ Python list:
• Heterogeneous (objects of different types) → flexibility, but at a cost!
• A pointer to a block of pointers, each pointing to a full Python object (like the Python integer described earlier)
§ NumPy array:
• Fixed-type, i.e., homogeneous (all elements are of the same type)
• Not as flexible as a Python list, but efficient for storing and manipulating data.

The NumPy ndarray: A Multidimensional Array Object
– Creating NumPy Arrays
– NumPy Standard Data Types
– Array Indexing and Accessing
– Slicing Subarrays
– Fancy Indexing
– Arithmetic Operations
– Linear Algebra Operations
– Aggregate Functions
– Shape Manipulation Operations
– Array Concatenation and Splitting

The NumPy ndarray
§ One of the key features of NumPy:
– its n-dimensional array object, or ndarray.
– ndarray is a fast, flexible container for large datasets in Python.
– ndarrays allow math operations on whole blocks of data ...

§ Importing NumPy and creating a simple 2-dimensional array (i.e., a matrix)

ndarray – multidimensional container for homogeneous data
§ Homogeneous: all elements must be the same type.
§ Every array has an ndim: an integer indicating the number of dimensions.
§ Every array has a shape: a tuple indicating the size of each dimension.
§ Every array has a dtype: an object describing the data type of the array.
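A minimal sketch of importing NumPy, creating a small 2-dimensional array, and inspecting the three attributes listed above. The values are hypothetical, and the dtype is given explicitly so the printed attributes are predictable.

```python
import numpy as np

# Hypothetical sample values.
data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]], dtype=np.float64)

print(data.ndim)    # 2
print(data.shape)   # (2, 3)
print(data.dtype)   # float64

# Math operations apply to the whole block of data at once.
print(data * 10)
print(data + data)
```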

Creating NumPy arrays
§ Creating 0-dimensional arrays
§ Creating arrays from Python sequences (lists, tuples ...)
§ Creating arrays with default values
§ Creating arrays with functions arange() or linspace()
§ Creating arrays with random values (uniform / normal distributions)
§ Creating an uninitialized array using np.empty()
§ Creating multidimensional arrays using reshape()
§ Creating 1-dimensional arrays using reshape(-1)
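The creation routes listed above can be sketched in a few lines. The shapes and values here are arbitrary choices for illustration, not the ones used on the original slides.

```python
import numpy as np

scalar = np.array(42)                    # 0-dimensional array, shape ()
from_list = np.array([1, 2, 3])          # from a Python list
from_tuple = np.array((1.0, 2.0, 3.0))   # from a tuple
zeros = np.zeros((2, 3))                 # filled with 0.0
ones = np.ones(4, dtype=np.int64)        # filled with 1
full = np.full((2, 2), 7)                # filled with a chosen value
evens = np.arange(0, 10, 2)              # start, stop (excluded), step
grid = np.linspace(0.0, 1.0, 5)          # 5 evenly spaced points, stop included
scratch = np.empty((2, 2))               # uninitialized: contents are arbitrary
matrix = np.arange(12).reshape(3, 4)     # 1-D array reshaped to 2-D
flat = matrix.reshape(-1)                # flattened back to 1-D

print(scalar.ndim, from_list.ndim, matrix.ndim)
```

Because np.empty() does not initialize memory, its contents should always be overwritten before use.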

Creating 0-dimensional arrays (vs. 1-dimensional arrays)

Creating arrays from Python sequences (lists, tuples ...)

Check and specify data type when creating arrays

Creating arrays with default values

Creating arrays with arange() and linspace()

Some important NumPy array creation functions

Creating arrays of random values in [0, 1)

Creating arrays of random integers in a given range

Creating arrays of values from normal distributions

Creating arrays of values from normal distributions (2)

Creating an uninitialized array using np.empty()

Creating multidimensional arrays using reshape()

Flattening arrays to 1 dimension using reshape(-1)

NumPy standard data types
§ NumPy arrays contain values of a single type.
§ NumPy is implemented largely in C
– The types will be familiar to users of C, Fortran, and other related languages.
§ You can specify the data type when creating a NumPy array.
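A short sketch of specifying a dtype at creation time and converting (casting) afterwards with astype(). The sample values are assumptions made for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)   # dtype chosen at creation
floats = ints.astype(np.float64)             # by default astype returns a new array

# Casting floats to integers drops the fractional part.
truncated = np.array([3.7, -1.2, 0.5]).astype(np.int32)

# Strings that represent numbers can be cast to a numeric dtype.
from_text = np.array(["1.25", "-9.6", "42"]).astype(np.float64)

print(ints.dtype, floats.dtype, truncated, from_text)
```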

Type conversion or casting for arrays

Type conversion or casting for arrays (2)

Array indexing and accessing elements

§ Accessing single elements
§ Indexing errors
§ Updating elements

Accessing single elements in 1-d arrays (i.e., vectors)

index:            0    1    2    3    4
arr1d =        [  2    8    3    3    0 ]
negative index:  -5   -4   -3   -2   -1

Accessing single elements in 1-d arrays with negative indices

Accessing single elements in 2-d arrays (i.e., matrices)

arr2d (rows along axis 0, columns along axis 1; negative indices in parentheses):

             col 0 (-4)   col 1 (-3)   col 2 (-2)   col 3 (-1)
row 0 (-3)       18           12            3           19
row 1 (-2)       12            0            5           10
row 2 (-1)       14            1            8            5

Accessing single elements in 2-d arrays with negative indices

Accessing single elements in 3-d arrays

Indexing errors in 1-d arrays

Indexing errors in 2-d arrays

Updating elements in arrays
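The three topics of this part (accessing single elements, indexing errors, updating elements) can be sketched with the arr1d and arr2d values from the diagrams. The specific indices and new values below are arbitrary choices.

```python
import numpy as np

arr1d = np.array([2, 8, 3, 3, 0])
arr2d = np.array([[18, 12, 3, 19],
                  [12, 0, 5, 10],
                  [14, 1, 8, 5]])

# Accessing single elements with positive and negative indices.
first, last = arr1d[0], arr1d[-1]          # values 2 and 0
middle, corner = arr2d[1, 2], arr2d[-1, -4]  # values 5 and 14

# An out-of-range index raises IndexError.
try:
    arr1d[5]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Updating elements in place.
arr2d[0, 0] = 99
arr1d[1] = 3.9   # the array is integer-typed, so the value is stored as 3
```

The last line shows a consequence of fixed-type storage: assigning a float into an integer array silently truncates it.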

Slicing subarrays from one-dimensional arrays

§ The slicing syntax:
array[start:stop:step]
– start is the first index (included)
– stop is the last index (excluded)
– use |step| > 1 to skip elements at each step
§ The default values (for a positive step):
– start = 0
– stop = <size of dimension>
– step = 1
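The start:stop:step rules can be checked on a small array. This sketch uses the values 8, 6, 7, 5, 9 that appear in the arr1d diagram of this part.

```python
import numpy as np

arr1d = np.array([8, 6, 7, 5, 9])

middle = arr1d[1:4]      # indices 1, 2, 3 (stop is excluded)
head = arr1d[:3]         # start defaults to 0
tail = arr1d[-3:]        # last three elements
every_other = arr1d[::2] # step of 2
# With a negative step the defaults run from the last element to the first.
reversed_arr = arr1d[::-1]

print(middle, head, tail, every_other, reversed_arr)
```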

Slicing from one-dimensional arrays

index:            0    1    2    3    4
arr1d =        [  8    6    7    5    9 ]
negative index:  -5   -4   -3   -2   -1

Slicing from one-dimensional arrays with negative indices

Slicing subarrays from multidimensional arrays

§ The slicing syntax (k ≥ 2):
array[start1:stop1:step1, ..., startk:stopk:stepk]
– start_i is the first index (included) along dimension i
– stop_i is the last index (excluded) along dimension i
– use |step_i| > 1 to skip elements at each step
§ Default values (for a positive step):
– start_i = 0
– stop_i = <size of dimension i>
– step_i = 1
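A sketch of multidimensional slicing on the 3 × 4 arr2d used in this part, showing how an integer index drops an axis while a slice keeps it.

```python
import numpy as np

arr2d = np.array([[18, 12, 3, 19],
                  [12, 0, 5, 10],
                  [14, 1, 8, 5]])

row = arr2d[1]                 # integer index drops the axis: shape (4,)
row_2d = arr2d[1:2, :]         # slice keeps the axis: shape (1, 4)
pair = arr2d[2, 1:3]           # values 1 and 8, shape (2,)
left = arr2d[:, :2]            # first two columns, shape (3, 2)
corner = arr2d[:2, 2:]         # top-right block, shape (2, 2)
every_other = arr2d[::2, ::2]  # step of 2 along both axes

for name, part in [("row", row), ("row_2d", row_2d), ("pair", pair),
                   ("left", left), ("corner", corner)]:
    print(name, part.shape)
```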

Slicing subarrays from 2-d arrays

arr2d (rows along axis 0, columns along axis 1; negative indices in parentheses):

             col 0 (-4)   col 1 (-3)   col 2 (-2)   col 3 (-1)
row 0 (-3)       18           12            3           19
row 1 (-2)       12            0            5           10
row 2 (-1)       14            1            8            5

Different ways of slicing and their shapes

Slicing expression      Shape
arr2d[1]                (4,)
arr2d[1, :]             (4,)
arr2d[-2, :]            (4,)
arr2d[1:2, :]           (1, 4)
arr2d[-2:-1, :]         (1, 4)

Different ways of slicing and their shapes (2)

Slicing expression      Shape
arr2d[2, 1:3]           (2,)
arr2d[2, 1:-1]          (2,)
arr2d[-1, -3:-1]        (2,)
arr2d[2:, 1:3]          (1, 2)
arr2d[-1:, 1:-1]        (1, 2)

Different ways of slicing and their shapes (3)

Slicing expression      Shape
arr2d[:, :2]            (3, 2)
arr2d[:3, :2]           (3, 2)
arr2d[:, :-2]           (3, 2)

Different ways of slicing and their shapes (4)

Slicing expression      Shape
arr2d[:2, 2:]           (2, 2)
arr2d[:2, -2:]          (2, 2)
arr2d[:-1, -2:]         (2, 2)
arr2d[0:2, 2:4]         (2, 2)

Slicing subarrays from 3-d arrays

Slicing subarrays from 3-d arrays (2)

Subarrays as no-copy views

Subarrays as no-copy views (2)

Creating copies of subarrays
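The difference between a slice (a view that shares memory with the original array) and an explicit copy can be sketched as follows; the array contents are arbitrary.

```python
import numpy as np

arr = np.arange(10)

view = arr[2:5]          # a view: no data is copied
view[:] = 0              # writes through to arr

copy = arr[5:8].copy()   # an independent copy
copy[:] = -1             # arr is unaffected

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))
```

Views avoid copying large arrays, but any change made through a view also changes the original.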

Indexing rows using integer arrays

Indexing columns using integer arrays

Indexing particular elements, e.g., (1, 0), (5, 3), (7, 1), (2, 2)

Selecting a subset of the matrix’s rows and columns

Modifying the original matrix with fancy indexing
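The fancy-indexing cases named in the slide titles can be sketched on an 8 × 4 matrix. The matrix np.arange(32).reshape(8, 4) is an assumption; the original slide data is not available in this text.

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)

rows = arr[[4, 3, 0, 6]]      # select rows in the given order
cols = arr[:, [0, 2]]         # select columns 0 and 2

# Particular elements (1, 0), (5, 3), (7, 1), (2, 2).
elements = arr[[1, 5, 7, 2], [0, 3, 1, 2]]

# A subset of rows, then a reordering of columns.
block = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

# Fancy indexing on the right returns a copy, but assignment through
# fancy indexing modifies the original array.
arr[[1, 5, 7, 2], [0, 3, 1, 2]] = 0

print(elements)
print(block)
```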

Arithmetic operations (element-wise)
§ Addition (+)
§ Subtraction (-)
§ Multiplication (*)
§ Division (/)
§ Floor division (//)
§ Modulus (%)
§ Exponentiation (**)
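Each operator in the list above acts element by element. A minimal sketch with two small vectors and one matrix, using arbitrary values:

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 3, 4])

total = a + b        # [9, 11, 13]
diff = a - b         # [5, 5, 5]
prod = a * b         # [14, 24, 36]
quot = a / b         # true division, float result
floor_q = a // b     # [3, 2, 2]
rem = a % b          # [1, 2, 1]
power = a ** b       # [49, 512, 6561]

# On matrices, * is also element-wise (it is not the matrix product).
m = np.array([[1, 2], [3, 4]])
squared = m * m      # [[1, 4], [9, 16]]
```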

Arithmetic operations on two vectors (element-wise)

Arithmetic operations on two matrices (element-wise)

Linear algebra operations
§ Matrix multiplication: A @ B or np.dot(A, B)
§ Transpose of a matrix: A.T or np.transpose(A)
§ Inverse of a matrix: np.linalg.inv(A)
§ Determinant of a matrix: np.linalg.det(A)
§ Solve linear equations: np.linalg.solve(A, b)
§ Eigenvalues and eigenvectors: np.linalg.eig(A)
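The operations listed above can be tried on a small system. The matrices A and B and the vector b are hypothetical values chosen so the results are easy to verify by hand.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.arange(6.0).reshape(2, 3)
b = np.array([3.0, 5.0])

product = A @ B                      # same result as np.dot(A, B) for 2-D arrays
B_t = B.T                            # transpose, shape (3, 2)
A_inv = np.linalg.inv(A)
det_A = np.linalg.det(A)             # 2*3 - 1*1 = 5
x = np.linalg.solve(A, b)            # solves A x = b
eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are the eigenvectors

print(x)
print(eigvals)
```

For solving A x = b, np.linalg.solve is generally preferable to computing the inverse explicitly.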

Examples of linear algebra operations

Examples of linear algebra operations (2)

Examples of linear algebra operations (3)

Examples of linear algebra operations (4)

Commonly used numpy.linalg functions

Aggregate functions (reduction operations)
§ Sum of elements: np.sum(A)
§ Mean (average): np.mean(A)
§ Median: np.median(A)
§ Standard deviation: np.std(A)
§ Variance: np.var(A)
§ Minimum value: np.min(A)
§ Maximum value: np.max(A)
§ Product of all elements: np.prod(A)
§ Cumulative sum: np.cumsum(A)
§ Cumulative product: np.cumprod(A)
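A sketch of the aggregate functions on a small 2 × 3 matrix, including the axis argument, which is not listed above but controls whether the reduction runs down the columns (axis=0) or across the rows (axis=1).

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(A)            # 21
mean = np.mean(A)            # 3.5
median = np.median(A)        # 3.5
variance = np.var(A)         # population variance (ddof=0), 35/12
std = np.std(A)              # square root of the variance
smallest, largest = np.min(A), np.max(A)
product = np.prod(A)         # 720
running = np.cumsum(A)       # flattened cumulative sum

col_sums = np.sum(A, axis=0) # one value per column
row_sums = np.sum(A, axis=1) # one value per row
```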

Examples of aggregate functions (reduction operations)

Examples of aggregate functions (reduction operations) (2)

Examples of aggregate functions (reduction operations) (3)

Shape manipulation operations

Reshaping 1D arrays to 2D, 3D arrays ...

Converting 2D arrays to 1D arrays

Converting 3D arrays to 1D arrays

Reshape with -1 (automatically calculate)

Reshape with -1 (automatically calculate) (2)

Adding a new axis (expanding dimensions)

Removing single-dimension axes (squeeze)
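The shape manipulations named in this part can be sketched on a six-element array. The sizes are arbitrary; only the shapes matter here.

```python
import numpy as np

a = np.arange(6)                  # shape (6,)

m = a.reshape(2, 3)               # 1-D to 2-D
cube = a.reshape(1, 2, 3)         # 1-D to 3-D
auto = a.reshape(3, -1)           # -1 lets NumPy compute the size: (3, 2)
flat = cube.reshape(-1)           # any shape back to 1-D: (6,)

col = a[:, np.newaxis]            # add an axis: (6, 1)
row = np.expand_dims(a, axis=0)   # add an axis: (1, 6)
squeezed = np.squeeze(row)        # remove length-1 axes: (6,)

print(m.shape, cube.shape, auto.shape, flat.shape, col.shape, row.shape, squeezed.shape)
```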

Concatenation of arrays
§ Concatenation: np.concatenate()
§ Vertical stacking: np.vstack()
§ Horizontal stacking: np.hstack()
§ Column-wise stacking: np.column_stack()
§ Row-wise stacking: np.row_stack() (an alias of np.vstack(); deprecated since NumPy 2.0)
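A sketch of the stacking functions above with small hypothetical arrays. np.vstack() is used for row-wise stacking, since np.row_stack() is an alias of it in the NumPy versions this sketch assumes.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6]])
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

rows = np.concatenate([a, b], axis=0)   # shape (3, 2)
stacked_v = np.vstack([a, b])           # same result as above
stacked_h = np.hstack([a, b.T])         # b.T has shape (2, 1); result is (2, 3)
cols = np.column_stack([x, y])          # 1-D arrays become columns: (3, 2)

print(rows)
print(stacked_h)
print(cols)
```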

Array concatenation

Vertical and horizontal stacking

Column-wise and row-wise stacking

Splitting of arrays

Vertical and horizontal splitting
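Splitting is the reverse of concatenation. The following sketch assumes np.split(), np.vsplit(), and np.hsplit() are the functions shown on the original slides, which are not available in this text.

```python
import numpy as np

x = np.arange(9)
thirds = np.split(x, 3)        # three equal parts
parts = np.split(x, [3, 5])    # split before indices 3 and 5

m = np.arange(16).reshape(4, 4)
top, bottom = np.vsplit(m, 2)  # split along rows: two (2, 4) arrays
left, right = np.hsplit(m, 2)  # split along columns: two (4, 2) arrays

print([p.tolist() for p in parts])
print(top.shape, left.shape)
```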

Pseudorandom Number Generation
– Generation from Uniform Distribution
– Generation from Normal Distributions
– Generation from Binomial Distributions
– Generating Random Permutations of Elements
– Generation from Other Distributions

Random generation from uniform distribution

§ Generating random integers from a uniform distribution
§ Generating random real numbers in [0, 1) from a uniform distribution
§ Sampling random elements from an array
– Uniformly over the array’s elements, or with given weights (probabilities)
– With replacement (replace=True), and
– Without replacement (replace=False)

Generating random integers from uniform distribution

§ Generating random integers from a uniform distribution in the range [low, high), e.g.,
• [0, 100)
§ size=shape, e.g.,
• size=10
• size=(3, 5)
• size=(2, 2, 4)
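A sketch of drawing uniform random integers with the three size values above, plus uniform floats in [0, 1). It assumes the legacy np.random functions; the seed is an arbitrary choice that only makes runs repeatable.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, for repeatability only

ints_1d = np.random.randint(0, 100, size=10)          # 10 values in [0, 100)
ints_2d = np.random.randint(0, 100, size=(3, 5))      # 3 x 5 matrix
ints_3d = np.random.randint(0, 100, size=(2, 2, 4))   # 2 x 2 x 4 array

floats = np.random.random(size=(2, 3))                # uniform on [0, 1)

print(ints_1d.shape, ints_2d.shape, ints_3d.shape, floats.shape)
```

Newer code often uses a generator object instead, for example np.random.default_rng(0).integers(0, 100, size=10).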

Generating random numbers in [0, 1) from uniform dist.

Generating random numbers from an array

Generating random numbers from an array (2)
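Sampling from an existing array can be sketched with np.random.choice(), which is assumed here to be the function shown on the original slides. The items and weights are hypothetical.

```python
import numpy as np

np.random.seed(1)   # arbitrary seed, for repeatability only
items = np.array([10, 20, 30, 40, 50])

with_repl = np.random.choice(items, size=8, replace=True)    # repeats allowed
without_repl = np.random.choice(items, size=3, replace=False) # all distinct

# Weighted sampling: p gives one probability per element and must sum to 1.
weights = [0.1, 0.1, 0.6, 0.1, 0.1]
weighted = np.random.choice(items, size=8, p=weights)

print(with_repl, without_repl, weighted)
```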

Random generation from normal distributions

§ The normal distribution is one of the most important distributions.


– Also called the Gaussian distribution (after Carl Friedrich Gauss)
– It fits the probability distribution of many real-world events, e.g., IQ scores, heights ...

§ Three ways to generate numbers from the normal distribution:
– numpy.random.randn(d0, d1, ...)
• Standard normal distribution; the shape is given as separate arguments
– numpy.random.standard_normal(size=(d0, d1, ...))
• Standard normal distribution; the shape is given as a tuple
– numpy.random.normal(loc=0.0, scale=1.0, size=(d0, d1, ...))
• Normal distribution with mean (loc), standard deviation (scale), and array shape (size, default None)
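The three calls can be compared side by side. This sketch uses an arbitrary seed and, for the third call, the mean 3.0 and standard deviation 2.0 that appear in a later slide title.

```python
import numpy as np

np.random.seed(42)   # arbitrary seed, for repeatability only

a = np.random.randn(2, 3)                        # shape as separate arguments
b = np.random.standard_normal(size=(2, 3))       # shape as a tuple
c = np.random.normal(loc=3.0, scale=2.0, size=100_000)

# With many samples, the sample statistics are close to loc and scale.
print(a.shape, b.shape)
print(c.mean(), c.std())
```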

Random generation from the standard normal distribution

Generation from normal distribution (mean=3.0, std=2.0)

Plotting the sample histogram from the normal distribution

Random generation from the binomial distribution

§ The binomial distribution models the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes: success or failure.
– The binomial distribution is a discrete distribution.
– It describes the outcome of binary scenarios, e.g., the toss of a coin ...

§ It has two parameters:
– 𝑛 is the total number of independent trials.
– 𝑝 is the probability of success in each trial.

Relationship between binomial and normal distributions

§ The binomial distribution approaches a normal distribution when:
– 𝑛 (the number of trials) is large.
– 𝑝 (the probability of success) is not too close to 0 or 1.
– As a common rule of thumb, the approximation is reasonable when 𝑛𝑝 ≥ 5 and 𝑛(1 − 𝑝) ≥ 5.

§ The normal distribution approximation to the binomial distribution:
– Mean: 𝜇 = 𝑛𝑝
– Variance: 𝜎² = 𝑛𝑝(1 − 𝑝)
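The mean and variance formulas can be checked empirically. This sketch draws binomial samples with n = 100 and p = 0.5 (the values used in a later slide title) and compares the sample statistics with np and np(1 − p). The seed and sample count are arbitrary.

```python
import numpy as np

n, p = 100, 0.5
np.random.seed(0)   # arbitrary seed, for repeatability only

# Each sample is the number of successes in n independent trials.
samples = np.random.binomial(n, p, size=100_000)

expected_mean = n * p             # 50.0
expected_var = n * p * (1 - p)    # 25.0

print(samples.mean(), expected_mean)
print(samples.var(), expected_var)
```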

Example of generation from the binomial distribution

Plotting binomial sample distribution (n=10, p=0.2)

When the binomial approaches the normal (n=100, p=0.5)

Generating random permutations of elements

§ Shuffling arrays (in-place shuffle)
§ Generating permutations of the elements of an array (returns a copy)
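The in-place versus copy distinction can be sketched with np.random.shuffle() and np.random.permutation(), assumed here to be the functions shown on the original slides.

```python
import numpy as np

np.random.seed(7)   # arbitrary seed, for repeatability only
arr = np.arange(10)

# permutation returns a shuffled copy; the original is unchanged.
perm = np.random.permutation(arr)
unchanged = arr.copy()

# shuffle reorders the array in place and returns None.
result = np.random.shuffle(arr)

print(perm)
print(arr)
```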

Shuffling the elements of an array

Generating permutations of the elements of an array

Universal Functions: Fast Element-Wise Array Functions
– What Are Universal Functions?
– Create Your Own Ufunc
– Other NumPy Ufuncs

What are universal functions?

§ A universal function, or ufunc, is a function that
– performs element-wise operations on data in ndarrays.

§ Ufuncs are used to implement vectorization in NumPy,
– which is much faster than iterating over elements in Python.

§ Ufuncs also take additional arguments, such as:
– where: a boolean array or condition defining where the operation should take place.
– dtype: defining the return type of elements.
– out: an output array in which the result is stored.
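A sketch of the three optional arguments, using np.sqrt and np.add as example ufuncs. The input values are arbitrary.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, -1.0])

# where + out: compute sqrt only where x >= 0; other positions keep
# whatever was already in the output array (0.0 here).
out = np.zeros_like(x)
np.sqrt(x, out=out, where=x >= 0)

# dtype: request a float64 result from integer inputs.
summed = np.add(np.array([1, 2]), np.array([3, 4]), dtype=np.float64)

print(out)
print(summed, summed.dtype)
```

When where is used without out, the skipped positions are left uninitialized, so passing a prepared out array is the safer pattern.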

How do ufuncs work?

Creating your own ufuncs
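One way to build a ufunc from a plain Python function is np.frompyfunc(). The function add_elements below is a hypothetical example, not necessarily the one used on the original slide.

```python
import numpy as np

def add_elements(x, y):
    """A plain Python function that works on single values."""
    return x + y

# frompyfunc(function, number of inputs, number of outputs)
add_them = np.frompyfunc(add_elements, 2, 1)

raw = add_them(np.arange(4), np.arange(4))   # result has dtype=object
result = raw.astype(np.int64)                # cast to a numeric dtype

print(type(add_them), raw.dtype, result)
```

Such ufuncs call the Python function once per element, so they are convenient but much slower than NumPy's built-in ufuncs.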

Ufuncs for simple arithmetic operations
§ Addition: np.add()
§ Subtraction: np.subtract()
§ Multiplication: np.multiply()
§ Division: np.divide()
§ Power: np.power()
§ Remainder: np.mod() or np.remainder()
§ Quotient and mod: np.divmod()
§ Absolute values: np.absolute()

Examples of arithmetic ufuncs

Examples of arithmetic ufuncs (2)

Ufuncs for rounding decimals
§ Truncation: np.trunc() or np.fix()
§ Rounding: np.round()
§ Floor: np.floor()
§ Ceiling: np.ceil()
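The rounding functions differ mainly on negative numbers and on values exactly halfway between two integers. A small sketch with arbitrary values:

```python
import numpy as np

x = np.array([-2.7, -0.5, 0.5, 1.5, 2.3])

truncated = np.trunc(x)   # drop the fractional part (toward zero)
fixed = np.fix(x)         # same result as trunc
rounded = np.round(x)     # halves go to the nearest even integer
floored = np.floor(x)     # toward negative infinity
ceiled = np.ceil(x)       # toward positive infinity

print(truncated, rounded, floored, ceiled)
```

Note that np.round(0.5) gives 0.0 and np.round(1.5) gives 2.0, because exact halves are rounded to the nearest even value.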

Examples of rounding decimal operations

Ufuncs for logs
§ Log at base 2: np.log2()
§ Log at base 10: np.log10()
§ Natural log or log at base e: np.log()
§ Log at any base: creating your own ufunc
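A sketch of the built-in log ufuncs and of a logarithm at an arbitrary base. The helper log_base and the base 3 are hypothetical choices; the change-of-base formula log_b(x) = ln(x) / ln(b) gives the same result without a custom ufunc.

```python
import math
import numpy as np

x = np.array([1.0, 2.0, 8.0])
base2 = np.log2(x)                                # [0, 1, 3]
base10 = np.log10(np.array([1.0, 10.0, 1000.0]))  # [0, 1, 3]
natural = np.log(np.array([1.0, np.e]))           # [0, 1]

def log_base(value, base):
    """Logarithm of a single value at the given base."""
    return math.log(value) / math.log(base)

# Custom ufunc: 2 inputs (value, base), 1 output.
log_any = np.frompyfunc(log_base, 2, 1)
values = np.array([1.0, 3.0, 9.0, 27.0])
base3 = log_any(values, 3).astype(np.float64)

# Vectorized alternative using the change-of-base formula.
base3_fast = np.log(values) / np.log(3)
```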

Some unary ufuncs

Some binary ufuncs

Array-Oriented Programming with Arrays
– Expressing Conditional Logic as Array Operations
– Mathematical and Statistical Methods
– Methods for Boolean Arrays
– Sorting
– Unique and Other Set Logic

Expressing conditional logic as array operations

Expressing conditional logic as array operations using where
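np.where(condition, x, y) is the array form of the expression "x if condition else y". The arrays below are hypothetical sample data.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Take from xarr where cond is True, otherwise from yarr.
picked = np.where(cond, xarr, yarr)

# On a 2-D array: replace positives with 2 and everything else with -2.
m = np.array([[1, -2],
              [-3, 4]])
signs = np.where(m > 0, 2, -2)

# Scalars and arrays can be mixed: replace only the positive values.
capped = np.where(m > 0, 2, m)

print(picked)
print(signs)
print(capped)
```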

Conditional logic on 2D arrays

Conditional logic on 2D arrays (2)

Mathematical and statistical methods

Methods for boolean arrays

Two methods: ndarray.any() and ndarray.all()
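A sketch of the two methods, together with the common trick of summing a boolean array to count True values. The sample values are arbitrary.

```python
import numpy as np

bools = np.array([False, False, True, False])
any_true = bools.any()    # True if at least one element is True
all_true = bools.all()    # True only if every element is True

# Booleans count as 1 and 0, so sum() counts the True values.
values = np.array([-1.5, 0.3, 2.0, -0.7, 4.1])
n_positive = (values > 0).sum()

print(any_true, all_true, n_positive)
```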

Sorting

Sorting multidimensional arrays using axis=0

Sorting multidimensional arrays using axis=1

Sorting using numpy.sort (creating a copy)
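A sketch of the sorting variants named in this part: np.sort() returns a sorted copy, the sort() method sorts in place, and axis selects the direction. The values are arbitrary.

```python
import numpy as np

arr = np.array([[5, 2, 9],
                [1, 8, 3]])

by_cols = np.sort(arr, axis=0)   # sort each column: [[1, 2, 3], [5, 8, 9]]
by_rows = np.sort(arr, axis=1)   # sort each row:    [[2, 5, 9], [1, 3, 8]]

v = np.array([3, 1, 2])
sorted_copy = np.sort(v)         # v itself is unchanged
v_before = v.copy()
v.sort()                         # in-place sort, returns None

print(by_cols)
print(by_rows)
print(v_before, sorted_copy, v)
```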

Unique and other set logic

Unique and other set logic (2)

Unique and other set logic: other methods
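A sketch of np.unique() and a few related set functions for one-dimensional arrays. The names and numbers are hypothetical sample data, and the choice of functions is an assumption about what the original slides list.

```python
import numpy as np

names = np.array(["Bob", "Will", "Joe", "Bob", "Will", "Joe", "Joe"])
unique_names = np.unique(names)            # sorted unique values

values = np.array([6, 0, 0, 3, 2, 5, 6])
membership = np.isin(values, [2, 3, 6])    # element-wise membership test

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5])
common = np.intersect1d(a, b)              # in both
either = np.union1d(a, b)                  # in at least one
only_a = np.setdiff1d(a, b)                # in a but not in b

print(unique_names, membership)
print(common, either, only_a)
```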

File Input and Output with Arrays

Saving and loading NumPy arrays to/from binary files

Saving multiple arrays to a file
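A sketch of saving one array with np.save() and several arrays with np.savez(), then loading them back. The file names are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)
other = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    # One array per .npy file.
    single_path = os.path.join(tmp, "some_array.npy")
    np.save(single_path, arr)
    loaded = np.load(single_path)

    # Several arrays in one .npz archive, stored under keyword names.
    multi_path = os.path.join(tmp, "array_archive.npz")
    np.savez(multi_path, a=arr, b=other)
    with np.load(multi_path) as archive:
        loaded_a = archive["a"]
        loaded_b = archive["b"]

print(loaded)
print(loaded_a, loaded_b)
```

np.savez_compressed() takes the same arguments when a smaller file matters more than speed.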

Lecture summary

§ Introduced the NumPy library and explained why NumPy is important to data analytics.
§ Introduced and described NumPy multidimensional arrays (ndarray), including
– Different ways to create arrays,
– The standard data types,
– Different array indexing and slicing methods,
– Various operations on arrays, including arithmetic, linear algebra, and aggregation, and
– Ways to reshape, concatenate, and split arrays.
§ Introduced various universal functions that work on NumPy arrays.
§ Introduced how to save NumPy arrays to files and load them back.

Lecture materials

§ Allen B. Downey. Think Python: How to Think Like a Computer Scientist. O’Reilly Media Inc., 2024.
§ Luciano Ramalho. Fluent Python. O’Reilly Media Inc., 2015.
§ Brett Slatkin. Effective Python: 125 Specific Ways to Write Better Python. Pearson Education, Inc., 2025.
§ Wes McKinney. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and Jupyter. O’Reilly Media Inc., 2022.
§ Guido van Rossum, Barry Warsaw, and Alyssa Coghlan. Python Enhancement Proposal 8 (PEP 8) – Style Guide for Python Code, 2001. URL: https://peps.python.org/pep-0008/
