Lab 03 Numpy - Ipynb - Colab
Learning Outcomes:
Import files:
Numpy Basics:
More Numpy
To analyze this data, we must first import it into our program. In Python, the most common way is by using the pandas library.
import pandas as pd
data = pd.read_csv("filename.csv")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1993608275.py in <cell line: 0>()
----> 1 data = pd.read_csv("filename.csv")
2
3 # If you want to load the excel sheet
4 # data = pd.read_excel("StudentsData.xlsx")
print(data.head())
https://colab.research.google.com/drive/1Sud91TIZDPxi87RZ37K8GON1_Yn4R38l#scrollTo=b5d22218&printMode=true 1/15
9/1/25, 11:25 AM Lab_03_Numpy.ipynb - Colab
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-2510413559.py in <cell line: 0>()
----> 1 print(data.head())
print(data.tail())
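As a rough illustration of the load-and-inspect steps, the sketch below reads a small CSV from an in-memory string instead of a real file, so it runs anywhere. The column names and values are made up for this example; with a real file you would pass its path to pd.read_csv().

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """name,math,reading
Ali,72,80
Sara,90,95
Omar,65,70
Hina,88,84
Zain,79,91
Noor,93,89
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())    # first 5 rows by default
print(data.tail(2))   # last 2 rows
print(data.shape)     # (number of rows, number of columns)
```

If data is never assigned (for example because the read_csv cell failed or was skipped), later cells that call data.head() raise a NameError.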
Task 1:
You are provided with a file named StudentsPerformance.csv, which contains information about students' performance. Your task is to load
the file into Python using pandas and perform the above five steps.
import pandas as pd
NumPy is a fundamental and powerful Python package for efficient numerical computing, and it underpins most machine learning work in Python.
NumPy stands for Numeric Python
An alternative to the Python list
A new kind of Python data type, alongside float, string, list and so on
Comes with its own methods
Performs calculations over an entire sequence (array)
Easy and very fast
An array contains only one type, i.e. it is either an array of booleans, or an array of floats, and so on.
Installation: run pip3 install numpy in a terminal (or the Anaconda prompt)
Recap of Lists
import numpy as np
print(np_height)
print(height)
print(type(np_height),type(height))
np_weight = np.array(weight)
print(np_weight)
This time it worked fine: the calculations were performed element-wise. The first person's BMI was calculated by dividing the first element in
np_weight by the square of the first element in np_height, and so on.
print(bmi)
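A minimal sketch of the element-wise BMI calculation, contrasted with the same expression on plain lists. The height and weight values are the ones used for np_height and np_weight later in this lab; the variable list_error is only introduced here for illustration.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]   # metres
weight = [65.4, 59.2, 63.6, 88.4, 68.7]   # kilograms

# Plain lists do not support element-wise arithmetic.
try:
    weight / height ** 2
    list_error = None
except TypeError as exc:
    list_error = exc
print("List calculation failed:", list_error)

# NumPy arrays apply the operation to every element.
np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight / np_height ** 2
print(bmi)
```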
When a NumPy array is created from values of different types, the resulting array contains a single type, i.e. string in this case. The boolean
and the float were both converted to strings.
np.array([1.0,True,'is'])
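The coercion can be checked by inspecting the dtype of the result. This is a small sketch; the second array (numeric) is an extra example that is not part of the lab, added to show that coercion does not always end in strings.

```python
import numpy as np

# A float, a boolean and a string: everything becomes a string.
mixed = np.array([1.0, True, 'is'])
print(mixed)
print(mixed.dtype)    # a Unicode string dtype

# A float, a boolean and an int: everything becomes a float.
numeric = np.array([1.0, True, 2])
print(numeric)
print(numeric.dtype)  # float64
```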
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])
[1, 2, 3, 1, 2, 3]
[2 4 6]
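The two outputs above come from applying the + operator to a list and to an array. A short sketch of that difference (the names joined and summed are chosen for this example):

```python
import numpy as np

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

joined = python_list + python_list   # lists are concatenated
summed = numpy_array + numpy_array   # arrays are added element-wise

print(joined)   # [1, 2, 3, 1, 2, 3]
print(summed)   # [2 4 6]
```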
By square brackets
bmi
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1152952113.py in <cell line: 0>()
----> 1 bmi[1] # bmi for the second person using square brackets
By array of booleans: suppose we want all BMI values in the bmi array that are over 23.
The first step is to use the greater-than sign. The result is a NumPy array of booleans: True if the corresponding bmi is above 23,
False if it is not.
Next, you can use this boolean array inside square brackets to do the subsetting.
Result: only the elements in bmi that are above 23, i.e. those for which the corresponding boolean value is True, are selected.
array([24.7473475])
Using the result of a comparison to select part of your data is a very common way to get surprising insights.
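Both kinds of subsetting can be sketched in a few lines. The height and weight values are the ones used for np_height and np_weight later in this lab; over_23 is a name chosen for this example.

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
bmi = np_weight / np_height ** 2

# Subsetting by index: BMI of the second person.
print(bmi[1])

# Subsetting by boolean array: step 1 builds the mask, step 2 applies it.
over_23 = bmi > 23
print(over_23)
print(bmi[over_23])
```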
Task 2:
A list baseball has already been defined in the following Python script, representing the height of some baseball players in cent
Import the numpy package as np, so that you can refer to numpy with np.
Use np.array() to create a numpy array from baseball. Name this array np_baseball.
Print out the type of np_baseball to check that you got it right.
# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]
<class 'numpy.ndarray'>
import numpy as np
Create a numpy array from height_in. Name this new array np_height_in.
Print np_height_in.
Multiply np_height_in with 0.0254 to convert all height measurements from inches to meters. Store the new values in a new array,
np_height_m.
Print out np_height_m and check if the output makes sense.
import numpy as np
Task 3:
Read weight into the list weight_lb from the file numpy_baseball_weight_only.txt
Create a numpy array from the weight_lb list with the correct units. Multiply by 0.453592 to go from pounds to kilograms. Store the
resulting numpy array as np_weight_kg.
Use np_height_m and np_weight_kg to calculate the BMI of each player. Use the following equation:
BMI = weight (kg) / height (m)^2
# Import numpy
import numpy as np
Task 4:
Create a boolean numpy array: the element of the array should be True if the corresponding baseball player's BMI is below 21. You can
use the < operator for this. Name the array light.
Print the array light.
Print out a numpy array with the BMIs of all baseball players whose BMI is below 21. Use light inside square brackets to do a selection
on the bmi array.
# height and weight are available as np_height_m and np_weight_kg (create them if they are not available)
# Numpy is also imported as np
Wow! It appears that only 11 of the more than 1000 baseball players have a BMI under 21!
Task 5:
certainties in this world. For example, subsetting (using the square bracket notation on lists or arrays) works exactly the same.
# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])
200
[73 74 72 73 69 72 73 75 75 73 72]
2D Numpy Arrays
It's perfectly possible to create two-dimensional and even three-dimensional arrays!
We can create a 2D NumPy array from a regular Python list of lists
The same rule applies to 2D arrays: an array can only contain a single type
A 2D NumPy array is an improved list of lists
We can do element-wise calculations on 2D arrays, the same way we did with 1D NumPy arrays
They also support more advanced ways of subsetting, e.g. subsetting using boolean arrays
print(type(np_height))
print(type(np_weight))
Here, the numpy. prefix tells us that the type is defined in the numpy package; ndarray stands for n-dimensional array.
# 0 1 2 3 4
np_2d = np.array ([[1.73, 1.68, 1.71, 1.89, 1.79], # 0
[65.4, 59.2, 63.6, 88.4, 68.7]]) # 1
print(np_2d) # A rectangular data structure
print(type(np_2d))
The .shape attribute of a NumPy array gives the size of the array along each dimension, which tells us more about how our data is structured. Note
that accessing an attribute looks a bit like calling a method, but the two are not the same: we put round brackets after a method when
calling it, but not after an attribute.
(2, 5)
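A brief sketch of the attribute/method distinction on the same 2D array. The .ndim attribute and the .mean() method are extra examples beyond what the lab shows.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# Attributes: no round brackets.
print(np_2d.shape)   # (2, 5): 2 rows, 5 columns
print(np_2d.ndim)    # 2: number of dimensions

# Methods: called with round brackets.
print(np_2d.mean())  # mean over all elements
```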
# Homogeneous 2D Arrays
np.array ([[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, '68.7']])
If one float is changed to a string, all the array elements will be coerced to strings, to end up with a homogeneous array
np_2d[0]
# Don't forget about the zero indexing
np_2d[0][2]
# Selecting row 0 and then selecting column 2 for 3rd element
np.float64(1.71)
np_2d[0, 2] #[row, column]
# The intersection of the rows and columns you specified are returned
np.float64(1.71)
# Selecting both height and weight of the second and third family members
np_2d[:,1:3] # 3 is exclusive here
# The intersection gives us 2D array with 2 rows and 2 columns
# Non-Intuitive method of multiple square brackets which does not allow intersection subsetting
d2_list[:][1:3]
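The difference between the two notations can be seen side by side. In this sketch d2_list is assumed to be the plain list of lists that np_2d was built from; sub_array and sub_list are names chosen for the example.

```python
import numpy as np

d2_list = [[1.73, 1.68, 1.71, 1.89, 1.79],
           [65.4, 59.2, 63.6, 88.4, 68.7]]
np_2d = np.array(d2_list)

# NumPy: [rows, columns] gives the intersection of all rows with columns 1 and 2.
sub_array = np_2d[:, 1:3]
print(sub_array)
print(sub_array.shape)

# List of lists: [:] copies the outer list, then [1:3] slices the outer list again,
# so this returns whole rows from index 1 onward, not columns 1 and 2.
sub_list = d2_list[:][1:3]
print(sub_list)
```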
Task 6:
# Import numpy
import numpy as np
<class 'numpy.ndarray'>
(2, 4)
Task 7:
# create baseball a 2D list from height_in and weight_lb or read from the given file
# Example
h = [180, 215, 210, 188]
w = [78.4, 102.7, 98.5, 75.2]
# Use zip to create an iterator of tuples from h and w. We unpack each tuple into two variables, x and y
baseball = [[x, y] for x, y in zip(h, w)]
print(baseball)
# Write your code to get baseball from height_in and weight_lb
#print(baseball)
print(baseball)
distribution mean
distribution standard deviation
number of samples
# Using column_stack to paste height,weight together as two columns, argument tuple of numpy arrays
np_city = np.column_stack((height, weight))
print(height.shape)
print(np_city.shape)
(5000,)
(5000, 2)
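The shapes above can be reproduced with simulated data. The sketch below is an assumption about how such data might be generated: the distribution means and standard deviations (1.75 and 0.20 for height, 60.32 and 15 for weight), the seed, and the use of a default_rng generator are illustrative choices, not taken from the lab.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the run is repeatable

# normal(distribution mean, distribution standard deviation, number of samples)
height = np.round(rng.normal(1.75, 0.20, 5000), 2)   # assumed parameters
weight = np.round(rng.normal(60.32, 15, 5000), 2)    # assumed parameters

# Paste the two 1D arrays together as the columns of one 2D array.
np_city = np.column_stack((height, weight))
print(height.shape)
print(np_city.shape)

# Summary statistics on the height column.
print(np.mean(np_city[:, 0]))
print(np.median(np_city[:, 0]))
print(np.std(np_city[:, 0]))

# Correlation between height and weight (close to 0 for independent samples).
print(np.corrcoef(np_city[:, 0], np_city[:, 1]))
```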
np.float64(1.7519799999999999)
# Median Height of Players --> Height of the middle person after sorting adults from small to tall
np.median(np_city[:, 0])
np.float64(1.75)
Often, these summary statistics will provide you with a sanity check of your data. If we end up with an average weight of 20 kg in this
case, our measurements are most likely incorrect.
array([[ 1. , -0.0143666],
[-0.0143666, 1. ]])
np.std(np_city[:,0])
np.float64(0.19991978291304738)
NumPy also features more basic functions such as sum and sort, which also exist in the basic Python distribution. However, the big difference
here is speed: because NumPy enforces a single data type in an array, it can drastically speed up the calculations.
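One way to see the speed difference is to time the built-in sum() against np.sum() on the same data. This is a rough sketch: the array size and repeat count are arbitrary, and the measured times depend on the machine.

```python
import timeit
import numpy as np

values = list(range(1_000_000))
np_values = np.array(values, dtype=np.int64)  # explicit 64-bit integers

# Time 10 repetitions of each approach.
list_time = timeit.timeit(lambda: sum(values), number=10)
numpy_time = timeit.timeit(lambda: np.sum(np_values), number=10)

print("built-in sum:", list_time, "seconds")
print("np.sum:      ", numpy_time, "seconds")
```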
# Numpy statistic functions that work with list having same datatype elements
x = [1, 4, 8, 10, 12]
print(np.mean(x))
print(np.median(x))
7.0
8.0
Slice in Python
The built-in slice() function creates a slice object that can be used to slice a given sequence (string, bytes, tuple, list or range) or any object that supports the sequence
protocol (implements the __getitem__() and __len__() methods).
https://www.programiz.com/python-programming/methods/built-in/slice
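A slice object is equivalent to the start:stop:step notation inside square brackets. A minimal sketch, using a made-up string and array:

```python
import numpy as np

step = slice(0, 10, 3)   # start 0, stop 10 (exclusive), step 3

letters = "abcdefghij"
print(letters[step])     # same as letters[0:10:3]

# The same slice object can select rows of a 2D NumPy array.
arr = np.arange(20).reshape(10, 2)
print(arr[step, 0])      # rows 0, 3, 6, 9 of column 0
```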
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1504279680.py in <cell line: 0>()
----> 1 np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]*1000
Task 8:
Create a numpy array np_height_in that is equal to the first column of np_baseball (which has 3 columns).
Print out the mean of np_height_in.
Print out the median of np_height_in.
# np_baseball is available
# Create np_height_in from np_baseball
np_height_in = np.array(np_baseball[:,0])
print(np_height_in)
# Print out the mean of np_height_in
np_mean=np.mean(np_height_in)
print(np_mean)
# Print out the median of np_height_in
np_median=np.median(np_height_in)
print(np_median)
An average height of 1586 inches, that doesn't sound right, does it? However, the median does not seem affected by the outliers: 74 inches
makes perfect sense. It's always a good idea to check both the median and the mean, to get an idea about the overall distribution of the
entire dataset.
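The effect of an outlier on the mean and the median can be reproduced on a tiny array. The heights below are hypothetical, and one value is multiplied by 1000 to imitate a corrupted record.

```python
import numpy as np

heights_in = np.array([74.0, 72.0, 73.0, 75.0, 74.0, 76.0, 71.0])  # hypothetical

corrupted = heights_in.copy()
corrupted[0] = corrupted[0] * 1000   # one bad record

print(np.mean(heights_in), np.median(heights_in))
print(np.mean(corrupted), np.median(corrupted))   # mean explodes, median stays put
```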
Task 9:
The code to print out the mean height is already included. Complete the code for the median height. Replace None with the correct
code.
Use np.std() on the first column of np_baseball to calculate stddev. Replace None with the correct code.
Do big players tend to be heavier? Use np.corrcoef() to store the correlation between the first and second column of np_baseball in
corr. Replace None with the correct code
# Print mean height (first column)
avg = np.mean(np_baseball[:,0])
print("Average: " + str(avg))
# Print out correlation between first and second column. Replace 'None'
corr = np.corrcoef(np_baseball[:,0], np_baseball[:,1])
print("Correlation: " + str(corr))
Average: 73.6896551724138
Median: 74.0
Standard Deviation: 2.312791881046546
Correlation: [[1. 0.53153932]
[0.53153932 1. ]]
You've contacted FIFA for some data and they handed you two lists. The lists are the following:
positions = ['GK', 'M', 'A', 'D', ...] heights = [191, 184, 185, 180, ...]
Each element in the lists corresponds to a player. The first list, positions, contains strings representing each player's positio
You're fairly confident that the median height of goalkeepers is higher than that of other players on the soccer field. Some of y
Convert heights and positions, which are regular lists, to numpy arrays. Call them np_heights and np_positions.
Extract all the heights of the goalkeepers. You can use a little trick here: use np_positions == 'GK' as an index for np_heights. Assign the
result to gk_heights. [If you encounter an error, use np.where() to find the indices where the condition is True, e.g. indices =
np.where(np_heights > 13).]
Extract all the heights of all the other players. This time use np_positions != 'GK' as an index for np_heights. Assign the result to
other_heights.
Print out the median height of the goalkeepers using np.median(). Replace None with the correct code.
Do the same for the other players. Print out their median height. Replace None with the correct code.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/tmp/ipython-input-2704805964.py in <cell line: 0>()
16
17 # Heights of the goalkeepers: gk_heights
---> 18 gk_heights=np_heights[np_positions == 'GK']
19
20 # Heights of the other players: other_heights
IndexError: boolean index did not match indexed array along axis 0; size of axis is 12 but size of corresponding boolean axis is
8847
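A boolean index only works when it has the same length as the array being indexed, which is what the IndexError above reports. The sketch below uses short hypothetical lists (the first four entries follow the example in the task text) and checks the lengths before subsetting; gk_mask and gk_indices are names chosen for this example.

```python
import numpy as np

positions = ['GK', 'M', 'A', 'D', 'GK', 'D']   # hypothetical sample
heights = [191, 184, 185, 180, 188, 179]       # hypothetical sample

np_positions = np.array(positions)
np_heights = np.array(heights)

# The mask and the data must line up one-to-one.
if np_positions.shape != np_heights.shape:
    raise ValueError("positions and heights must have the same length")

gk_mask = np_positions == 'GK'
gk_heights = np_heights[gk_mask]        # goalkeepers
other_heights = np_heights[~gk_mask]    # same as np_positions != 'GK'

# Alternative: np.where() returns the indices where the mask is True.
gk_indices = np.where(gk_mask)[0]

print(np.median(gk_heights))
print(np.median(other_heights))
print(gk_indices)
```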
Using comparison operators, generate boolean arrays that answer the following questions:
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
[18. 20.]
[20. 10.75]
Before, the comparison operators like < and >= worked with Numpy arrays out of the box. Unfortunately, this is not true for the
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
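A minimal sketch of combining two comparisons element-wise on these arrays. Python's and / or keywords cannot be applied to whole arrays, so the sketch uses np.logical_or() and np.logical_and(); the thresholds are arbitrary values chosen for illustration.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Rooms in my_house that are bigger than 18.5 or smaller than 10.
either = np.logical_or(my_house > 18.5, my_house < 10)
print(either)
print(my_house[either])

# Rooms that are smaller than 11 in both houses.
both = np.logical_and(my_house < 11, your_house < 11)
print(both)
```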
import numpy as np
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
meas = np.array([np_height, np_weight])
for val in meas:
    print(val)
If we want to print out each element in this 2D array separately, the same basic for loop won't do the trick. The 2D array is actually
built up from an array of 1D arrays, so the for loop simply prints out an entire 1D array on each iteration.
To visit every element of an array, we can use a NumPy function called nditer(). Its input is the array you want to iterate over.
# Import numpy as np
import numpy as np
# For loop over np_height
for x in np_height:
    print(str(x) + " inches")
# For loop over np_baseball
for x in np.nditer(np_baseball):
    print(x)
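The two looping styles can be compared on the small meas array defined earlier. This is a self-contained sketch; rows and elements are names chosen for the example.

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
meas = np.array([np_height, np_weight])

# A plain for loop yields one 1D row per iteration.
rows = [row for row in meas]
print(len(rows))        # 2

# np.nditer() visits every single element.
elements = [float(x) for x in np.nditer(meas)]
print(len(elements))    # 10
```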
If all the data is numerical, we can use the numpy package to import the data as a NumPy array
NumPy arrays are the Python standard for storing numerical data. They are efficient, fast and clean
NumPy arrays are often essential for other packages, e.g. scikit-learn
NumPy itself has a number of built-in functions that make it far easier and more efficient for us to import data as arrays.
import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',') #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.
The default delimiter is whitespace, so for comma-separated files we need to specify it explicitly
# If your data is numeric and header has strings, we use skiprows = 1 to skip header row
import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',', skiprows=1) #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.
# For using only columns 0 and 2, we write usecols = [0,2] as 4th loadtxt argument
import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols = [0,2]) #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.
# We can also import different data type into Numpy arrays. We set dtype = str to import all members as string type
data = np.loadtxt(filename, delimiter=',', dtype=str)
data
loadtxt breaks down when the data has mixed data types, for example one column of floats and another of strings
The natural place for mixed data types is a pandas DataFrame rather than a NumPy array, although NumPy can handle mixed data types
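The loadtxt arguments discussed above can be tried without a file on disk, since np.loadtxt() also accepts a file-like object. The sketch below uses a small made-up CSV held in a string; the names text, subset and as_str are chosen for this example.

```python
import io
import numpy as np

# Hypothetical comma-separated data with a one-line text header.
text = """a,b,c
1.0,2.0,3.0
4.0,5.0,6.0
7.0,8.0,9.0
"""

# Skip the header row and read everything as floats.
data = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1)
print(data.shape)

# Keep only columns 0 and 2.
subset = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1, usecols=[0, 2])
print(subset)

# Read every field, header included, as a string.
as_str = np.loadtxt(io.StringIO(text), delimiter=',', dtype=str)
print(as_str[0])
```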
Use np.loadtxt() by passing file and a comma ',' for the delimiter.
Print the type of object digits
Execute the rest of the code to visualize one of the rows of the data.
<class 'numpy.ndarray'>
(100, 785)
(784,)
(28, 28)
You can use '\t' for tab-delimited. skiprows allows you to specify how many rows (not indices) you wish to skip. usecols takes a list of the
indices of the columns you wish to keep.
data = pd.read_csv(
"file_name.txt",
delimiter="\t",
skiprows=2, # skip the first 2 metadata rows
usecols=[0, 2] # only load Name and City
)
print(data)
The file seaslug.txt has a text header, consisting of strings and is tab-delimited. These data consists of percentage of sea slug
Use np.loadtxt() by passing file as the first argument; the file is tab-delimited, and the elements should be imported as the string data type.
Print the first element of data.
Use np.loadtxt() again. This time the file you're importing is tab-delimited, the data type is float, and you want to skip the first row.
Print the 10th element of data_float. Be guided by the previous print() call.
Execute the rest of the code to visualize the data.
['Time' 'Percent']
[0. 0.357]
Due to the header, if you tried to import it as-is using np.loadtxt(), Python would throw you a ValueError and tell you that it could not
convert string to float. There are two ways to deal with this: firstly, you can set the data type argument dtype equal to str (for string).
Alternatively, you can skip the first row as we have seen before, using the skiprows argument.
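Both workarounds, and the error they avoid, can be sketched on a small tab-delimited sample. The data below is hypothetical and only modelled on the header and first row shown in the output above; header_error is a name introduced for this example.

```python
import io
import numpy as np

# Hypothetical tab-delimited data with a text header.
text = "Time\tPercent\n0\t0.357\n1\t0.312\n2\t0.290\n"

# Importing as-is fails because the header cannot be converted to float.
try:
    np.loadtxt(io.StringIO(text), delimiter='\t')
    header_error = None
except ValueError as exc:
    header_error = exc
print("Import failed:", header_error)

# Option 1: read everything as strings.
data_str = np.loadtxt(io.StringIO(text), delimiter='\t', dtype=str)
print(data_str[0])

# Option 2: skip the header row and read floats.
data_float = np.loadtxt(io.StringIO(text), delimiter='\t', skiprows=1)
print(data_float[0])
```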
More Numpy