
Lab 03 Numpy - Ipynb - Colab

Uploaded by

i222162
Copyright
© © All Rights Reserved

9/1/25, 11:25 AM Lab_03_Numpy.ipynb - Colab

Name# Asma khan

Roll No# 22i-2162



Lab 3: Numpy and Vectorized Computation

Learning Outcomes:

Import files:

How to import csv files

Numpy Basics:

Importing Files Using Numpy

More Numpy

Import Files


A CSV file is a plain text file that stores data in tabular format, where each row corresponds to a record. Each column is separated by a
comma , (or sometimes other delimiters like ; or \t).

Example of a CSV file:

Name, Age, City

Ali, 21, Lahore

Sara, 23, Karachi

Ahmed, 22, Islamabad

To analyze this data, we must first import it into our program. In Python, the most common way is by using the pandas library.
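As a small illustration of the delimiter point, the sketch below parses the example table from an in-memory string instead of a file. The variable names `csv_text` and `semi_text` are made up for this demonstration; only `pd.read_csv` and its `sep` parameter come from pandas itself.

```python
import io
import pandas as pd

# The example table from above, held in a string instead of a file (illustrative)
csv_text = "Name,Age,City\nAli,21,Lahore\nSara,23,Karachi\nAhmed,22,Islamabad\n"
data = pd.read_csv(io.StringIO(csv_text))

# The same records with ';' as the delimiter: pass sep=';' so pandas splits correctly
semi_text = csv_text.replace(",", ";")
data_semi = pd.read_csv(io.StringIO(semi_text), sep=";")

print(data)
print(data.equals(data_semi))  # both parses should give the same table
```

If the `sep` argument is left out for the semicolon version, pandas would read each row as a single column, which is usually the first sign that the delimiter is wrong.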

1. Import pandas library in Python

import pandas as pd

2. Load CSV file:

data = pd.read_csv("filename.csv")

# If you want to load the excel sheet


# data = pd.read_excel("StudentsData.xlsx")

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1993608275.py in <cell line: 0>()
----> 1 data = pd.read_csv("filename.csv")
2
3 # If you want to load the excel sheet
4 # data = pd.read_excel("StudentsData.xlsx")

NameError: name 'pd' is not defined


3. View the first few rows of data:

print(data.head())

https://colab.research.google.com/drive/1Sud91TIZDPxi87RZ37K8GON1_Yn4R38l#scrollTo=b5d22218&printMode=true 1/15

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-2510413559.py in <cell line: 0>()
----> 1 print(data.head())

NameError: name 'data' is not defined



4. View the last few rows of data:

print(data.tail())

5. Access specific column:

print(data["Name"]) # Display only the 'Name' column

Task# 1

You are provided with a file named StudentsPerformance.csv, which contains information about students' performance. Your task is to load
the file into Python using pandas and perform the above five steps.

import pandas as pd

# Step 1: Load dataset


data = pd.read_csv("StudentsPerformance.csv")

# Step 2: View the first few rows of data
print(data.head())

# Step 3: Display first 10 rows
print(data.head(10))

# Step 4: Display last 10 rows
print(data.tail(10))

# Step 5: Access the test preparation column from the file
print(data["test preparation course"])

gender race/ethnicity parental level of education lunch \


0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard
5 female group B associate's degree standard
6 female group B some college standard
7 male group B some college free/reduced
8 male group D high school free/reduced
9 female group B high school free/reduced

test preparation course math score reading score writing score


0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75
5 none 71 83 78
6 completed 88 95 92
7 none 40 43 39
8 completed 64 64 67

test preparation course math score reading score writing score
990 completed 86 81 75
991 completed 65 82 78
992 none 55 76 76
993 none 62 72 74
994 none 63 63 62
995 completed 88 99 95
996 none 62 55 55
997 completed 59 71 65
998 completed 68 78 77
999 none 77 86 86
0 none
1 completed
2 none
3 none
4 none
...
995 completed
996 none
997 completed
998 completed
999 none
Name: test preparation course, Length: 1000, dtype: object

Numpy Basics

NumPy is a fundamental and powerful Python package for doing machine learning efficiently.

Numeric Python
Alternative to the Python list
A new kind of Python data type, like a float, string or list
Comes with its own methods
Calculations over an entire sequence (array)
Easy and super-fast
Contains only one type, i.e. an array of booleans, or an array of floats, and so on
Installation: execute pip3 install numpy in the Anaconda shell

Recap of Lists

Can Hold Different types(Any)


Change, add, remove elements

height = [1.73, 1.68, 1.71, 1.89, 1.79]


print(height)

[1.73, 1.68, 1.71, 1.89, 1.79]

weight = [65.4, 59.2, 63.6, 88.4, 68.7]


print(weight)
weight1 = [0, 0, 0, 0, 0]
for x in range(0, len(weight)):
    weight1[x] = weight[x] / 10
print(weight1)

[65.4, 59.2, 63.6, 88.4, 68.7]


[6.540000000000001, 5.92, 6.36, 8.84, 6.87]

Creating Numpy Array

import numpy as np

np_height = np.array(height) # Input = list Output = Numpy Array

print(np_height)
print(height)

print(type(np_height),type(height))

[1.73 1.68 1.71 1.89 1.79]


[1.73, 1.68, 1.71, 1.89, 1.79]
<class 'numpy.ndarray'> <class 'list'>

np_weight = np.array(weight)

print(np_weight)


[65.4 59.2 63.6 88.4 68.7]

With numpy arrays the calculation works, because the operations are performed element-wise (the same expression on plain Python lists
raises a TypeError). The first person's BMI is calculated by dividing the first element in np_weight by the square of the first element
in np_height, and so on.

bmi = np_weight / np_height**2

print(bmi)

[21.85171573 20.97505669 21.75028214 24.7473475 21.44127836]
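To make the contrast with plain lists concrete, here is a minimal sketch that tries the same BMI expression on lists, then computes it with a list comprehension and with numpy arrays. The names `list_error`, `bmi_loop` and `bmi_vec` are illustrative only.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Plain lists do not support element-wise arithmetic, so this raises TypeError
try:
    weight / height ** 2
    list_error = None
except TypeError as exc:
    list_error = exc
print("lists:", list_error)

# Option 1: pair the values up and compute one BMI at a time
bmi_loop = [w / h ** 2 for w, h in zip(weight, height)]

# Option 2: convert to numpy arrays and write the formula once
bmi_vec = np.array(weight) / np.array(height) ** 2

print(bmi_vec)
print(np.allclose(bmi_loop, bmi_vec))
```

Both options produce the same numbers; the array version simply keeps the code closer to the formula.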

When a numpy array is created from values of different types, the resulting array still contains a single type: string in this case.
The boolean and the float were both converted to strings.

np.array([1.0,True,'is'])

array(['1.0', 'True', 'is'], dtype='<U32')
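The coercion depends on which types are mixed. The following sketch (variable names are illustrative) shows that numbers and booleans are promoted to a common numeric type, while the presence of any string turns everything into strings.

```python
import numpy as np

mixed_numeric = np.array([1, 2.5, True])   # int, float and bool -> all floats
mixed_text = np.array([1.0, True, 'is'])   # any string present -> all strings

print(mixed_numeric, mixed_numeric.dtype)
print(mixed_text, mixed_text.dtype)
```

Checking `.dtype` right after creating an array is a cheap way to notice an accidental conversion to strings.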

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

print(python_list + python_list) # concatenation in lists


print(numpy_array + numpy_array) # element-wise sum in numpy arrays

[1, 2, 3, 1, 2, 3]
[2 4 6]

Numpy Subsetting

By square brackets

bmi

array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])

bmi[1] # bmi for the second person using square brackets

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1152952113.py in <cell line: 0>()
----> 1 bmi[1] # bmi for the second person using square brackets

NameError: name 'bmi' is not defined


By array of booleans: suppose we want all BMI values in the bmi array that are over 23.

The first step is to use the greater-than sign. The result is a Numpy array of booleans: True if the corresponding bmi is above 23,
False otherwise.
Next, use this boolean array inside square brackets to do the subsetting.
Result: only the elements in bmi that are above 23, i.e. those for which the corresponding boolean value is True, are selected.

bmi > 23 # The First Step

array([False, False, False, True, False])

bmi[bmi > 23] # Boolean array created inside square brackets

array([24.7473475])

Using the result of a comparison to make a selection of your data is a very common way to get surprising insights
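A boolean array can also be counted and averaged, because True behaves like 1 and False like 0. The sketch below uses rounded BMI values from this section; the names `mask`, `count` and `share` are illustrative.

```python
import numpy as np

bmi = np.array([21.85, 20.98, 21.75, 24.75, 21.44])  # rounded values from above

mask = bmi > 23        # boolean array, one entry per person
count = mask.sum()     # True counts as 1, so this is the number of matches
share = mask.mean()    # fraction of people above the threshold
selected = bmi[mask]   # the matching values themselves

print(mask)
print(count, share)
print(selected)
```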

Task 2:

A list baseball has already been defined in the following Python script, representing the height of some baseball players in centimeters.

Import the numpy package as np, so that you can refer to numpy with np.
Use np.array() to create a numpy array from baseball. Name this array np_baseball.
Print out the type of np_baseball to check that you got it right.

# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Import the numpy package as np


import numpy as np

# Create a numpy array from baseball: np_baseball


np_baseball = np.array(baseball)

# Print out type of np_baseball


print(type(np_baseball))

<class 'numpy.ndarray'>

Open the file in read mode


Read height into a list by using .readlines() method from file numpy_baseball_height_only.txt .

with open("numpy_baseball_height_only.txt", "r") as file:
    # Read all lines from the file and convert to integers
    height_in = [int(line.strip()) for line in file.readlines()]

# Print the list of integers
print(height_in)

Import numpy as np.
Create a numpy array from height_in. Name this new array np_height_in.
Print np_height_in.
Multiply np_height_in with 0.0254 to convert all height measurements from inches to meters. Store the new values in a new array,
np_height_m.
Print out np_height_m and check if the output makes sense.

import numpy as np

# Step 1: Open the file in read mode


with open("numpy_baseball_height_only.txt", "r") as file:
    # Read all lines and convert to integers
    height_in = [int(line.strip()) for line in file.readlines()]

# Step 2: Print the list of integers


print("Heights in inches (list):")
print(height_in)

# Step 3: Convert list to numpy array


np_height_in = np.array(height_in)
print("\nHeights in inches (NumPy array):")
print(np_height_in)

# Step 4: Convert to meters (1 inch = 0.0254 m)


np_height_m = np_height_in * 0.0254
print("\nHeights in meters (NumPy array):")
print(np_height_m)

Heights in inches (list):


[74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 73, 75, 78, 79, 76, 74, 76, 72, 71, 75, 77, 74, 73, 74, 78, 73, 75,

Heights in inches (NumPy array):


[74 74 72 ... 75 75 73]

Heights in meters (NumPy array):


[1.8796 1.8796 1.8288 ... 1.905 1.905 1.8542]

Task 3:

Read weight into the list weight_lb from the file numpy_baseball_weight_only.txt

Create a numpy array from the weight_lb list with the correct units. Multiply by 0.453592 to go from pounds to kilograms. Store the
resulting numpy array as np_weight_kg.

Use np_height_m and np_weight_kg to calculate the BMI of each player. Use the following equation:



BMI= weight(kg) / height(m)^2

Save the resulting numpy array as bmi.

Print out bmi.

# Import numpy
import numpy as np

# Read weight from the file into list: weight_lb


with open("numpy_baseball_weight_only.txt", "r") as file:
    weight_lb = [int(line.strip()) for line in file.readlines()]

# Create array from weight_lb with metric units: np_weight_kg


np_weight_kg = np.array(weight_lb) * 0.453592

# Calculate the BMI: bmi


BMI= np_weight_kg / np_height_m**2
# Print out bmi
print(BMI)

[23.11037639 27.60406069 28.48080465 ... 25.62295933 23.74810865


25.72686361]

Time to step up your game!!

Task 4:

Create a boolean numpy array: the element of the array should be True if the corresponding baseball player's BMI is below 21. You can
use the < operator for this. Name the array light.
Print the array light.
Print out a numpy array with the BMIs of all baseball players whose BMI is below 21. Use light inside square brackets to do a selection
on the bmi array.

# height and weight are available as np_height_m and np_weight_kg (if not available, do create)
# Numpy is also imported as np

# Create the light array


np_light=np.array(BMI<21)
# Print out light
print(np_light)

# Print out BMIs of all baseball players whose BMI is below 21


print(BMI[BMI<21])

[False False False ... False False False]


[20.54255679 20.54255679 20.69282047 20.69282047 20.34343189 20.34343189
20.69282047 20.15883472 19.4984471 20.69282047 20.9205219 ]

Wow! It appears that only 11 of the more than 1000 baseball players have a BMI under 21!

Task 5:

Python lists and numpy arrays sometimes behave differently, but there are still certainties in this world. For example, subsetting (using the square bracket notation on lists or arrays) works exactly the same.

Subset np_weight_lb by printing out the element at index 50.


Print out a sub-array of np_height_in that contains the elements at index 100 up to and including index 110.

# height and weight are available as np_height_m and np_weight_kg


# Numpy is also imported as np

# Print out the weight at index 50


print(weight_lb[50])

# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])


200
[73 74 72 73 69 72 73 75 75 73 72]

Well done you LEGEND!!

2D Numpy Arrays

It's perfectly possible to create two-dimensional and even three-dimensional arrays!
We can create a 2D numpy array from a regular Python list of lists.
The same rule applies to 2D arrays: an array can only contain a single type.
A 2D numpy array is an improved list of lists.

We can do element-wise calculations on 2D arrays, the same way we did with 1D numpy arrays.
There are also more advanced ways of subsetting, e.g. subsetting using Boolean arrays.

print(type(np_height))

print(type(np_weight))

Here numpy. tells us that it's a type that was defined in the numpy package. ndarray stands for n-dimensional array.

# first row is heights in 2D List


# second row is weights in 2d List

# 0 1 2 3 4
np_2d = np.array ([[1.73, 1.68, 1.71, 1.89, 1.79], # 0
[65.4, 59.2, 63.6, 88.4, 68.7]]) # 1
print(np_2d) # A rectangular data structure
print(type(np_2d))

[[ 1.73 1.68 1.71 1.89 1.79]


[65.4 59.2 63.6 88.4 68.7 ]]
<class 'numpy.ndarray'>

The .shape attribute of the Numpy array class gives the dimensions of the array, telling us more about what our data structure looks like.
Note that the syntax for accessing an attribute is a bit like calling a method, but they are not the same: we put round brackets after
methods when calling them, but not when accessing attributes.

np_2d.shape # (2, 5) means 2 rows and 5 columns

(2, 5)
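The difference between attributes and methods can be seen side by side in a short sketch. It reuses the 2D array from this section; `transposed` is an illustrative name.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# Attributes: stored facts about the array, accessed without brackets
print(np_2d.shape)   # rows and columns
print(np_2d.ndim)    # number of dimensions
print(np_2d.size)    # total number of elements

# Methods: computations on the array, called with round brackets
print(np_2d.mean())
print(np_2d.max())

# .T is also an attribute: the transposed view with rows and columns swapped
transposed = np_2d.T
print(transposed.shape)
```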

# Homogeneous 2D Arrays
np.array ([[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, '68.7']])

array([['1.73', '1.68', '1.71', '1.89', '1.79'],


['65.4', '59.2', '63.6', '88.4', '68.7']], dtype='<U32')

If one float is changed to a string, all the array elements will be coerced to strings, to end up with a homogeneous array

np_2d[0]
# Don't forget about the zero indexing

array([1.73, 1.68, 1.71, 1.89, 1.79])

np_2d[0][2]
# Selecting row 0 and then selecting column 2 for 3rd element

np.float64(1.71)

Alternative Way of Subsetting: The comma method

Using Single Square bracket and a comma.


This syntax is more intuitive and opens up more possibilities.
Intersection Subsetting is possible only by this method.
The old method that we used above of multiple square brackets, and with lists and strings does not allow us intersection subsetting.

np_2d[0, 2] #[row, column]
# The intersection of the rows and columns you specified are returned

np.float64(1.71)

# Selecting height and weight both of the second and third family member
np_2d[:,1:3] # 3 is exclusive here
# The intersection gives us 2D array with 2 rows and 2 columns

array([[ 1.68, 1.71],


[59.2 , 63.6 ]])

# Selecting full weights row only


np_2d[1, :]
# The intersection gives us the entire second row

array([65.4, 59.2, 63.6, 88.4, 68.7])

# This method does not work on lists


d2_list = [[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, 68.7]]

# Non-Intuitive method of multiple square brackets which does not allow intersection subsetting
d2_list[:][1:3]

[[65.4, 59.2, 63.6, 88.4, 68.7]]
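Why the chained brackets return the whole weights row can be traced step by step. The sketch below is only an illustration (the names `copy_of_rows`, `chained`, `list_cols` and `array_cols` are made up); it also shows what a list of lists needs in order to get the same columns as the comma syntax.

```python
import numpy as np

d2_list = [[1.73, 1.68, 1.71, 1.89, 1.79],
           [65.4, 59.2, 63.6, 88.4, 68.7]]
np_2d = np.array(d2_list)

# Step 1: [:] on a list just copies the outer list, so both rows are still there
copy_of_rows = d2_list[:]
# Step 2: [1:3] then slices ROWS 1..2 of that copy, which is only the weights row
chained = copy_of_rows[1:3]
print(chained)

# With lists, columns have to be sliced row by row
list_cols = [row[1:3] for row in d2_list]
# With a numpy array, the comma syntax does it in one step
array_cols = np_2d[:, 1:3]

print(list_cols)
print(array_cols)
```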

Task 6:

Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.


Print out the type of np_baseball.
Print out the shape attribute of np_baseball. Use np_baseball.shape.

# Create baseball, a list of lists


baseball = [[180, 78.4],
[215, 102.7],
[210, 98.5],
[188, 75.2]]

# Import numpy
import numpy as np

# Create a 2D numpy array from baseball: np_baseball


np_baseball = np.array(baseball)
# Print out the type of np_baseball
print(type(np_baseball))

# Print out the shape of np_baseball
print(np_baseball.shape)

<class 'numpy.ndarray'>
(4, 2)

Task 7:

Create baseball 2D list from height_in and weight_lb


Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.
Print out the shape attribute of np_baseball.

# create baseball a 2D list from height_in and weight_lb or read from the given file
# Example
h = [180, 215, 210, 188]
w = [78.4, 102.7, 98.5, 75.2]

# Use zip to create an iterator of tuples from h and w. We unpack each tuple into two variables, x and y
baseball = [[x, y] for x, y in zip(h, w)]
print(baseball)

# Write your code for to get baseball from height_in and weight_lb

# OR Reading files from csv in a list using csv package


import csv
with open('numpy_baseball_weight_height.csv') as f:
    gen = csv.reader(f)
    baseball = [[int(element) for element in line] for line in gen]  # Nested list comprehensions for list of lists/2D list

#print(baseball)
print(baseball)

# Create a 2D numpy array from baseball: np_baseball


np_baseball = np.array(baseball)
print(np_baseball)
# Print out the shape of np_baseball
print(np_baseball.shape)

[[180, 78.4], [215, 102.7], [210, 98.5], [188, 75.2]]


[[74, 180], [74, 215], [72, 210], [72, 210], [73, 188], [69, 176], [69, 209], [71, 200], [76, 231], [71, 180], [73, 188], [73, 180],
[[ 74 180]
[ 74 215]
[ 72 210]
...
[ 75 205]
[ 75 190]
[ 73 195]]
(1015, 2)
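As an alternative to building the list of pairs by hand, two equal-length lists can be pasted together as columns directly. This is a minimal sketch using the small example lists `h` and `w` from the cell above; `paired` and `stacked` are illustrative names.

```python
import numpy as np

h = [180, 215, 210, 188]
w = [78.4, 102.7, 98.5, 75.2]

# Approach used above: pair the values with zip, then convert to an array
paired = np.array([[x, y] for x, y in zip(h, w)])

# Alternative: let numpy place the two sequences side by side as columns
stacked = np.column_stack((h, w))

print(stacked)
print(stacked.shape)
print(np.array_equal(paired, stacked))
```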

Numpy: Basic Statistics

For Data Analysis:

First Thing: Get to know your data


Little data -> simply look at it
Big data (millions or billions of numbers), e.g. a city-wide survey of 5000 adults about their height and weight

Simply staring at these numbers won't give you any insights

But summary statistics of our data can give us insights
Numpy is good at it

Generating Random Data

By using np.random.normal , arguments are

distribution mean
distribution standard deviation
number of samples

height = np.round(np.random.normal(1.75, 0.20, 5000), 2)


weight = np.round(np.random.normal(60.32, 15, 5000), 2)

# Using column_stack to paste height,weight together as two columns, argument tuple of numpy arrays
np_city = np.column_stack((height, weight))

Another awesome thing Numpy can do!

print(height.shape)
print(np_city.shape)

(5000,)
(5000, 2)
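Because the data is random, the statistics change on every run. If reproducible numbers are wanted, one option is a seeded generator, sketched below. The seed value 42 and the names `rng`, `rng_again` and `height_again` are arbitrary choices for illustration; `np.random.default_rng` is NumPy's generator interface and its `normal` method takes the same mean, standard deviation and sample count.

```python
import numpy as np

# A seeded generator returns the same "random" numbers on every run
rng = np.random.default_rng(seed=42)
height = np.round(rng.normal(1.75, 0.20, 5000), 2)
weight = np.round(rng.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))

# Re-creating the generator with the same seed reproduces the same heights
rng_again = np.random.default_rng(seed=42)
height_again = np.round(rng_again.normal(1.75, 0.20, 5000), 2)

print(np_city.shape)
print(np.array_equal(height, height_again))
```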

np.mean(np_city[:, 0]) # Average height of the adults

np.float64(1.7519799999999999)

# Median height of adults --> height of the middle person after sorting adults from short to tall
np.median(np_city[:, 0])

np.float64(1.75)

Often, these summary statistics will provide you with a "sanity check" of your data. If we end up with an average weight of 20 kg in this
case, our measurements are most likely incorrect.

# To check if height and weight are correlated


np.corrcoef(np_city[:,0],np_city[:,1])

array([[ 1. , -0.0143666],
[-0.0143666, 1. ]])
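The 2x2 result is a correlation matrix: the diagonal holds each variable's correlation with itself (always 1) and the off-diagonal entries hold the correlation between the two variables. The sketch below uses made-up, perfectly related data so the entries are easy to interpret; all variable names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2 * x + 1      # rises exactly with x
y_down = -3 * x       # falls exactly as x rises

corr_up = np.corrcoef(x, y_up)
r_up = corr_up[0, 1]                      # off-diagonal entry: x versus y_up
r_down = np.corrcoef(x, y_down)[0, 1]     # x versus y_down

print(corr_up)
print(r_up, r_down)
```

A value near 0, as with the independently generated height and weight above, means there is no linear relationship between the two columns.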

np.std(np_city[:,0])

np.float64(0.19991978291304738)

Numpy also features more basic functions such as sum and sort, which also exist in the basic Python distribution. However, the big
difference here is speed. Because numpy enforces a single data type in an array, it can drastically speed up the calculations.

# Numpy statistic functions that work with list having same datatype elements
x = [1, 4, 8, 10, 12]
print(np.mean(x))
print(np.median(x))

7.0
8.0

Slice in Python

The slice() built-in is used to slice a given sequence (string, bytes, tuple, list or range) or any object which supports the sequence
protocol (implements the __getitem__() and __len__() methods).

The syntax of slice() is:

slice(start, stop, step)

https://www.programiz.com/python-programming/methods/built-in/slice
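A slice object is just a reusable version of the start:stop:step notation. The following sketch, with illustrative names and made-up data, shows that both forms select the same elements and that the same slice object works on arrays and on ordinary sequences such as strings.

```python
import numpy as np

values = np.arange(20)          # 0, 1, ..., 19

every_fifth = slice(0, 20, 5)   # start=0, stop=20 (exclusive), step=5
a = values[every_fifth]
b = values[0:20:5]              # identical selection written inline

letters = "abcdefghij"
picked = letters[slice(1, 8, 3)]  # characters at positions 1, 4 and 7

print(a)
print(np.array_equal(a, b))
print(picked)
```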

np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]*1000

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1504279680.py in <cell line: 0>()
----> 1 np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]*1000

NameError: name 'np_baseball' is not defined

Task 8:

Create numpy array np_height_in that is equal to first column of np_baseball(3 cols).
Print out the mean of np_height_in.
Print out the median of np_height_in.

# np_baseball is available
# Create np_height_in from np_baseball
np_height_in = np.array(np_baseball[:,0])
print(np_height_in)
# Print out the mean of np_height_in
np_mean=np.mean(np_height_in)
print(np_mean)
# Print out the median of np_height_in
np_median=np.median(np_height_in)
print(np_median)

[74 74 72 ... 75 75 73]


73.6896551724138
74.0

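With the outliers injected by the slice assignment above, the average height comes out at about 1586 inches, which doesn't sound right,
does it? (In the run shown here that assignment raised a NameError, so the printed mean is the uncorrupted 73.69.) The median, however,
is hardly affected by the outliers: 74 inches makes perfect sense. It's always a good idea to check both the median and the mean, to get
an idea about the overall distribution of the entire dataset.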
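The effect of a few bad values on the mean versus the median can be reproduced on a tiny sample. The heights below are made up for illustration, and multiplying every fifth value by 1000 mimics the kind of corruption described here.

```python
import numpy as np

# Made-up sample of heights in inches
heights = np.array([74.0, 74.0, 72.0, 73.0, 69.0, 71.0, 76.0, 73.0, 75.0, 70.0])

# Corrupt every fifth measurement by a factor of 1000
corrupted = heights.copy()
corrupted[::5] = corrupted[::5] * 1000

print("clean:    ", np.mean(heights), np.median(heights))
print("corrupted:", np.mean(corrupted), np.median(corrupted))
```

The mean is dragged far away by just two bad entries, while the median moves by less than an inch, which is why comparing the two is a quick outlier check.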

np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]/1000

Task 9:

The code to print out the mean height is already included. Complete the code for the median height. Replace None with the correct
code.
Use np.std() on the first column of np_baseball to calculate stddev. Replace None with the correct code.
Do big players tend to be heavier? Use np.corrcoef() to store the correlation between the first and second column of np_baseball in
corr. Replace None with the correct code

# Print mean height (first column)
avg = np.mean(np_baseball[:,0])
print("Average: " + str(avg))

# Print median height. Replace 'None'


med = np.median(np_baseball[:,0])
print("Median: " + str(med))

# Print out the standard deviation on height. Replace 'None'


stddev = np.std(np_baseball[:,0])
print("Standard Deviation: " + str(stddev))

# Print out correlation between first and second column. Replace 'None'
corr = np.corrcoef(np_baseball[:,0], np_baseball[:,1])
print("Correlation: " + str(corr))

Average: 73.6896551724138
Median: 74.0
Standard Deviation: 2.312791881046546
Correlation: [[1. 0.53153932]
[0.53153932 1. ]]

Task 10: Blend it all together

You've contacted FIFA for some data and they handed you two lists. The lists are the following:

positions = ['GK', 'M', 'A', 'D', ...]
heights = [191, 184, 185, 180, ...]

Each element in the lists corresponds to a player. The first list, positions, contains strings representing each player's positio
You're fairly confident that the median height of goalkeepers is higher than that of other players on the soccer field. Some of y

Convert heights and positions, which are regular lists, to numpy arrays. Call them np_heights and np_positions.
Extract all the heights of the goalkeepers. You can use a little trick here: use np_positions == 'GK' as an index for np_heights. Assign the
result to gk_heights. [If you encounter an error, use np.where() to find the indices where the condition is true, e.g. indices =
np.where(np_heights > 13)]
Extract all the heights of all the other players. This time use np_positions != 'GK' as an index for np_heights. Assign the result to
other_heights.
Print out the median height of the goalkeepers using np.median(). Replace None with the correct code.
Do the same for the other players. Print out their median height. Replace None with the correct code.
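The indexing trick from these steps can be tried on a handful of made-up players before running it on the real files. In this sketch the data values and the names `is_gk` and `gk_idx` are illustrative; the comparison, the `~` negation and `np.where` are standard numpy.

```python
import numpy as np

# Made-up sample data, one entry per player in both lists
positions = ['GK', 'M', 'A', 'D', 'GK', 'M']
heights = [191, 184, 185, 180, 188, 178]

np_positions = np.array(positions)
np_heights = np.array(heights)

is_gk = np_positions == 'GK'          # boolean array: True for goalkeepers
gk_heights = np_heights[is_gk]        # heights where the mask is True
other_heights = np_heights[~is_gk]    # ~ flips the mask, same as != 'GK'

gk_idx = np.where(is_gk)[0]           # positions of the True entries

print(gk_heights, np.median(gk_heights))
print(other_heights, np.median(other_heights))
print(gk_idx)
```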

# heights and positions are available as lists


with open('fifa_position.txt') as f:
    positions = f.readlines()
    positions = list(map(lambda x: x.strip(), positions))  # .strip() returns a copy without leading/trailing whitespace, removing '\n'
    #print(positions)
with open('fifa_height.txt') as f:
    height = f.readlines()
    height = list(map(lambda x: int(x), height))
    #print(height)
# Import numpy
import numpy as np

# Convert positions and heights to numpy arrays: np_positions, np_heights


np_positions = np.array(positions)
np_heights = np.array(height)

# Heights of the goalkeepers: gk_heights


gk_heights=np_heights[np_positions == 'GK']

# Heights of the other players: other_heights


other_heights=np_heights[np_positions != 'GK']

# Print out the median height of goalkeepers. Replace 'None'


print("Median height of goalkeepers: " + str(np.median(gk_heights)))

# Print out the median height of other players. Replace 'None'


print("Median height of other players: " + str(np.median(other_heights)))


---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/tmp/ipython-input-2704805964.py in <cell line: 0>()
     16
     17 # Heights of the goalkeepers: gk_heights
---> 18 gk_heights=np_heights[np_positions == 'GK']
     19
     20 # Heights of the other players: other_heights

IndexError: boolean index did not match indexed array along axis 0; size of axis is 12 but size of corresponding boolean axis is
8847
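The message says that the boolean mask has 8847 entries while the heights array has only 12, so the two files did not yield the same number of rows. A boolean mask must have exactly one entry per element of the array it indexes. The sketch below reproduces the situation on deliberately mismatched, made-up data and shows a simple length check; the variable names are illustrative.

```python
import numpy as np

np_positions = np.array(['GK', 'M', 'A', 'D'])
np_heights = np.array([191, 184])      # deliberately too short

# Comparing lengths first gives a clearer hint than the IndexError
lengths_match = np_positions.shape == np_heights.shape
print("same length:", lengths_match)

try:
    np_heights[np_positions == 'GK']
    error = None
except IndexError as exc:
    error = exc
print(error)
```

Printing `len(positions)` and `len(height)` right after reading the files is usually enough to find out which file is incomplete.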


Task 11: Compare arrays

Using comparison operators, generate boolean arrays that answer the following questions:

Which areas in my_house are greater than or equal to 18?


You can also compare two Numpy arrays element-wise. Which areas in my_house are smaller than the ones in your_house?
Make sure to wrap both commands in a print() statement so that you can inspect the output!

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than or equal to 18
print(my_house >= 18)

# my_house less than your_house
print(my_house < your_house)

[ True  True False False]
[False  True  True False]

Task 12:

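Before, the comparison operators like < and >= worked with Numpy arrays out of the box. Unfortunately, this is not true for the boolean operators and, or, and not; use np.logical_and(), np.logical_or() and np.logical_not() instead.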

Generate boolean arrays that answer the following questions:


Which areas in my_house are greater than 18.5 or smaller than 10?
Which areas are smaller than 11 in both my_house and your_house? Make sure to wrap both commands in print() statement, so that
you can inspect the output.

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10


print(np.logical_or(my_house > 18.5, my_house < 10))

# my_house less than your_house


print(my_house < your_house)

# Both my_house and your_house smaller than 11


print(np.logical_and(my_house < 11, your_house < 11))

[False True False True]


[False True True False]
[False False False True]
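The element-wise operators `|` and `&` give the same results as np.logical_or() and np.logical_and(), provided each comparison is wrapped in parentheses. The sketch below also shows what happens with the plain Python `and`; the names `either`, `both` and `plain_and_error` are illustrative.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# | and & work element-wise; the parentheses are required because
# these operators bind more tightly than the comparisons
either = (my_house > 18.5) | (my_house < 10)
both = (my_house < 11) & (your_house < 11)
print(either)
print(both)

# Plain `and` asks for a single True/False for a whole array, which is ambiguous
try:
    (my_house > 18.5) and (my_house < 10)
    plain_and_error = None
except ValueError as exc:
    plain_and_error = exc
print(plain_and_error)
```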

For loop with Numpy Array

# The most basic for loop does the trick


for value in bmi:
    print(value)

For loop with 2D Numpy Array

import numpy as np
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
meas = np.array([np_height, np_weight])

for val in meas:
    print(val)

If we want to print out each element in this 2D array separately, the same basic for loop won't do the trick. The 2D array is actually
built up from an array of 1D arrays, so the for loop simply prints out an entire 1D array on each iteration.
To visit every element of an array, we can use a Numpy function called nditer(). Its input is the array you want to iterate over.

for val in np.nditer(meas):
    print(val)
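The difference between looping over rows and looping over elements can be checked on a small array. This sketch uses a shortened, made-up version of meas; besides nditer() it shows two other standard numpy options, the `.flat` iterator and np.ndenumerate(), which also yields each element's index.

```python
import numpy as np

meas = np.array([[1.73, 1.68],
                 [65.4, 59.2]])

# A plain for loop walks over the first axis: one whole row per iteration
rows = [row for row in meas]
print(len(rows))

# nditer visits every single element
elements = [float(v) for v in np.nditer(meas)]

# .flat is an alternative element-by-element iterator
flat = [float(v) for v in meas.flat]

# ndenumerate also reports the (row, column) index of each element
indexed = [(idx, float(v)) for idx, v in np.ndenumerate(meas)]

print(elements)
print(indexed)
```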

Task 13:

Import the numpy package under the local alias np.


Write a for loop that iterates over all elements in np_height and prints out "x inches" for each element, where x is the value in the array.
Write a for loop that visits every element of the np_baseball array and prints it out.
Try to add an additional argument end = to the print() call - the output will be mesmerizing!

# Import numpy as np
import numpy as np
# For loop over np_height
for x in np_height:
    print(str(x) + " inches")
# For loop over np_baseball
for x in np.nditer(np_baseball):
    print(x)


Importing Flat Files using Numpy

If all the data is numerical, we can use the package numpy to import the data as a numpy array

Numpy arrays are the Python standard for storing numerical data. They are efficient, fast and clean
Numpy arrays are often essential for other packages, e.g. scikit-learn

Numpy itself has a number of built-in functions that make it far easier and more efficient for us to import data as arrays.

import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',') #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.

The default delimiter is whitespace, so for comma-separated files we need to specify it explicitly

# If your data is numeric and header has strings, we use skiprows = 1 to skip header row
import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',', skiprows=1) #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.

# For using only columns 0 and 2, we write usecols = [0,2] as 4th loadtxt argument
import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols = [0,2]) #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.

# We can also import different data type into Numpy arrays. We set dtype = str to import all members as string type
data = np.loadtxt(filename, delimiter=',', dtype=str)
data
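The loadtxt options used in these cells can be tried without any file on disk, because np.loadtxt also accepts a file-like object. The sketch below feeds it a small made-up table through io.StringIO; the column names and values are invented for the example.

```python
import io
import numpy as np

# A tiny made-up CSV: one header row followed by three numeric rows
text = "label,p1,p2\n5,0,255\n3,12,0\n7,34,90\n"

# Skip the header row and read everything as floats (the default dtype)
all_rows = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1)

# Keep only columns 0 and 2
subset = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1, usecols=[0, 2])

# Read every cell as a string, so the header row can stay in
as_text = np.loadtxt(io.StringIO(text), delimiter=',', dtype=str)

print(all_rows)
print(subset)
print(as_text[0])
```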

loadtxt breaks down when we have mixed datatypes, e.g. one column of floats and another of strings.
The natural place for mixed datatypes is a pandas DataFrame, not Numpy, although Numpy can handle mixed datatypes.
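One way Numpy can handle mixed columns is np.genfromtxt(), sketched below as an illustration rather than as the approach this lab prescribes. With dtype=None it infers a type per column, and names=True takes the field names from the header row; the table is the small example from the start of this lab, written without spaces after the commas.

```python
import io
import numpy as np

text = "Name,Age,City\nAli,21,Lahore\nSara,23,Karachi\nAhmed,22,Islamabad\n"

# dtype=None: infer a type for each column; names=True: use the header as field names
people = np.genfromtxt(io.StringIO(text), delimiter=",", names=True,
                       dtype=None, encoding="utf-8")

print(people.dtype.names)     # the field names taken from the header
print(people["Name"])         # a column of strings
print(people["Age"].mean())   # a numeric column in the same array
```

The result is a structured 1D array with one record per row, which is less convenient than a DataFrame for most analysis tasks.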

Task 14:

Use np.loadtxt() by passing file and a comma ',' for the delimiter.
Print the type of object digits
Execute the rest of the code to visualize one of the rows of the data.



# Import package
import numpy as np
import matplotlib.pyplot as plt
# Assign filename to variable: file
file = 'digits.csv'

# Load file as array: digits


data=np.loadtxt(file,delimiter=',')
digits = np.array(data)
# Print datatype of digits
print(type(digits))
print(digits.shape)

# Select and reshape a row


im = digits[21, 1:] #Selecting image No. 22 and then selecting columns 1-end
print(im.shape)
im_sq = np.reshape(im, (28, 28)) # Reshaping image to shape 28*28 = 784
print(im_sq.shape)

# Plot reshaped data (matplotlib.pyplot already loaded as plt)


plt.imshow(im_sq, cmap='Greys', interpolation='nearest')
plt.show()

<class 'numpy.ndarray'>
(100, 785)
(784,)
(28, 28)
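The reshape step can be checked on stand-in data. In this sketch a row of 785 consecutive integers plays the role of one line of digits.csv (one label followed by 784 pixel values); the names `row`, `label`, `pixels`, `image` and `restored` are illustrative.

```python
import numpy as np

# Stand-in for one row of the file: a label followed by 784 pixel values
row = np.arange(785)
label, pixels = row[0], row[1:]

# 784 values fill a 28 x 28 grid row by row
image = pixels.reshape(28, 28)
print(image.shape)

# Element (r, c) of the grid is element r*28 + c of the flat row
print(image[1, 0] == pixels[28])

# ravel() flattens the grid back into the original 1D order
restored = image.ravel()
print(np.array_equal(restored, pixels))
```

Reshaping only works when the number of elements matches, which is why the label column has to be dropped first.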

You can use '\t' for tab-delimited. skiprows allows you to specify how many rows (not indices) you wish to skip. usecols takes a list of the
indices of the columns you wish to keep.

data = pd.read_csv(
"file_name.txt",
delimiter="\t",
skiprows=2, # skip the first 2 metadata rows
usecols=[0, 2] # only load Name and City
)
print(data)

Task 15:

The file seaslug.txt has a text header, consisting of strings and is tab-delimited. These data consists of percentage of sea slug

Use np.loadtxt(), passing file as the first argument. The file is tab-delimited; import the elements as string datatype.
Print the first element of data.
Use np.loadtxt() again. This time the file you're importing is tab-delimited, the datatype is float, and you want to skip the first row.

Print the 10th element of data_float. Be guided by the previous print() call.
Execute the rest of the code to visualize the data.

# Assign filename: file


file = 'seaslug.txt'

# Import file: data


data = np.loadtxt(file, delimiter='\t', dtype=str)
# Print the first element of data
print(data[0])
# Import data as floats and skip the first row: data_float
data_float=np.loadtxt(file, delimiter='\t', dtype=float, skiprows=1)

# Print the 10th element of data_float


print(data_float[9])

# Plot a scatterplot of the data


plt.scatter(data_float[:, 0], data_float[:, 1])
plt.xlabel('time (min.)')
plt.ylabel('percentage of larvae')
plt.show()

['Time' 'Percent']
[0. 0.357]

Due to the header, if you tried to import it as-is using np.loadtxt(), Python would throw you a ValueError and tell you that it could not
convert string to float. There are two ways to deal with this: firstly, you can set the data type argument dtype equal to str (for string).

data = np.loadtxt("file_name.txt", dtype=str)

Alternatively, you can skip the first row as we have seen before, using the skiprows argument.

data = np.loadtxt("file_name.txt", skiprows=1)
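The ValueError and both workarounds can be reproduced on a small in-memory table. The two data rows below are made up; only the header names 'Time' and 'Percent' are taken from the output above.

```python
import io
import numpy as np

# Tab-delimited text with a string header, similar in layout to seaslug.txt
text = "Time\tPercent\n99\t0.067\n99\t0.133\n"

# Importing as-is fails: the header cannot be converted to float
try:
    np.loadtxt(io.StringIO(text), delimiter='\t')
    error = None
except ValueError as exc:
    error = exc
print(error)

# Fix 1: read everything as strings, header included
as_str = np.loadtxt(io.StringIO(text), delimiter='\t', dtype=str)

# Fix 2: skip the header row and keep the numbers as floats
as_float = np.loadtxt(io.StringIO(text), delimiter='\t', skiprows=1)

print(as_str[0])
print(as_float)
```

The second fix is usually the more useful one, since the values stay numeric and can be plotted or summarised directly.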

More Numpy