Lab 03 Numpy - Ipynb - Colab

Uploaded by

i222162

9/1/25, 11:25 AM Lab_03_Numpy.ipynb - Colab

Name# Asma khan

Roll No# 22i-2162


Lab 3: Numpy and Vectorized Computation

Learning Outcomes:

Import files:

How to import csv files

Numpy Basics:

Importing Files Using Numpy

More Numpy

Import Files

A CSV file is a plain text file that stores data in tabular format, where each row corresponds to a record. Each column is separated by a
comma , (or sometimes other delimiters like ; or \t).

Example of a CSV file:

Name, Age, City

Ali, 21, Lahore

Sara, 23, Karachi

Ahmed, 22, Islamabad

To analyze this data, we must first import it into our program. In Python, the most common way is by using the pandas library.
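As a minimal sketch of what loading looks like, the example CSV above can be parsed from an in-memory string instead of a file. The use of io.StringIO, the variable name people, and the skipinitialspace=True option (which drops the space after each comma) are assumptions made for this illustration and are not part of the lab.

```python
import io
import pandas as pd

# The example CSV from above, held in a string instead of a file (assumption for illustration)
csv_text = """Name, Age, City
Ali, 21, Lahore
Sara, 23, Karachi
Ahmed, 22, Islamabad
"""

# skipinitialspace=True drops the blank that follows each comma
people = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(people.shape)          # (rows, columns)
print(list(people.columns))  # column names taken from the header row
print(people["Age"].mean())  # Age is parsed as a number
```

With a real file, the only change would be passing the file name instead of the StringIO object.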

1. Import pandas library in Python

import pandas as pd

2. Load CSV file:

data = pd.read_csv("filename.csv")

# If you want to load the excel sheet

# data = pd.read_excel("StudentsData.xlsx")

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1993608275.py in <cell line: 0>()
----> 1 data = pd.read_csv("filename.csv")
2
3 # If you want to load the excel sheet
4 # data = pd.read_excel("StudentsData.xlsx")

NameError: name 'pd' is not defined


3. View the first few rows of data:

print(data.head())

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-2510413559.py in <cell line: 0>()
----> 1 print(data.head())

NameError: name 'data' is not defined


4. View the last few rows of data:

print(data.tail())

5. Access specific column:

print(data["Name"]) # Display only the 'Name' column

Task# 1

You are provided with a file named StudentsPerformance.csv, which contains information about students' performance. Your task is to load
the file into Python using pandas and perform the above five steps.

import pandas as pd

# Step 1: Load dataset

data = pd.read_csv("StudentsPerformance.csv")

# Step 2: View the first few rows of data

print(data.head())

# Step 3: Display first 10 rows

print(data.head(10))

# Step 4: Display last 10 rows

print(data.tail(10))

# Step 5: Access the test preparation column from the file

print(data["test preparation course"])

gender race/ethnicity parental level of education lunch \

0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard
5 female group B associate's degree standard
6 female group B some college standard
7 male group B some college free/reduced
8 male group D high school free/reduced
9 female group B high school free/reduced

test preparation course math score reading score writing score

0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75
5 none 71 83 78
6 completed 88 95 92
7 none 40 43 39
8 completed 64 64 67

test preparation course math score reading score writing score
990 completed 86 81 75
991 completed 65 82 78
992 none 55 76 76
993 none 62 72 74
994 none 63 63 62
995 completed 88 99 95
996 none 62 55 55
997 completed 59 71 65
998 completed 68 78 77
999 none 77 86 86
0 none
1 completed
2 none
3 none
4 none
...
995 completed
996 none
997 completed
998 completed
999 none
Name: test preparation course, Length: 1000, dtype: object

Numpy Basics

NumPy is a fundamental and powerful Python package for fast numerical computation, which makes it a staple of practical machine learning.

Numeric Python
An alternative to the Python list
A new kind of Python data type, like a float, string or list
Comes with its own methods
Calculations over an entire sequence (array)
Easy and super-fast
Contains only one type, i.e. either an array of booleans or an array of floats, and so on
Installation: execute pip3 install numpy in the Anaconda shell

Recap of Lists

Can Hold Different types(Any)

Change, add, remove elements

height = [1.73, 1.68, 1.71, 1.89, 1.79]

print(height)

[1.73, 1.68, 1.71, 1.89, 1.79]

weight = [65.4, 59.2, 63.6, 88.4, 68.7]

print(weight)
weight1 = [0, 0, 0, 0, 0]
for x in range(0, len(weight)):
    weight1[x] = weight[x] / 10
print(weight1)

[65.4, 59.2, 63.6, 88.4, 68.7]

[6.540000000000001, 5.92, 6.36, 8.84, 6.87]

Creating Numpy Array

import numpy as np

np_height = np.array(height) # Input = list Output = Numpy Array

print(np_height)
print(height)

print(type(np_height),type(height))

[1.73 1.68 1.71 1.89 1.79]

[1.73, 1.68, 1.71, 1.89, 1.79]
<class 'numpy.ndarray'> <class 'list'>

np_weight = np.array(weight)

print(np_weight)

[65.4 59.2 63.6 88.4 68.7]

With NumPy arrays the calculation works: it is performed element-wise. The first person's BMI is calculated by dividing the first element in
np_weight by the square of the first element in np_height, and so on. (The same expression on the plain lists weight and height raises a TypeError.)

bmi = np_weight / np_height**2

print(bmi)

[21.85171573 20.97505669 21.75028214 24.7473475 21.44127836]
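The contrast between lists and arrays can be checked directly. The sketch below reuses the height and weight values from this lab; the names list_result, bmi_loop and bmi_vec are introduced only for this illustration.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Plain lists do not support element-wise arithmetic
try:
    weight / height ** 2
    list_result = "no error"
except TypeError:
    list_result = "TypeError"

# With lists, the calculation needs an explicit loop (here a comprehension)
bmi_loop = [w / h ** 2 for w, h in zip(weight, height)]

# With NumPy arrays, the same formula applies to every element at once
bmi_vec = np.array(weight) / np.array(height) ** 2

print(list_result)
print(np.allclose(bmi_loop, bmi_vec))
```

Both approaches give the same numbers; the array version simply avoids writing the loop by hand.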

When a numpy array is created from values of different types, the resulting array will still contain a single type, i.e. string in this case. The boolean
and the float were both converted to strings.

np.array([1.0,True,'is'])

array(['1.0', 'True', 'is'], dtype='<U32')
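The same single-type rule applies to other mixes as well. The following sketch (variable names are illustrative only) inspects dtype.kind, which is 'f' for floats, 'i' for signed integers and 'U' for Unicode strings.

```python
import numpy as np

ints_and_float = np.array([1, 2, 2.5])     # integers are upcast to float
int_and_bool = np.array([1, True])         # the boolean becomes the integer 1
with_string = np.array([1.0, True, 'is'])  # everything becomes a string

print(ints_and_float, ints_and_float.dtype.kind)
print(int_and_bool, int_and_bool.dtype.kind)
print(with_string, with_string.dtype.kind)
```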

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

print(python_list + python_list) # concatenation in lists

print(numpy_array + numpy_array) # element-wise sum in numpy arrays

[1, 2, 3, 1, 2, 3]
[2 4 6]

Numpy Subsetting

By square brackets

bmi

array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])

bmi[1] # bmi for the second person using square brackets

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1152952113.py in <cell line: 0>()
----> 1 bmi[1] # bmi for the second person using square brackets

NameError: name 'bmi' is not defined


By array of booleans: suppose we want to get all BMI values in the bmi array that are over 23.

The first step is to use the greater-than sign. The result is a Numpy array of booleans: True if the corresponding bmi is above 23,
False if it's below.
Next, you can use this boolean array inside square brackets to do subsetting.
Result: only the elements in bmi that are above 23, i.e. those for which the corresponding boolean value is True, are selected.

bmi > 23 # The First Step

array([False, False, False, True, False])

bmi[bmi > 23] # Boolean array created inside square brackets

array([24.7473475])

Using the result of a comparison to make a selection of your data is a very common way to get surprising insights
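The two steps can be put together in one short sketch. The bmi values are copied from the output above; the names mask, selected and count are introduced here for illustration.

```python
import numpy as np

bmi = np.array([21.85171573, 20.97505669, 21.75028214, 24.7473475, 21.44127836])

mask = bmi > 23          # boolean array, one entry per element of bmi
selected = bmi[mask]     # keeps only the entries where mask is True
count = int(mask.sum())  # True counts as 1, so the sum is the number of matches

print(mask)
print(selected)
print(count)
```

Summing the mask is a handy way to count how many elements satisfy a condition without selecting them first.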

Task 2:

A list baseball has already been defined in the following Python script, representing the height of some baseball players in cent

Import the numpy package as np, so that you can refer to numpy with np.
Use np.array() to create a numpy array from baseball. Name this array np_baseball.
Print out the type of np_baseball to check that you got it right.

# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Import the numpy package as np

import numpy as np

# Create a numpy array from baseball: np_baseball

np_baseball = np.array(baseball)

# Print out type of np_baseball

print(type(np_baseball))

Open the file in read mode

Read height into a list by using .readlines() method from file numpy_baseball_height_only.txt .

with open("numpy_baseball_height_only.txt", "r") as file:
    # Read all lines from the file and convert to integers
    height_in = [int(line.strip()) for line in file.readlines()]

Print the list of integers

print(height_in)

import numpy as np
Create a numpy array from height_in. Name this new array np_height_in.
Print np_height_in.
Multiply np_height_in with 0.0254 to convert all height measurements from inches to meters. Store the new values in a new array,
np_height_m.
Print out np_height_m and check if the output makes sense.

import numpy as np

# Step 1: Open the file in read mode

with open("numpy_baseball_height_only.txt", "r") as file:
    # Step 2: Read all lines and convert them to integers
    height_in = [int(line.strip()) for line in file.readlines()]

# Step 3: Print the list of integers

print("Heights in inches (list):")
print(height_in)

# Step 4: Convert list to numpy array

np_height_in = np.array(height_in)
print("\nHeights in inches (NumPy array):")
print(np_height_in)

# Step 5: Convert to meters (1 inch = 0.0254 m)

np_height_m = np_height_in * 0.0254
print("\nHeights in meters (NumPy array):")
print(np_height_m)

Heights in inches (list):

[74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 73, 75, 78, 79, 76, 74, 76, 72, 71, 75, 77, 74, 73, 74, 78, 73, 75,

Heights in inches (NumPy array):

[74 74 72 ... 75 75 73]

Heights in meters (NumPy array):

[1.8796 1.8796 1.8288 ... 1.905 1.905 1.8542]

Task 3:

Read weight into the list weight_lb from the file numpy_baseball_weight_only.txt

Create a numpy array from the weight_lb list with the correct units. Multiply by 0.453592 to go from pounds to kilograms. Store the
resulting numpy array as np_weight_kg.

Use np_height_m and np_weight_kg to calculate the BMI of each player. Use the following equation:

BMI= weight(kg) / height(m)^2

Save the resulting numpy array as bmi.

Print out bmi.

# Import numpy
import numpy as np

# Read weight from the file into list: weight_lb

with open("numpy_baseball_weight_only.txt", "r") as file:
    weight_lb = [int(line.strip()) for line in file.readlines()]

# Create array from weight_lb with metric units: np_weight_kg

np_weight_kg = np.array(weight_lb) * 0.453592

# Calculate the BMI: bmi

BMI= np_weight_kg / np_height_m**2
# Print out bmi
print(BMI)

[23.11037639 27.60406069 28.48080465 ... 25.62295933 23.74810865

25.72686361]

Time to step up your game!!

Task 4:

Create a boolean numpy array: the element of the array should be True if the corresponding baseball player's BMI is below 21. You can
use the < operator for this. Name the array light.
Print the array light.
Print out a numpy array with the BMIs of all baseball players whose BMI is below 21. Use light inside square brackets to do a selection
on the bmi array.

# height and weight are available as np_height_m and np_weight_kg (if not available, do create)
# Numpy is also imported as np; BMI is the array computed in Task 3

# Create the light array

light = BMI < 21
# Print out light
print(light)

# Print out BMIs of all baseball players whose BMI is below 21

print(BMI[light])

[False False False ... False False False]

[20.54255679 20.54255679 20.69282047 20.69282047 20.34343189 20.34343189
20.69282047 20.15883472 19.4984471 20.69282047 20.9205219 ]

Wow! It appears that only 11 of the more than 1000 baseball players have a BMI under 21!

Task 5:

certainties in this world. For example, subsetting (using the square bracket notation on lists or arrays) works exactly the same.

Subset np_weight_lb by printing out the element at index 50.

Print out a sub-array of np_height_in that contains the elements at index 100 up to and including index 110.

# height_in and weight_lb are available from the earlier tasks

# Numpy is also imported as np

# Print out the weight at index 50

np_weight_lb = np.array(weight_lb)
print(np_weight_lb[50])

# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])

200
[73 74 72 73 69 72 73 75 75 73 72]

Well done you LEGEND!!


2D Numpy Arrays

It's perfectly possible to create two-dimensional and even three-dimensional arrays!
We can create a 2D numpy array from a regular Python list of lists.
The same rule applies to 2D arrays: an array can only contain a single type.
A 2D numpy array is an improved list of lists.

We can do element-wise calculations on 2D arrays, the same way we did with 1D numpy arrays.
More advanced ways of subsetting are also available, i.e. subsetting using Boolean arrays.

print(type(np_height))

print(type(np_weight))

Here, the numpy. prefix tells us that it's a type that was defined in the numpy package. ndarray stands for n-dimensional array.

# first row is heights in 2D List

# second row is weights in 2d List

#                  0     1     2     3     4
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],   # 0
                  [65.4, 59.2, 63.6, 88.4, 68.7]])  # 1
print(np_2d) # A rectangular data structure
print(type(np_2d))

[[ 1.73 1.68 1.71 1.89 1.79]

[65.4 59.2 63.6 88.4 68.7 ]]
<class 'numpy.ndarray'>

.shape is an attribute of the NumPy array class that gives the dimensions of the array, which tells us more about what our data structure looks like. Note
that the syntax for accessing an attribute is a bit like calling a method, but they are not the same: we put round brackets after methods when
calling them, but not when accessing attributes.

np_2d.shape # (2, 5) means 2 rows and 5 columns

(2, 5)
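The attribute versus method distinction can be seen side by side in a short sketch. It uses the np_2d array from above; ndim and size are two further attributes of NumPy arrays, and mean() stands in as an example of a method.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

print(np_2d.shape)   # attribute: no round brackets
print(np_2d.ndim)    # attribute: number of dimensions
print(np_2d.size)    # attribute: total number of elements
print(np_2d.mean())  # method: called with round brackets
```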

# Homogeneous 2D Arrays
np.array ([[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, '68.7']])

array([['1.73', '1.68', '1.71', '1.89', '1.79'],

['65.4', '59.2', '63.6', '88.4', '68.7']], dtype='<U32')

If one float is changed to a string, all the array elements will be coerced to strings, to end up with a homogeneous array

np_2d[0]
# Don't forget about the zero indexing

array([1.73, 1.68, 1.71, 1.89, 1.79])

np_2d[0][2]
# Selecting row 0 and then selecting column 2 for 3rd element

np.float64(1.71)

Alternative Way of Subsetting: The comma method

Using Single Square bracket and a comma.

This syntax is more intuitive and opens up more possibilities.
Intersection subsetting (selecting rows and columns at the same time) is possible only with this method.
The earlier method of multiple square brackets, which also works with lists and strings, does not allow intersection subsetting.

np_2d[0, 2] # Selecting row 0 and column 2 with a single pair of square brackets

np.float64(1.71)

# Selecting height and weight both of the second and third family member
np_2d[:,1:3] # 3 is exclusive here
# The intersection gives us 2D array with 2 rows and 2 columns

array([[ 1.68, 1.71],

[59.2 , 63.6 ]])

# Selecting full weights row only

np_2d[1, :]
# The intersection gives us the entire second row

array([65.4, 59.2, 63.6, 88.4, 68.7])

# This method does not work on lists

d2_list = [[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, 68.7]]

# Non-Intuitive method of multiple square brackets which does not allow intersection subsetting
d2_list[:][1:3]

[[65.4, 59.2, 63.6, 88.4, 68.7]]
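The result above may look surprising, so here is a small sketch of why it happens and what the list-based equivalent of the comma syntax would be. The names rows, cols_from_list and cols_from_array are introduced only for this illustration.

```python
import numpy as np

d2_list = [[1.73, 1.68, 1.71, 1.89, 1.79],
           [65.4, 59.2, 63.6, 88.4, 68.7]]

# d2_list[:] is just a shallow copy of the outer list, so [1:3] then slices ROWS 1 to 2
rows = d2_list[:][1:3]

# To pick columns 1 to 2 from a list of lists, each row has to be sliced separately
cols_from_list = [row[1:3] for row in d2_list]

# The NumPy comma syntax does the same in one step
cols_from_array = np.array(d2_list)[:, 1:3]

print(rows)
print(cols_from_list)
print(cols_from_array)
```

Only one row comes back from the list version because the outer list has just two rows and the slice 1:3 starts at the second one.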

Task 6:

Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.

Print out the type of np_baseball.
Print out the shape attribute of np_baseball. Use np_baseball.shape.

# Create baseball, a list of lists

baseball = [[180, 78.4],
[215, 102.7],
[210, 98.5],
[188, 75.2]]

# Import numpy
import numpy as np

# Create a 2D numpy array from baseball: np_baseball

np_baseball = np.array(baseball)
# Print out the type of np_baseball
print(type(np_baseball))

# Print out the shape of np_baseball

print(np_baseball.shape)

<class 'numpy.ndarray'>
(4, 2)

Task 7:

Create baseball 2D list from height_in and weight_lb

Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.
Print out the shape attribute of np_baseball.

# Create baseball, a 2D list, from height_in and weight_lb, or read it from the given file
# Example
h = [180, 215, 210, 188]
w = [78.4, 102.7, 98.5, 75.2]

# zip() yields tuples pairing the elements of h and w. We unpack each tuple into the two variables x and y
baseball = [[x, y] for x, y in zip(h, w)]
print(baseball)

# Write your code to get baseball from height_in and weight_lb

# OR read the file into a list using the csv package

import csv
with open('numpy_baseball_weight_height.csv') as f:
    gen = csv.reader(f)
    baseball = [[int(element) for element in line] for line in gen]  # Nested list comprehension for a list of lists (2D list)

# print(baseball)
print(baseball)

# Create a 2D numpy array from baseball: np_baseball

np_baseball = np.array(baseball)
print(np_baseball)
# Print out the shape of np_baseball
print(np_baseball.shape)

[[180, 78.4], [215, 102.7], [210, 98.5], [188, 75.2]]

[[74, 180], [74, 215], [72, 210], [72, 210], [73, 188], [69, 176], [69, 209], [71, 200], [76, 231], [71, 180], [73, 188], [73, 180],
[[ 74 180]
[ 74 215]
[ 72 210]
...
[ 75 205]
[ 75 190]
[ 73 195]]
(1015, 2)

Numpy: Basic Statistics

For Data Analysis:

First Thing: Get to know your data

Little data -> simply look at it
Big data (millions or billions of numbers), e.g. a city-wide survey of 5000 adults about their height and weight

Simply staring at these numbers won't give you any insights

But summary statistics of our data can give us insights
NumPy is good at it

Generating Random Data

By using np.random.normal , arguments are

distribution mean
distribution standard deviation
number of samples

height = np.round(np.random.normal(1.75, 0.20, 5000), 2)

weight = np.round(np.random.normal(60.32, 15, 5000), 2)

# Using column_stack to paste height,weight together as two columns, argument tuple of numpy arrays
np_city = np.column_stack((height, weight))

Another awesome thing Numpy can do!

print(height.shape)
print(np_city.shape)

(5000,)
(5000, 2)

np.mean(np_city[:, 0]) # Average height of the adults

np.float64(1.7519799999999999)

# Median Height of Players --> Height of the middle person after sorting adults from small to tall
np.median(np_city[:, 0])

np.float64(1.75)

Often, these summary statistics will provide you with a "sanity check" of your data. If we end up with an average weight of 20 kg in this
case, our measurements are most likely incorrect.
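Because np.random.normal draws different numbers on every run, the values printed in this lab will not repeat exactly. The sketch below is one way to make such a sanity check repeatable; it assumes NumPy's newer np.random.default_rng generator (not used elsewhere in this lab) with an arbitrary seed of 0.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value 0 is an arbitrary choice)
rng = np.random.default_rng(0)
height = np.round(rng.normal(1.75, 0.20, 5000), 2)
weight = np.round(rng.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))

mean_h = np.mean(np_city[:, 0])      # should land close to 1.75
median_h = np.median(np_city[:, 0])  # should also land close to 1.75
std_h = np.std(np_city[:, 0])        # should land close to 0.20

print(mean_h, median_h, std_h)
```

If the summary values drift far from the parameters passed to the generator, something went wrong in how the data was produced or stacked.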

# To check if height and weight are correlated

np.corrcoef(np_city[:,0],np_city[:,1])

array([[ 1. , -0.0143666],
[-0.0143666, 1. ]])

np.std(np_city[:, 0]) # Standard deviation of height

np.float64(0.19991978291304738)

Numpy also features more basic functions such as sum and sort, which also exist in the basic Python distribution. However, the big difference
here is speed. Because numpy enforces a single data type in an array, it can drastically speed up the calculations.
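A rough way to see the speed difference is to time both versions of sum on the same data. This is only a sketch: the array size, the repeat count and the use of the standard-library timeit module are choices made for this illustration, and the actual timings depend on the machine.

```python
import timeit
import numpy as np

values = list(range(1_000_000))
np_values = np.array(values, dtype=np.int64)

# Both give the same total
py_total = sum(values)
np_total = int(np_values.sum())

# Timings depend on the machine, so they are printed rather than assumed
py_time = timeit.timeit(lambda: sum(values), number=5)
np_time = timeit.timeit(lambda: np_values.sum(), number=5)

print(py_total == np_total)
print(f"list: {py_time:.4f} s, numpy: {np_time:.4f} s")
```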

# Numpy statistic functions that work with list having same datatype elements
x = [1, 4, 8, 10, 12]
print(np.mean(x))
print(np.median(x))

7.0
8.0

Slice in Python

The built-in slice() function creates a slice object that can be used to slice a given sequence (string, bytes, tuple, list or range) or any object which supports the sequence
protocol (implements the __getitem__() and __len__() methods).

The syntax of slice() is:

slice(start, stop, step)

https://www.programiz.com/python-programming/methods/built-in/slice
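A slice object is interchangeable with the colon notation inside square brackets, as this short sketch shows. The names every_third, letters and numbers are illustrative only.

```python
import numpy as np

every_third = slice(0, 10, 3)  # same as writing 0:10:3 inside square brackets

letters = list("abcdefghij")
numbers = np.arange(10)

print(letters[every_third])  # works on lists
print(letters[0:10:3])       # the equivalent colon notation
print(numbers[every_third])  # and on NumPy arrays
```

Storing a slice in a variable is useful when the same selection has to be applied on both sides of an assignment, as in the next cell.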

np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]*1000

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1504279680.py in <cell line: 0>()
----> 1 np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]*1000

NameError: name 'np_baseball' is not defined

Task 8:

Create numpy array np_height_in that is equal to first column of np_baseball(3 cols).
Print out the mean of np_height_in.
Print out the median of np_height_in.

# np_baseball is available
# Create np_height_in from np_baseball
np_height_in = np.array(np_baseball[:,0])
print(np_height_in)
# Print out the mean of np_height_in
np_mean=np.mean(np_height_in)
print(np_mean)
# Print out the median of np_height_in
np_median=np.median(np_height_in)
print(np_median)

[74 74 72 ... 75 75 73]

73.6896551724138
74.0

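With the outliers injected by the slice assignment above, the exercise reports an average height of 1586 inches, which doesn't sound right. (In this run that assignment raised a NameError, so the printed mean is the undistorted 73.69 inches.) The median, however, is not affected by the outliers: 74 inches
makes perfect sense. It's always a good idea to check both the median and the mean, to get an idea about the overall distribution of the
entire dataset.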

np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]/1000
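The effect of a few outliers on the mean and the median can be reproduced on a tiny array. The sketch below uses the first ten heights from the output earlier in this lab and corrupts every fifth entry; the step of 5 and the variable names are assumptions chosen to keep the example small.

```python
import numpy as np

# First ten heights (inches) from the output above
heights = np.array([74, 74, 72, 72, 73, 69, 69, 71, 76, 71], dtype=float)
clean_mean, clean_median = np.mean(heights), np.median(heights)

# Corrupt every fifth entry, mimicking the slice assignment above
corrupted = heights.copy()
corrupted[slice(0, 10, 5)] = corrupted[slice(0, 10, 5)] * 1000
bad_mean, bad_median = np.mean(corrupted), np.median(corrupted)

print(clean_mean, clean_median)
print(bad_mean, bad_median)
```

Two corrupted values are enough to push the mean into the thousands, while the median moves by only half an inch.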

Task 9:

The code to print out the mean height is already included. Complete the code for the median height. Replace None with the correct
code.
Use np.std() on the first column of np_baseball to calculate stddev. Replace None with the correct code.
Do big players tend to be heavier? Use np.corrcoef() to store the correlation between the first and second column of np_baseball in
corr. Replace None with the correct code

# Print mean height (first column)
avg = np.mean(np_baseball[:,0])
print("Average: " + str(avg))

# Print median height. Replace 'None'

med = np.median(np_baseball[:,0])
print("Median: " + str(med))

# Print out the standard deviation on height. Replace 'None'

stddev = np.std(np_baseball[:,0])
print("Standard Deviation: " + str(stddev))

# Print out correlation between first and second column. Replace 'None'
corr = np.corrcoef(np_baseball[:,0], np_baseball[:,1])
print("Correlation: " + str(corr))

Average: 73.6896551724138
Median: 74.0
Standard Deviation: 2.312791881046546
Correlation: [[1. 0.53153932]
[0.53153932 1. ]]

Task 10: Blend it all together

You've contacted FIFA for some data and they handed you two lists. The lists are the following:

positions = ['GK', 'M', 'A', 'D', ...]
heights = [191, 184, 185, 180, ...]

Each element in the lists corresponds to a player. The first list, positions, contains strings representing each player's positio
You're fairly confident that the median height of goalkeepers is higher than that of other players on the soccer field. Some of y

Convert heights and positions, which are regular lists, to numpy arrays. Call them np_heights and np_positions.
Extract all the heights of the goalkeepers. You can use a little trick here: use np_positions == 'GK' as an index for np_heights. Assign the
result to gk_heights. [If you encounter an error, use np.where() to find the indices where the condition is true, e.g. indices =
np.where(np_heights > 13).]
Extract all the heights of all the other players. This time use np_positions != 'GK' as an index for np_heights. Assign the result to
other_heights.
Print out the median height of the goalkeepers using np.median(). Replace None with the correct code.
Do the same for the other players. Print out their median height. Replace None with the correct code.

# heights and positions are available as lists
with open('fifa_position.txt') as f:
    positions = f.readlines()
positions = list(map(lambda x: x.strip(), positions))  # .strip() removes leading/trailing whitespace, including the '\n' on each element
# print(positions)
with open('fifa_height.txt') as f:
    height = f.readlines()
height = list(map(lambda x: int(x), height))
# print(height)
# Import numpy
import numpy as np

# Convert positions and heights to numpy arrays: np_positions, np_heights

np_positions = np.array(positions)
np_heights = np.array(height)

# Heights of the goalkeepers: gk_heights

gk_heights=np_heights[np_positions == 'GK']

# Heights of the other players: other_heights

other_heights=np_heights[np_positions != 'GK']

# Print out the median height of goalkeepers. Replace 'None'

print("Median height of goalkeepers: " + str(np.median(gk_heights)))

# Print out the median height of other players. Replace 'None'

print("Median height of other players: " + str(np.median(other_heights)))

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/tmp/ipython-input-2704805964.py in <cell line: 0>()
16
17 # Heights of the goalkeepers: gk_heights
---> 18 gk_heights=np_heights[np_positions == 'GK']
19
20 # Heights of the other players: other_heights

IndexError: boolean index did not match indexed array along axis 0; size of axis is 12 but size of corresponding boolean axis is 8847
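The error message says that the boolean index has 8847 entries while the array being indexed has only 12, so the two files did not yield lists of the same length. The sketch below reproduces the mechanism on tiny, hypothetical stand-in arrays (the values are not the FIFA data) and shows the np.where alternative mentioned in the task.

```python
import numpy as np

# Hypothetical, tiny stand-ins for the FIFA data
np_positions = np.array(['GK', 'M', 'A', 'D', 'GK'])
np_heights = np.array([191, 184, 185, 180, 188])

# A boolean mask only works when it has one entry per element being indexed
same_length = np_positions.shape == np_heights.shape

gk_heights = np_heights[np_positions == 'GK']
other_heights = np_heights[np_positions != 'GK']

# np.where returns the integer indices where the condition is True
gk_indices = np.where(np_positions == 'GK')[0]

# A mask of the wrong length raises the IndexError seen above
try:
    np_heights[:3][np_positions == 'GK']
    mismatch = "no error"
except IndexError:
    mismatch = "IndexError"

print(same_length, gk_heights, other_heights, gk_indices, mismatch)
```

Comparing the two shapes before indexing is a quick check worth running whenever the mask and the data come from separate files.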


Task 11: Compare arrays

Using comparison operators, generate boolean arrays that answer the following questions:

Which areas in my_house are greater than or equal to 18?

You can also compare two Numpy arrays element-wise. Which areas in my_house are smaller than the ones in your_house?
Make sure to wrap both commands in a print() statement so that you can inspect the output!

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than or equal to 18

print(my_house >= 18)

# my_house less than your_house

print(my_house < your_house)

[ True  True False False]
[False  True  True False]

Task 12:

Before, the operational operators like < and >= worked with Numpy arrays out of the box. Unfortunately, this is not true for the

Generate boolean arrays that answer the following questions:

Which areas in my_house are greater than 18.5 or smaller than 10?
Which areas are smaller than 11 in both my_house and your_house? Make sure to wrap both commands in print() statement, so that
you can inspect the output.

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10

print(np.logical_or(my_house > 18.5, my_house < 10))

# my_house less than your_house

print(my_house < your_house)

# Both my_house and your_house smaller than 11

print(np.logical_and(my_house < 11, your_house < 11))

[False True False True]

[False True True False]
[False False False True]
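Python's own and / or keywords do not work element-wise on arrays, which is why np.logical_and() and np.logical_or() are used above. As a sketch, the operators & and | give the same element-wise results when each comparison is wrapped in brackets; the variable names are introduced only for this illustration.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

either = np.logical_or(my_house > 18.5, my_house < 10)
either_op = (my_house > 18.5) | (my_house < 10)  # brackets are required around each comparison

both = np.logical_and(my_house < 11, your_house < 11)
both_op = (my_house < 11) & (your_house < 11)

# Python's own `or` cannot decide the truth value of a whole array
try:
    my_house > 18.5 or my_house < 10
    outcome = "no error"
except ValueError:
    outcome = "ValueError"

print(either, both, outcome)
```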

For loop with Numpy Array

# The most basic for loop does the trick

for value in bmi:
    print(value)

For loop with 2D Numpy Array

import numpy as np
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
meas = np.array([np_height, np_weight])

for val in meas:
    print(val)

If we want to print out each element in this 2D array separately, the same basic for loop won't do the trick. The 2D array is actually
built up from an array of 1D arrays. The for loop simply prints out an entire array on each iteration.
To visit every element of an array, we can use a Numpy function called nditer(). Its input is the array you want to iterate over.

for val in np.nditer(meas):
    print(val)
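The difference between the two loops can be made visible by collecting what each one yields. This sketch uses a shortened version of meas; the names rows, elements and flat are illustrative, and flatten() is shown only as a loop-free way to get the same order.

```python
import numpy as np

meas = np.array([[1.73, 1.68, 1.71],
                 [65.4, 59.2, 63.6]])

rows = [row.tolist() for row in meas]               # plain loop: one 1D array per iteration
elements = [float(val) for val in np.nditer(meas)]  # nditer: one element per iteration
flat = meas.flatten().tolist()                      # same order, without a loop

print(rows)
print(elements)
```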

Task 13:

Import the numpy package under the local alias np.

Write a for loop that iterates over all elements in np_height and prints out "x inches" for each element, where x is the value in the array.
Write a for loop that visits every element of the np_baseball array and prints it out.
Try to add an additional argument end = to the print() call - the output will be mesmerizing!

# Import numpy as np
import numpy as np
# For loop over np_height
for x in np_height:
    print(str(x) + " inches")
# For loop over np_baseball
for x in np.nditer(np_baseball):
    print(x)


Importing Flat Files using Numpy

If all the data is numerical, we can use the package numpy to import the data as a numpy array

Numpy arrays are the Python standard for storing numerical data. They are efficient, fast and clean
Numpy arrays are often essential for other packages, e.g. scikit-learn

Numpy itself has a number of built-in functions that make it far easier and more efficient for us to import data as arrays.

import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',') #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.

The default delimiter is any whitespace, so for a comma-separated file we need to specify the delimiter explicitly

# If your data is numeric and header has strings, we use skiprows = 1 to skip header row
import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',', skiprows=1) #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.

# For using only columns 0 and 2, we write usecols = [0,2] as 4th loadtxt argument
import numpy as np
filename = 'mnist_kaggle_some_rows.csv'
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols = [0,2]) #The first argument will be the filename.
data #The second will be the delimiter which, in this case, is a comma.

# We can also import different data type into Numpy arrays. We set dtype = str to import all members as string type
data = np.loadtxt(filename, delimiter=',', dtype=str)
data

loadtxt breaks down when we have mixed datatypes, for example one column of floats and another of strings
The natural place for mixed datatypes is a pandas DataFrame rather than Numpy, although Numpy can handle mixed datatypes
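The loadtxt options discussed above can be tried without any file on disk. The sketch below feeds np.loadtxt a small in-memory text through io.StringIO; the column names and values are hypothetical stand-ins, not the contents of the lab's CSV files.

```python
import io
import numpy as np

# Small in-memory stand-in for a CSV file (hypothetical column names and values)
text = "label,pixel0,pixel1\n1,0,255\n0,12,34\n7,0,0\n"

everything = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1)
subset = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1, usecols=[0, 2])
as_text = np.loadtxt(io.StringIO(text), delimiter=',', dtype=str)

# Without skiprows, the header cannot be converted to float
try:
    np.loadtxt(io.StringIO(text), delimiter=',')
    outcome = "no error"
except ValueError:
    outcome = "ValueError"

print(everything.shape, subset.shape, as_text.shape, outcome)
```

Reading as strings keeps the header row as the first element, while skiprows=1 drops it so the remaining rows can be parsed as floats.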

Task 14:

Use np.loadtxt() by passing file and a comma ',' for the delimiter.
Print the type of object digits
Execute the rest of the code to visualize one of the rows of the data.


# Import package
import numpy as np
import matplotlib.pyplot as plt
# Assign filename to variable: file
file = 'digits.csv'

# Load file as array: digits

data=np.loadtxt(file,delimiter=',')
digits = np.array(data)
# Print datatype of digits
print(type(digits))
print(digits.shape)

# Select and reshape a row

im = digits[21, 1:] #Selecting image No. 22 and then selecting columns 1-end
print(im.shape)
im_sq = np.reshape(im, (28, 28)) # Reshaping image to shape 28*28 = 784
print(im_sq.shape)

# Plot reshaped data (matplotlib.pyplot already loaded as plt)

plt.imshow(im_sq, cmap='Greys', interpolation='nearest')
plt.show()

<class 'numpy.ndarray'>
(100, 785)
(784,)
(28, 28)

You can use '\t' for tab-delimited. skiprows allows you to specify how many rows (not indices) you wish to skip. usecols takes a list of the
indices of the columns you wish to keep.

data = pd.read_csv(
    "file_name.txt",
    delimiter="\t",
    skiprows=2, # skip the first 2 metadata rows
    usecols=[0, 2] # only load Name and City
)
print(data)
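
The delimiter, skiprows and usecols options can also be passed to np.loadtxt. The sketch below is only an illustration: the tab-delimited text and its column names a, b, c are invented for this example, and io.StringIO stands in for a file on disk.

```python
import io
import numpy as np

# Hypothetical tab-delimited text: one header row, then three numeric columns.
text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

# Skip the single header row and keep only columns 0 and 2.
arr = np.loadtxt(io.StringIO(text), delimiter='\t', skiprows=1, usecols=[0, 2])

print(arr.shape)   # (2, 2)
print(arr)
```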

Task 15:

The file seaslug.txt is tab-delimited and has a text header consisting of strings. The data give the percentage of sea slug larvae recorded at each time point (in minutes).

Use np.loadtxt() by passing file as the first argument; the file is tab-delimited, and the elements should be imported as the string datatype.
Print the first element of data.
Use np.loadtxt() again. This time the file you're importing is tab-delimited, the datatype is float, and you want to skip the first row.

Print the 10th element of data_float. Be guided by the previous print() call.
Execute the rest of the code to visualize the data.

# Assign filename: file

file = 'seaslug.txt'

# Import file: data

data = np.loadtxt(file, delimiter='\t', dtype=str)
# Print the first element of data
print(data[0])
# Import data as floats and skip the first row: data_float
data_float = np.loadtxt(file, delimiter='\t', dtype=float, skiprows=1)

# Print the 10th element of data_float

print(data_float[9])

# Plot a scatterplot of the data

plt.scatter(data_float[:, 0], data_float[:, 1])
plt.xlabel('time (min.)')
plt.ylabel('percentage of larvae')
plt.show()

['Time' 'Percent']
[0. 0.357]

Because of the header, importing the file as-is with np.loadtxt() raises a ValueError reporting that a string could not be converted
to float. There are two ways to deal with this. First, you can set the dtype argument to str so that every element is imported as a string.

data = np.loadtxt("file_name.txt", dtype=str)

Alternatively, you can skip the first row as we have seen before, using the skiprows argument.

data = np.loadtxt("file_name.txt", skiprows=1)
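
The two options can be compared side by side in a short sketch. The header row Time/Percent matches the output shown earlier, but the numeric values below are made up for illustration, and io.StringIO stands in for the real file.

```python
import io
import numpy as np

# Hypothetical tab-delimited text with a string header row (values are invented).
text = "Time\tPercent\n0\t0.5\n10\t0.75\n"

# Importing as-is fails: the header cannot be converted to float.
try:
    np.loadtxt(io.StringIO(text), delimiter='\t')
    error_name = None
except ValueError as exc:
    error_name = type(exc).__name__

# Option 1: import everything as strings, header included.
as_str = np.loadtxt(io.StringIO(text), delimiter='\t', dtype=str)

# Option 2: skip the header row and keep the default float dtype.
as_float = np.loadtxt(io.StringIO(text), delimiter='\t', skiprows=1)

print(error_name)
print(as_str[0])
print(as_float)
```

The first option keeps the header but leaves every value as text, so numeric work needs a later conversion. The second drops the header and yields floats directly.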

More Numpy
