Fundamentals of Machine Learning
Hakim Hafidi & Youness Moukafih
Lab 1: Introduction to Python Libraries - Pandas, NumPy, and Matplotlib
Objective:
This lab aims to introduce you to three fundamental Python libraries, Pandas, NumPy,
and Matplotlib, used in data analysis. By the end of this lab, you should be able to load a
dataset, perform basic operations, and create visualizations to understand the
relationships between different variables in the dataset.
Prerequisites:
• Basic knowledge of Python programming language.
• Anaconda installed on your computer.
Step 1: Installing Anaconda
If you haven't installed Anaconda yet, please follow the instructions below:
1. Download the installer from the Anaconda Individual Edition page.
2. Follow the installation instructions for your operating system in the Anaconda
Installation Guide.
Step 2: Setting Up Jupyter Notebook
1. Open Anaconda Navigator.
2. Launch Jupyter Notebook.
3. Create a new Python notebook.
Introduction to Pandas, NumPy, and Matplotlib
Pandas
Pandas is a powerful library for data analysis and manipulation.
# Importing Pandas Library
import pandas as pd
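As a quick illustration of the DataFrame, the central Pandas data structure, the sketch below builds a small table by hand, selects one column, and filters rows. The column names and values are made up for this example and are not part of the Iris dataset.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the DataFrame structure
toy = pd.DataFrame({
    "length": [5.1, 4.9, 6.3],
    "label": ["a", "a", "b"],
})

# Selecting a single column returns a pandas Series
lengths = toy["length"]

# Boolean filtering keeps only the rows where the condition holds
long_rows = toy[toy["length"] > 5.0]
print(long_rows)
```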
NumPy
NumPy provides large, multi-dimensional arrays and matrices, together with mathematical
functions that operate on them.
# Importing NumPy Library
import numpy as np
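To make the idea of array operations concrete, here is a minimal sketch using a few hypothetical numbers. It shows element-wise arithmetic, reshaping, and an aggregation along one axis.

```python
import numpy as np

# A one-dimensional array of hypothetical measurements
values = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic applies element-wise, with no explicit loop
doubled = values * 2

# Reshape the same four numbers into a 2 x 2 matrix
matrix = values.reshape(2, 2)

# Aggregate along an axis: axis=0 gives one sum per column
column_sums = matrix.sum(axis=0)
print(doubled, column_sums)
```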
Matplotlib
Matplotlib is a plotting library for creating static, animated, and interactive visualizations
in Python.
# Importing Matplotlib Library
import matplotlib.pyplot as plt
Lab Tasks:
Task 1: Load a Dataset
Load the 'Iris' dataset from the UCI Machine Learning Repository.
# Loading Iris Dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width",
"class"]
iris = pd.read_csv(url, names=column_names)
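The data file has no header row, which is why the column names are passed through names=. The sketch below reproduces that situation with a few illustrative rows held in memory, so it runs without a network connection. The variable names sample_text and sample are assumptions made for this example.

```python
import io
import pandas as pd

# Illustrative rows in the same comma-separated, header-less layout
sample_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width",
                "class"]

# When names= is given and header is not, the first line is treated as data
sample = pd.read_csv(io.StringIO(sample_text), names=column_names)

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # four numeric columns and one text column
```

Running the same two checks on the full dataset is a simple way to confirm the download worked: iris.shape should report 150 rows and 5 columns.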
Task 2: View the Dataset
View the first 5 rows of the dataset to understand the data.
# Viewing first 5 rows of Iris Dataset
iris.head()
Task 3: Basic Operations
Calculate the average, median, and standard deviation of the 'sepal_length' column.
# Calculating the average of 'sepal_length'
average_sepal_length = iris['sepal_length'].mean()
print(f"Average Sepal Length: {average_sepal_length}")
# Calculating the median of 'sepal_length'
median_sepal_length = iris['sepal_length'].median()
print(f"Median Sepal Length: {median_sepal_length}")
# Calculating the standard deviation of 'sepal_length'
# (pandas returns the sample standard deviation, ddof=1, by default)
std_dev_sepal_length = iris['sepal_length'].std()
print(f"Standard Deviation of Sepal Length: {std_dev_sepal_length}")
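The same statistics can also be computed with NumPy, which is a useful cross-check. One detail to watch: pandas divides by n - 1 when computing the standard deviation, while NumPy divides by n unless told otherwise. The sketch below uses hypothetical values in place of the 'sepal_length' column.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements standing in for a column such as 'sepal_length'
series = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
array = series.to_numpy()

mean_value = np.mean(array)
median_value = np.median(array)

# pandas divides by n - 1 by default (sample standard deviation, ddof=1)
sample_std = series.std()
# NumPy divides by n by default (population standard deviation, ddof=0)
population_std = np.std(array)
# Passing ddof=1 makes NumPy agree with pandas
matching_std = np.std(array, ddof=1)

print(mean_value, median_value, sample_std, population_std)
```

With the Iris DataFrame loaded, iris['sepal_length'].to_numpy() gives the array to use in place of the toy values.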
Task 4: Data Visualization
Create scatter plots to visualize the relationships between 'sepal_length' and
'sepal_width', and between 'petal_length' and 'petal_width'.
# Creating Scatter Plot for 'sepal_length' and 'sepal_width'
plt.scatter(iris['sepal_length'], iris['sepal_width'])
plt.title('Sepal Length vs Sepal Width')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()
# Creating Scatter Plot for 'petal_length' and 'petal_width'
plt.scatter(iris['petal_length'], iris['petal_width'])
plt.title('Petal Length vs Petal Width')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.show()
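A single-color scatter plot hides which species each point belongs to. One possible refinement, sketched below, draws one scatter layer per class so that each class gets its own color and legend entry. The rows are hypothetical values in the same layout as the Iris DataFrame.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows in the same layout as the Iris DataFrame
toy = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

fig, ax = plt.subplots()
# One scatter call per class, so each class gets its own color and legend entry
for class_name, group in toy.groupby("class"):
    ax.scatter(group["petal_length"], group["petal_width"], label=class_name)

ax.set_title("Petal Length vs Petal Width by Class")
ax.set_xlabel("Petal Length (cm)")
ax.set_ylabel("Petal Width (cm)")
ax.legend()
```

Replacing toy with iris applies the same loop to the full dataset. Call plt.show() afterwards, as in the plots above, to render the figure.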
Enhanced Visualization and Analysis Tasks:
Task 5: Correlation Matrix
Create a correlation matrix to understand the linear relationship between the different
variables in the dataset.
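# Creating Correlation Matrix
# The text column 'class' is dropped first: correlation is only defined for
# numeric columns, and recent pandas versions raise an error otherwise
correlation_matrix = iris.drop(columns='class').corr()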
print(correlation_matrix)
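To see what each entry of the matrix means, the sketch below computes the Pearson correlation coefficient by hand and compares it with the pandas result. The data and the helper name pearson are assumptions made for this example. Values near +1 indicate a strong positive linear relationship, and values near -1 a strong negative one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y rises with x, z falls with x
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.0, 9.8],
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],
})

def pearson(a, b):
    """Sum of centered products over the square root of the product of
    centered sums of squares (equivalent to covariance / (std_a * std_b))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    return (a_centered * b_centered).sum() / np.sqrt(
        (a_centered ** 2).sum() * (b_centered ** 2).sum()
    )

manual_xy = pearson(toy["x"], toy["y"])
manual_xz = pearson(toy["x"], toy["z"])

# DataFrame.corr() uses the Pearson method by default
matrix = toy.corr()
print(manual_xy, matrix.loc["x", "y"])
print(manual_xz, matrix.loc["x", "z"])
```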
Task 6: Scatter Plot Matrix
Create a scatter plot matrix to visualize the relationships between all pairs of variables.
# Creating Scatter Plot Matrix
pd.plotting.scatter_matrix(iris, alpha=0.8, figsize=(10, 10), diagonal='hist')
plt.show()
Exercises:
1. Exercise 1: Analyze the correlation matrix and scatter plot matrix. Answer the
following questions:
a. Is there a relationship between 'sepal_length' and 'sepal_width'?
b. Is the relationship between 'petal_length' and 'petal_width' positive or negative?
c. Which pair of variables has the strongest relationship?
2. Exercise 2: Create a scatter plot for 'petal_length' and 'petal_width'. Based on
the plot, hypothesize whether there is any association between the two variables
and whether the association is positive or negative.
3. Exercise 3: Load another dataset of your choice and perform similar operations
and visualizations to understand the relationships between the variables. Answer
questions about the relationships between the variables based on the
visualizations.
Submission:
Submit the Jupyter notebook containing all the executed cells along with the outputs and
your answers to the exercise questions.