Chapter 4

Chapter 4 covers model selection and training in machine learning, focusing on classification and regression models. It outlines the steps for solving classification problems using the Iris dataset and regression problems using the Boston housing dataset, including model training and evaluation metrics. The chapter concludes with a summary of key concepts and homework suggestions for further practice.

Uploaded by

adiqbal002


Chapter 4

Model Selection and Training

Machine learning is all about creating models that can learn patterns from data and make
predictions. Model selection refers to choosing which type of machine learning model
(algorithm) to use for a given problem, while training refers to teaching that model using data.

We’ll cover:

1. What types of models exist.
2. How to choose a model.
3. How to train a model.
4. How to evaluate the model's performance.

4.1. Types of Machine Learning Models

1. Classification Models (For Categorical Data)

• Goal: Predict a category or class.
• Example: Predicting whether an email is "spam" or "not spam."
• Target variable: Categorical (e.g., yes/no, cat/dog).

2. Regression Models (For Continuous Data)

• Goal: Predict a continuous value.
• Example: Predicting the price of a house based on features such as area and number of rooms.
• Target variable: Continuous (e.g., 1000, 2000, 2500).
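
One way to make the distinction concrete is to look at the target values themselves. The sketch below applies scikit-learn's type_of_target helper to two made-up target lists; the values are invented for illustration and do not come from any dataset in this chapter.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical targets, invented for illustration only.
spam_labels = ["spam", "not spam", "spam", "not spam"]  # categories
house_prices = [1000.5, 2000.0, 2500.75]                # real-valued numbers

label_kind = type_of_target(spam_labels)
price_kind = type_of_target(house_prices)

print(label_kind)  # two distinct categories: a classification target
print(price_kind)  # real-valued numbers: a regression target
```

A list of whole numbers such as 1000, 2000, 2500 would be reported by this helper as discrete classes, so the final decision rests on what the target means, not only on how it is stored.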

4.2. Understanding a Simple Classification Example

Let’s start with a classification problem. We’ll use the Iris dataset, which is a classic dataset in
machine learning. It contains 150 data points, each describing an iris flower with 4 features
(measurements of the flowers). The goal is to predict the species of the flower based on those 4
features.

What’s in the Iris Dataset?

• Features (X): Sepal length, Sepal width, Petal length, Petal width.
• Target (y): The species of the iris flower, which can be either Setosa, Versicolor, or Virginica.
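
Before training anything, it can help to look at the data directly. The following is a small sketch that loads the copy of Iris bundled with scikit-learn and prints its size, feature names, and species names. The variable names are chosen here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

n_samples, n_features = iris.data.shape
feature_names = list(iris.feature_names)
species = [str(name) for name in iris.target_names]

print(n_samples, n_features)         # number of flowers, number of measurements
print(feature_names)                 # names of the four measurements
print(species)                       # names of the three species
print(iris.data[0], iris.target[0])  # first flower and its encoded species
```

In this copy of the dataset the species are stored as the integers 0, 1, and 2, which index into target_names.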

Why Use Classification?

Since we are trying to predict a category (the species of the iris), this is a classification problem.

Steps to Solve the Classification Problem:

1. Load and Prepare the Data: We load the dataset and separate the data into features (X)
and target (y).
2. Split the Data: We divide the data into a training set and a testing set. The training set
is used to train the model, while the testing set is used to evaluate its performance.
3. Choose a Model: For simplicity, we will use Logistic Regression (a basic but effective
classification algorithm).
4. Train the Model: We train the model on the training data.
5. Evaluate the Model: After the model is trained, we test it on the testing data to check
how well it performs.

Code Example for Classification (Iris Dataset):

# Import necessary libraries

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset

iris = load_iris()
X = iris.data    # Features: sepal length, sepal width, petal length, petal width
y = iris.target # Target: Species (Setosa, Versicolor, Virginica)

# Step 2: Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Step 3: Initialize and train the Logistic Regression model

model = LogisticRegression(max_iter=200)  # Raise max_iter to 200 to help the solver converge
model.fit(X_train, y_train) # Train the model on the training data

# Step 4: Make predictions on the testing set

y_pred = model.predict(X_test)

# Step 5: Evaluate the model's accuracy

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the Logistic Regression model: {accuracy * 100:.2f}%")

Explanation:
1. load_iris() loads the Iris dataset.
2. train_test_split() splits the data into training and testing sets.
3. LogisticRegression() is the model we’re using for classification. We train it with the
fit() method.
4. predict() is used to make predictions on the test data.
5. accuracy_score() calculates the accuracy of our model, which tells us how often the
model's predictions match the actual values.
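
To see what splitting the data involves, here is a simplified sketch written in plain Python. It is not scikit-learn's implementation; the function name simple_train_test_split and the toy data are made up for this illustration. The idea is to shuffle the row indices with a fixed seed and hold out a fraction of them for testing.

```python
import random

def simple_train_test_split(rows, labels, test_size=0.2, seed=42):
    """Shuffle indices, then cut them into a training part and a testing part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # same seed gives the same shuffle
    n_test = int(round(len(rows) * test_size))  # how many rows to hold out
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    rows_train = [rows[i] for i in train_idx]
    rows_test = [rows[i] for i in test_idx]
    labels_train = [labels[i] for i in train_idx]
    labels_test = [labels[i] for i in test_idx]
    return rows_train, rows_test, labels_train, labels_test

# Toy data: 10 rows with one feature each, and one label per row.
rows = [[i] for i in range(10)]
labels = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = simple_train_test_split(rows, labels)
print(len(X_train), len(X_test))  # sizes of the training and testing parts
```

Fixing the seed plays the same role as random_state=42 in the chapter's code: rerunning the script produces the same split, so results are reproducible.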

4.3. Regression Example: Predicting House Prices

Now let’s talk about regression, where the target variable is continuous (e.g., predicting the
price of a house).

What’s in the Boston Housing Dataset?

The Boston housing dataset contains 506 records describing housing in the Boston area, each
with 13 features (such as the average number of rooms and the crime rate). The goal is to predict
the median house price for each record, which the dataset reports in thousands of dollars.

Steps to Solve the Regression Problem:

1. Load and Prepare the Data: We load the dataset and separate the data into features (X)
and target (y) (house prices).
2. Split the Data: We divide the data into training and testing sets.
3. Choose a Model: We use Linear Regression, which fits a linear function of the features
(a straight line when there is only one feature) that best matches the data.
4. Train the Model: We train the model on the training data.
5. Evaluate the Model: After training, we evaluate the model’s performance using Mean
Squared Error (MSE).

Code Example for Regression (House Price Prediction):

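# Import necessary libraries
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example requires an older version of the library.
from sklearn.datasets import load_boston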
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Step 1: Load the Boston housing dataset

boston = load_boston()
X = boston.data # Features: e.g., number of rooms, crime rate, etc.
y = boston.target # Target: House prices

# Step 2: Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Step 3: Initialize and train the Linear Regression model

model = LinearRegression()
model.fit(X_train, y_train) # Train the model on the training data

# Step 4: Make predictions on the testing set

y_pred = model.predict(X_test)

# Step 5: Evaluate the model using Mean Squared Error (MSE)

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error of the Linear Regression model: {mse:.2f}")

Explanation:

1. load_boston() loads the Boston housing dataset.
2. train_test_split() splits the data into training and testing sets.
3. LinearRegression() is the model we’re using for regression. We train it with the fit()
method.
4. predict() is used to make predictions on the test data.
5. mean_squared_error() calculates how far off the model’s predictions are from the
actual house prices. A smaller MSE indicates a better model.
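
Where load_boston is unavailable (recent scikit-learn releases no longer ship it), the same five steps can be sketched with a dataset that is still bundled with the library. The block below swaps in load_diabetes purely as a stand-in. That substitution is an assumption made for this illustration: its target is a disease-progression score, not a house price, but the workflow is unchanged.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 1: Load a stand-in regression dataset (assumption for this sketch):
# 442 patients, 10 features, target is a disease-progression score.
data = load_diabetes()
X, y = data.data, data.target

# Step 2: Split the data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: Initialize and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Step 4: Make predictions on the testing set
y_pred = model.predict(X_test)

# Step 5: Evaluate with Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
```

For a housing example, fetch_california_housing exposes the same .data and .target attributes, but it downloads the data on first use, so it needs a network connection.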

4.4. Model Evaluation

For Classification (e.g., Logistic Regression):

• Accuracy: The percentage of correct predictions.
   o Example: If the model correctly classifies 80 out of 100 flowers, the accuracy is 80%.
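
The 80-out-of-100 example can be reproduced by hand. This is a minimal sketch in plain Python; the helper name accuracy and the label lists are made up for illustration, and the result should agree with what accuracy_score reports for the same inputs.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels: 100 flowers, the first 80 predicted correctly, the last 20 wrong.
y_true = [0] * 100
y_pred = [0] * 80 + [1] * 20

acc = accuracy(y_true, y_pred)
print(f"Accuracy: {acc * 100:.0f}%")  # 80 correct out of 100
```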

For Regression (e.g., Linear Regression):

• Mean Squared Error (MSE): Measures the average squared difference between the
predicted values and the actual values.
   o Example: If each prediction is off by about 3 (in the units of the target), the MSE is
about 9, because every error is squared before averaging. Large errors are therefore
penalized more heavily than small ones.
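
A short worked example may make the calculation easier to follow. The sketch below computes MSE by hand on four made-up prices (in thousands of dollars) and also takes the square root, often called RMSE, to bring the error back to the original units. The function name and the numbers are assumptions for illustration only.

```python
import math

def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Made-up prices in thousands of dollars.
actual = [200.0, 310.0, 150.0, 420.0]
predicted = [203.0, 306.0, 150.0, 425.0]

# Errors are -3, 4, 0, -5, so the squared errors are 9, 16, 0, 25.
mse = mean_squared_error_manual(actual, predicted)
rmse = math.sqrt(mse)  # back in the original units (thousands of dollars)

print(mse)   # (9 + 16 + 0 + 25) / 4
print(rmse)  # square root of the MSE
```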

4.5. Summary of Model Selection and Training

Here’s a summary of what we covered in this chapter:

1. Classification vs. Regression:
   o Classification is for predicting categories (e.g., Iris species).
   o Regression is for predicting continuous values (e.g., house prices).
2. Steps to Train a Model:
   o Load the dataset and separate features from the target variable.
   o Split the data into training and testing sets.
   o Choose a model: Select an appropriate model (e.g., Logistic Regression for
     classification, Linear Regression for regression).
   o Train the model: Use the .fit() method to train the model.
   o Evaluate the model: Use metrics like accuracy (for classification) or Mean
     Squared Error (for regression).

Homework / Practice for Chapter 4

1. Classification: Try the Iris dataset with other classification models like K-Nearest
Neighbors or Support Vector Machines.
2. Regression: Try the California housing dataset for predicting house prices.
3. Model Evaluation: Calculate the accuracy for classification models and MSE for
regression models on your own datasets.
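
As a possible starting point for the first exercise, the sketch below fits K-Nearest Neighbors and a Support Vector Machine on the same Iris split used earlier in the chapter. The hyperparameters (n_neighbors=5, an RBF kernel) are arbitrary picks for illustration, not tuned or recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters are arbitrary picks for this sketch, not tuned values.
candidates = {
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(kernel="rbf"),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)  # same fit/predict interface as LogisticRegression
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name] * 100:.2f}%")
```

Because every scikit-learn estimator exposes fit() and predict(), swapping models only requires changing the line that constructs the model.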

