
Python for Data Science: ML Basics

The document is an introduction to Python for data science, focusing on machine learning concepts and model building. It covers various machine learning models, including simple linear regression and decision trees, along with data preprocessing, feature selection, and handling categorical data. Additionally, it discusses the importance of training and testing datasets, model evaluation, and introduces ensemble models like Random Forest.

Uploaded by

Felix Andoh


INTRODUCTION TO PYTHON FOR DATA SCIENCE

Pasty Asamoah
+233 (0) 546 116 102

Kwame Nkrumah University of Science and Technology

School of Business
Supply Chain and Information Systems Dept.
Images used in this presentation are sourced from various online platforms. Credit goes to the
respective creators and owners. I apologize for any omission in attribution, and appreciate the
work of the original content creators.
INTRODUCTION TO MACHINE LEARNING
MACHINE LEARNING
Machine learning is a field of AI that involves the development of
algorithms and statistical models that enable computers to learn and
improve their performance on a specific task without being explicitly
programmed.

Supervised learning: learns from labeled data

Unsupervised learning: learns from unlabeled data

Reinforcement learning: learns to make decisions by interacting with an environment
MACHINE LEARNING MODELS
Machine learning models can range from simple linear regression to
complex deep neural networks.

Simple linear
regression
SIMPLE LINEAR REGRESSION MODEL

Data preprocessing: Clean data, Split data
Build Model: Select model
Evaluate: Check accuracy
OUR FIRST MACHINE LEARNING
MODEL

Snapshot of the
housing dataset
DATA INGESTION

Import
packages

Load data
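
A minimal sketch of these two steps, assuming the housing data is a CSV file. The slides do not show the file name or the columns, so the inline `CSV_TEXT` and its column names (`price`, `area`, `bedrooms`, `mainroad`) are hypothetical stand-ins. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the housing CSV file.
CSV_TEXT = """price,area,bedrooms,mainroad
250000,1200,3,yes
180000,900,2,no
320000,1600,4,yes
210000,1100,3,no
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.head())  # first rows, similar to the dataset snapshot
print(df.shape)   # (number of rows, number of columns)
```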
DATA CLEANING

Handle duplicates

There are no
missing values
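
The cleaning checks can be sketched as follows. The small DataFrame is made up for illustration (it is not the housing dataset from the slides) and deliberately contains one repeated row.

```python
import pandas as pd

# Made-up rows; the first and third rows are identical on purpose.
df = pd.DataFrame({
    "price": [250000, 180000, 250000, 320000],
    "area": [1200, 900, 1200, 1600],
    "bedrooms": [3, 2, 3, 4],
})

# Count fully duplicated rows, then drop them.
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Count missing values in each column.
missing_per_column = df.isnull().sum()

print("Duplicate rows removed:", n_duplicates)
print(missing_per_column)
```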
DATA CLEANING

Column data
types

We will be
working with
the integer data
types at this
stage.
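
One way to inspect column types and keep only the numeric columns is sketched below. The column names are hypothetical, and `include="number"` is used here as a simple way to pick up the integer columns; it would also keep float columns if there were any.

```python
import pandas as pd

# Made-up data with two integer columns and one text column.
df = pd.DataFrame({
    "price": [250000, 180000, 320000],
    "area": [1200, 900, 1600],
    "mainroad": ["yes", "no", "yes"],
})

# Show the data type of every column.
print(df.dtypes)

# Keep only numeric columns; the text column is left out for now.
numeric_df = df.select_dtypes(include="number")
print(list(numeric_df.columns))
```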
FEATURE SELECTION
Predictors

What we want
to predict
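
A short sketch of separating predictors from the target. The column names (`area`, `bedrooms`, `bathrooms`, `price`) are assumptions, since the slides show the selection only as a screenshot.

```python
import pandas as pd

# Made-up numeric housing data.
df = pd.DataFrame({
    "price": [250000, 180000, 320000, 210000],
    "area": [1200, 900, 1600, 1100],
    "bedrooms": [3, 2, 4, 3],
    "bathrooms": [2, 1, 3, 1],
})

# Predictors (independent variables): the columns used to make a prediction.
predictor_columns = ["area", "bedrooms", "bathrooms"]
X = df[predictor_columns]

# Target (dependent variable): what we want to predict.
y = df["price"]

print(X.shape, y.shape)
```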
MODEL SELECTION

Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.

Fit: Capture patterns from provided data. This is the heart of modeling.

Predict: Just what it sounds like.

Evaluate: Determine how accurate the model's predictions are.

In this case we want to build a very basic linear regression model using the scikit-learn library.
MODEL SELECTION
Importing the linear regression model

Create the model

Train the model
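
The import, create, and train steps might look like the sketch below. The data is made up for illustration and the column names are assumptions; only `LinearRegression` and its `fit` method come from scikit-learn itself.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up housing data (not the dataset from the slides).
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, 3, 4, 4],
    "price": [125000, 185000, 235000, 295000, 345000],
})
X = df[["area", "bedrooms"]]
y = df["price"]

# Create the model.
model = LinearRegression()

# Train the model: learn one coefficient per predictor plus an intercept.
model.fit(X, y)

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```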

MAKING PREDICTIONS
We predict with a set of predictors

The predictions
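
A sketch of the prediction step, assuming a linear regression model trained on made-up data. The `new_houses` frame is hypothetical; it must have the same predictor columns, in the same order, as the training data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training data (not the dataset from the slides).
train = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, 3, 4, 4],
    "price": [125000, 185000, 235000, 295000, 345000],
})
model = LinearRegression()
model.fit(train[["area", "bedrooms"]], train["price"])

# A set of predictors for houses whose price we want to estimate.
new_houses = pd.DataFrame({"area": [1800, 2200], "bedrooms": [3, 4]})

# One predicted price per row of new_houses.
predictions = model.predict(new_houses)
print(predictions)
```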
DECISION TREE
SIMPLE DECISION TREE MODEL

Data preprocessing: Clean data, Split data
Build Model: Select model
Evaluate: Check accuracy
DECISION TREE MODEL
Machine learning models can range from simple linear regression to
complex deep neural networks.

Decision Tree
DECISION TREE
Import decision tree model from sklearn

Train model

Make predictions

Predicted and actual values are the same. That is 100% accuracy.
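
The effect can be reproduced with a small sketch. It assumes a `DecisionTreeRegressor` (the slides do not name the exact class) and made-up data. Because the tree is unrestricted and is asked to predict the same rows it was trained on, it simply recalls the prices it memorized.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up housing data; every row has a distinct set of predictor values.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000, 3500],
    "bedrooms": [2, 3, 3, 4, 4, 5],
    "price": [130000, 170000, 240000, 280000, 360000, 400000],
})
X = df[["area", "bedrooms"]]
y = df["price"]

# Train the model on ALL of the data.
model = DecisionTreeRegressor(random_state=1)
model.fit(X, y)

# Predict on the very same rows the model was trained on.
predicted = model.predict(X)

comparison = pd.DataFrame({"actual": y, "predicted": predicted})
all_match = bool((comparison["actual"] == comparison["predicted"]).all())

print(comparison)
print("All predictions match:", all_match)
```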
BUT WHY??
LET'S MODIFY OUR MODEL BY
INTRODUCING TRAINING AND TEST
DATASETS

Our model reached an accuracy of 100%. This is unlikely in real-world scenarios.

The reason for the 100% accuracy is that we were trying to predict Y values with X values that the model had already seen during the training stage.

What about testing our model on data that it has not seen before?

Let's give it a shot!

INGESTION, CLEANING, AND
SELECTING VARIABLES
We import the decision tree model

Dependent variable

Independent variable
SPLIT DATA

The method for splitting the data

Split the data: 80% for training and 20% for testing

Dataset for training

Dataset for testing
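
An 80/20 split can be sketched with scikit-learn's `train_test_split`. The ten-row dataset is made up, and `random_state=0` is an arbitrary choice that only makes the shuffle repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data with ten rows so the 80/20 split is easy to see.
areas = list(range(1000, 2000, 100))
df = pd.DataFrame({"area": areas, "price": [a * 100 for a in areas]})
X = df[["area"]]
y = df["price"]

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))
```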
MODEL SELECTION

Train dataset
Test dataset
MODEL PERFORMANCE

Checks error margin

Error margin
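
A sketch of measuring the error margin on unseen data. It assumes the error margin is the mean absolute error (MAE), which the slides do not state explicitly, and uses randomly generated data in place of the housing dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: price depends on area and bedrooms plus random noise.
rng = np.random.default_rng(0)
n_rows = 200
area = rng.integers(500, 4000, size=n_rows)
bedrooms = rng.integers(1, 6, size=n_rows)
price = 100 * area + 10000 * bedrooms + rng.normal(0, 20000, size=n_rows)
df = pd.DataFrame({"area": area, "bedrooms": bedrooms, "price": price})

X_train, X_test, y_train, y_test = train_test_split(
    df[["area", "bedrooms"]], df["price"], test_size=0.2, random_state=0
)

# Train on the training set only.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# Evaluate on the test set, which the model has not seen.
test_predictions = model.predict(X_test)
error_margin = mean_absolute_error(y_test, test_predictions)
print(f"Mean absolute error on test data: {error_margin:,.0f}")
```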
LET'S MODIFY THE MODEL A BIT BY
SPECIFYING LEAVES

Error margin before updating parameter

Error margin after updating parameter
PROBLEM OF UNDERFITTING AND
OVERFITTING
DIFFERENT LEVELS OF LEAVES

Error margin is high for 50 leaves
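
Comparing leaf settings can be sketched as a loop. This assumes the parameter in question is `max_leaf_nodes` of `DecisionTreeRegressor` and that the error margin is the mean absolute error; the data is synthetic, so the numbers will differ from those on the slides. `None` means the tree is unrestricted, which is the setting before the parameter is specified.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the housing dataset.
rng = np.random.default_rng(0)
n_rows = 200
area = rng.integers(500, 4000, size=n_rows)
bedrooms = rng.integers(1, 6, size=n_rows)
price = 100 * area + 10000 * bedrooms + rng.normal(0, 20000, size=n_rows)
df = pd.DataFrame({"area": area, "bedrooms": bedrooms, "price": price})

X_train, X_test, y_train, y_test = train_test_split(
    df[["area", "bedrooms"]], df["price"], test_size=0.2, random_state=0
)

# Too few leaves tends to underfit; too many tends to overfit.
results = {}
for max_leaf_nodes in [None, 5, 50, 500]:
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[max_leaf_nodes] = mean_absolute_error(y_test, predictions)

for leaves, mae in results.items():
    print(f"max_leaf_nodes={leaves}: MAE={mae:,.0f}")
```

The setting with the lowest error on the test data is the one to prefer.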

HANDLING CATEGORICAL DATA
CATEGORICAL DATA
Have you realized that we couldn’t include these attributes in the model?
HANDLE CATEGORICAL COLUMNS

Label Encoder, One-Hot Encoder, Dummies

LABEL ENCODERS

Importing LabelEncoder

Columns of interest. We believe that these columns predict house prices. We need to convert them to numerical form.
TRANSFORMING CATEGORICAL
COLUMNS
Instantiate label encoder

Transform values

Categorical column to convert
ADD TRANSFORMED COLUMNS TO
DATAFRAME

New column name, transformed values
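
A sketch of encoding categorical columns and adding the results as new columns. The column names (`mainroad`, `furnishingstatus`) and the `_encoded` suffix are assumptions. Note that scikit-learn documents `LabelEncoder` as intended for target labels; it is used on predictor columns here only to mirror the slides.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up data with two categorical (text) columns.
df = pd.DataFrame({
    "price": [250000, 180000, 320000],
    "mainroad": ["yes", "no", "yes"],
    "furnishingstatus": ["furnished", "unfurnished", "semi-furnished"],
})

for column in ["mainroad", "furnishingstatus"]:
    # Instantiate a fresh encoder for each column.
    encoder = LabelEncoder()
    # Each distinct value becomes an integer code (values are sorted first).
    df[column + "_encoded"] = encoder.fit_transform(df[column])

print(df)
```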

SNAPSHOT OF TRANSFORMED
COLUMNS
New columns added
INDEPENDENT & DEPENDENT
VARIABLES
Select columns based on data types. Exclude columns with data type object.

Drop the price column. By default, it will be included because we are selecting all columns other than objects.
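
This selection can be sketched as below. The column names are hypothetical, and the text column is given an explicit `object` dtype so that the example behaves the same regardless of how a given pandas version infers text columns.

```python
import pandas as pd

# Made-up data: one text column plus its numeric encoded version.
df = pd.DataFrame({
    "price": [250000, 180000, 320000],
    "area": [1200, 900, 1600],
    "mainroad": pd.Series(["yes", "no", "yes"], dtype="object"),
    "mainroad_encoded": [1, 0, 1],
})

# Independent variables: every non-object column except the target.
X = df.select_dtypes(exclude="object").drop(columns=["price"])

# Dependent variable: the price.
y = df["price"]

print(list(X.columns))
```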
DUMMIES
Pandas method to handle categorical columns

Columns

Note that it creates multiple columns for each of them, based on the number of unique values in the column.
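
A sketch using `pd.get_dummies`, assuming that is the pandas method the slide refers to. The column names are hypothetical. Each categorical column is replaced by one new column per unique value.

```python
import pandas as pd

# Made-up data: mainroad has 2 unique values, furnishingstatus has 3.
df = pd.DataFrame({
    "price": [250000, 180000, 320000],
    "mainroad": ["yes", "no", "yes"],
    "furnishingstatus": ["furnished", "unfurnished", "semi-furnished"],
})

# The columns argument lists the categorical columns to convert.
encoded = pd.get_dummies(df, columns=["mainroad", "furnishingstatus"])

# price stays as is; 2 + 3 indicator columns are added.
print(list(encoded.columns))
```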
Task 1: Build a model with either linear
regression or a decision tree and report
on the best model. Remember to apply
all the skills and knowledge you have
acquired, especially splitting the dataset
into training and testing sets and encoding
categorical columns.
ENSEMBLE MODELS
RANDOM FOREST MODEL
Ensemble models combine multiple individual models to
improve predictive performance. A popular ensemble method is
Random Forest, but there are others like Gradient Boosting and
AdaBoost.
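
A minimal Random Forest sketch, assuming a regression task on synthetic data that stands in for the housing dataset. The forest trains many decision trees and averages their predictions; `n_estimators=100` and `random_state=0` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data: price depends on area and bedrooms plus random noise.
rng = np.random.default_rng(0)
n_rows = 200
area = rng.integers(500, 4000, size=n_rows)
bedrooms = rng.integers(1, 6, size=n_rows)
price = 100 * area + 10000 * bedrooms + rng.normal(0, 20000, size=n_rows)
df = pd.DataFrame({"area": area, "bedrooms": bedrooms, "price": price})

X_train, X_test, y_train, y_test = train_test_split(
    df[["area", "bedrooms"]], df["price"], test_size=0.2, random_state=0
)

# An ensemble of 100 decision trees.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

forest_mae = mean_absolute_error(y_test, forest.predict(X_test))
print(f"Random Forest MAE on test data: {forest_mae:,.0f}")
```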
ANY QUESTIONS??
