ML TECHNIQUES
UNDER SUPERVISION OF
PROF. MANOJ KUMAR TIWARI,
DIRECTOR NITIE
INSTRUCTIONS TO USE GOOGLE COLAB
1. CREATING A NOTEBOOK
Open the Google Colab website. To start coding, go to the File menu in the top-left corner and create a new notebook.
2. AUTHENTICATION
If Colab isn't linked with your Google account, you will receive a pop-up asking you to sign in with your Google account.
3. CONNECTING TO NOTEBOOK
Connect the notebook in order to run your code.
4. RUN CODE
Finally, type your code in the cell and press the play button to run it.
INSTRUCTIONS TO USE GOOGLE COLAB TO RUN THE ALGORITHMS
1. CONNECT YOUR GOOGLE ACCOUNT
After opening our notebook, click the Connect button in the top-right corner and sign in to your Google account. After authentication, the notebook will connect to your account.
2. LOAD DATASET
Next, press the play button on the Dataset cell.
3. RUN ALGORITHMS
Finally, press the play button on the last cell.
DATA SET INFORMATION
CLASSIFICATION: Wine-analytics Classification (Existing); Wine-analytics Classification (Logistic Regression) (Existing); SUV_purchase (New); Mobile Price Classification (New); Car Ownership Classification (Existing)
REGRESSION: Wine-analytics; Advertisement (Existing); Car Sales (New); Metal Sales (New)
CLUSTERING: Public Utilities (Existing); Customer Segmentation (New)
CLASSIFICATION
MODELS USED FOR CLASSIFICATION
Logistic Regression, Decision Tree, K-NN
DATASET USED FOR CLASSIFICATION
SUV Purchase (New Data)
Forecasting customers' ability to purchase SUVs. The goal is to fit a classifier to the data and provide predictions for future customers.
Feature: Gender, Age, EstimatedSalary
Target: Purchased
Mobile Price (New Data)
Predict a price range indicating how high the price is.
Feature: battery_power, mobile_wt, px_height, px_width, ram
Target: price_range [0 (low cost), 1 (medium cost), 2 (high cost) and 3 (very high cost)]
PERFORMING CLASSIFICATION
Simply enter the option number to perform classification. Two more options are then displayed so that the user can choose the dataset and the model.
1. LOGISTIC REGRESSION
Dataset Description (wine_alytics(logistic))
1. LOGISTIC REGRESSION
Data Visualization
Figures: wine_alytics(logistic), SUV_Purchase and mobile_price datasets
1. LOGISTIC REGRESSION
For wine_alytics(logistic) Dataset
Figures: ROC-AUC curve; confusion matrices for the train set and test set
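Each model in these slides is reported through a confusion matrix and an ROC-AUC curve. As a minimal from-scratch sketch of both metrics (the function names are hypothetical, not taken from the notebook), the confusion matrix below accepts binary or multi-class integer labels, while the AUC function assumes a binary problem with labels 0/1 and a predicted probability for class 1. For a multi-class target such as price_range, ROC curves are usually drawn one-vs-rest per class. In scikit-learn, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_auc_score` compute the same quantities.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes (labels 0..n-1)."""
    n = max(max(y_true), max(y_pred)) + 1  # works for binary and multi-class labels
    m = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Binary ROC-AUC: probability that a random positive example scores higher
    than a random negative one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # predicted probability of class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_matrix(y_true, y_pred))  # [[2, 0], [1, 1]]
print(roc_auc(y_true, scores))  # 0.75
```

The AUC is computed from pairwise comparisons rather than by integrating the curve; the two are equivalent, and the pairwise form makes clear that AUC only depends on how the scores rank positives against negatives.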
1. LOGISTIC REGRESSION
For SUV_Purchase Dataset
Figures: ROC-AUC curve; confusion matrices for the train set and test set
1. LOGISTIC REGRESSION
For mobile_price dataset
Figures: ROC-AUC curve; confusion matrices for the train set and test set
2. DECISION TREE CLASSIFIER
Dataset Description (wine_alytics)
2. DECISION TREE CLASSIFIER
Dataset - (wine_alytics)
2. DECISION TREE CLASSIFIER
For SUV_Purchase Dataset
Figures: confusion matrices (train set and test set) and ROC-AUC curves, for the entropy and gini criteria
ACCURACY (gini): Training set: 0.9187, Test set: 0.9125
ACCURACY (entropy): Training set: 0.9187, Test set: 0.9125
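The gini and entropy results differ only in the impurity measure used to score candidate splits. For a node whose samples fall into classes with proportions p_k, the standard definitions are given below, followed by a worked example for a two-class node split 50/50 (illustrative numbers, not data from the slides). Both measures are 0 for a pure node, and the tree picks the split that most reduces the size-weighted impurity of the children. In scikit-learn this choice is the `criterion` parameter of `DecisionTreeClassifier` ("gini" or "entropy").

```latex
\[
G = 1 - \sum_{k} p_k^{2}, \qquad H = -\sum_{k} p_k \log_2 p_k
\]
\[
p = (0.5,\ 0.5):\quad G = 1 - (0.25 + 0.25) = 0.5, \qquad H = -2 \times 0.5 \log_2 0.5 = 1
\]
```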
2. DECISION TREE CLASSIFIER
For mobile_data Dataset
Figures: confusion matrices (train set and test set) and ROC-AUC curves, for the entropy and gini criteria
ACCURACY (gini): Training set: 0.7806, Test set: 0.7400
ACCURACY (entropy): Training set: 0.7681, Test set: 0.7550
2. DECISION TREE CLASSIFIER
Data Visualization
Figures: SUV_Purchase and mobile_price datasets
2. DECISION TREE CLASSIFIER
For wine_alytics Dataset
2. DECISION TREE CLASSIFIER
For SUV_Purchase Dataset
2. DECISION TREE CLASSIFIER
For mobile_data Dataset
Depth of tree = 2
Splitting criterion = entropy
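The mobile_data tree is reported with a depth of 2 and the entropy criterion. The sketch below shows, under stated assumptions, how such a depth-limited tree can be grown from scratch: numeric features, integer class labels, and binary splits of the form `x[feature] <= threshold` chosen by information gain. All function names are hypothetical; the slides do not show whether the notebook used a library such as scikit-learn's `DecisionTreeClassifier(criterion="entropy", max_depth=2)` instead.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple, Union

Tree = Union[int, Dict[str, object]]  # a leaf class label or an internal split node


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a node's class labels; 0.0 for a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def best_split(X: List[Sequence[float]], y: List[int]) -> Optional[Tuple[int, float]]:
    """(feature, threshold) with the largest information gain, or None if no split helps."""
    parent, n = entropy(y), len(y)
    best, best_gain = None, 0.0
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            children = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if parent - children > best_gain:
                best, best_gain = (f, t), parent - children
    return best


def build_tree(X: List[Sequence[float]], y: List[int], max_depth: int, depth: int = 0) -> Tree:
    """Split until max_depth, a pure node, or no useful split; leaves hold the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if depth < max_depth and len(set(y)) > 1 else None
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in X]
    X_l = [row for row, g in zip(X, go_left) if g]
    y_l = [lab for lab, g in zip(y, go_left) if g]
    X_r = [row for row, g in zip(X, go_left) if not g]
    y_r = [lab for lab, g in zip(y, go_left) if not g]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(X_l, y_l, max_depth, depth + 1),
        "right": build_tree(X_r, y_r, max_depth, depth + 1),
    }


def predict(tree: Tree, row: Sequence[float]) -> int:
    """Follow the thresholds from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree
```

Calling `build_tree(X_train, y_train, max_depth=2)` (with hypothetical `X_train`, `y_train`) produces at most two levels of splits. Limiting depth keeps the tree readable and reduces overfitting, at the cost of some training accuracy.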
3. K-NEAREST NEIGHBORS CLASSIFICATION
3. K-NEAREST NEIGHBORS CLASSIFICATION
Each feature is normalised so that the features become independent of their original ranges.
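The slides do not say which normalisation was applied. Min-max scaling is sketched below as one common choice (z-score standardisation, as in scikit-learn's `StandardScaler`, is the other usual option). Scaling matters for k-NN because Euclidean distance is otherwise dominated by large-range features such as EstimatedSalary. The minimum and maximum are learned from the training rows only and then reused for test rows; the function names and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_min_max(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Learn each feature's minimum and maximum from the training rows only."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def transform_min_max(
    X: Sequence[Sequence[float]], mins: Sequence[float], maxs: Sequence[float]
) -> List[List[float]]:
    """Rescale every feature with (v - min) / (max - min).

    Training rows land in [0, 1]; unseen rows may fall slightly outside it.
    A constant feature (max == min) is mapped to 0.0 to avoid dividing by zero.
    """
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in X
    ]


# Hypothetical (Age, EstimatedSalary) rows in the style of the SUV_Purchase features.
X_train = [[19, 20000], [35, 60000], [51, 150000]]
mins, maxs = fit_min_max(X_train)
print(transform_min_max([[35, 60000]], mins, maxs))
```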
Figures: SUV_PURCHASE and MOBILE PRICE CLASSIFICATION
3. K-NEAREST NEIGHBORS CLASSIFICATION
Visualization of target class
Figures: SUV_Purchase and mobile_price datasets
3. K-NEAREST NEIGHBORS CLASSIFICATION
PERFORMING KNN
Enter the first option to choose a specific number of neighbors (k) for the KNN algorithm, or the second option to find the best k.
For the best-k search, enter the maximum k and the number of cross-validation sets; the best k within that range is then selected.
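The best-k option can be sketched as a k-NN classifier plus a k-fold cross-validation loop over k = 1..max k. This is a minimal illustration under assumed details (Euclidean distance, majority vote, one seeded shuffle, accuracy as the score); the notebook's own implementation is not shown in the slides, and all names here are hypothetical. The scikit-learn equivalents would be `KNeighborsClassifier(n_neighbors=k)` scored with `cross_val_score(model, X, y, cv=10)`.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Rows = Sequence[Sequence[float]]


def knn_predict(X_train: Rows, y_train: Sequence[Hashable], x: Sequence[float], k: int) -> Hashable:
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(X: Rows, y: Sequence[Hashable], k: int, n_folds: int, seed: int = 0) -> float:
    """Mean k-NN accuracy over n_folds folds (requires n_folds <= len(X))."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # one seeded shuffle so every k sees the same folds
    folds: List[List[int]] = [idx[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        correct = sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


def best_k(X: Rows, y: Sequence[Hashable], max_k: int, n_folds: int) -> int:
    """k in 1..max_k with the highest cross-validated accuracy (ties go to the smaller k)."""
    scores: Dict[int, float] = {
        k: cross_val_accuracy(X, y, k, n_folds) for k in range(1, max_k + 1)
    }
    return max(scores, key=scores.__getitem__)
```

With scaled features, `best_k(X_scaled, y, max_k=50, n_folds=10)` would mirror the settings shown on the next slide (`X_scaled` is a hypothetical name). Every candidate k is scored on the same folds, so differences in accuracy come from k rather than from a different data split.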
3. K-NEAREST NEIGHBORS CLASSIFICATION
Figures: Best K (two panels)
Max k = 50, cross-validation sets = 10
3. K-NEAREST NEIGHBORS CLASSIFICATION
SUV_Purchase Dataset
3. K-NEAREST NEIGHBORS CLASSIFICATION
Mobile_data Dataset
3. K-NEAREST NEIGHBORS CLASSIFICATION
CAR OWNERSHIP CLASSIFICATION AFTER LABELING AND SCALING
CAR OWNERSHIP CLASSIFICATION
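The car ownership data is classified after labeling and scaling. Label encoding, which turns each categorical value into an integer code, can be sketched as follows. The function name and example values are hypothetical; scikit-learn's `LabelEncoder` likewise assigns codes in sorted order. One caveat: integer codes imply an order, which k-NN distances will treat as meaningful, so one-hot encoding is often preferred for unordered categories with more than two values.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def label_encode(values: Sequence[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Replace each category with an integer code; codes follow sorted category order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


codes, mapping = label_encode(["Yes", "No", "Yes"])  # hypothetical column values
print(codes, mapping)  # [1, 0, 1] {'No': 0, 'Yes': 1}
```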
3. K-NEAREST NEIGHBORS CLASSIFICATION
CHOOSING THE MAX K AND THE NUMBER OF CROSS-VALIDATION SETS USED TO FIND THE BEST K
Best K
MODELS USED FOR REGRESSION
Linear Regression, Decision Tree
DATASET USED FOR REGRESSION
Advertising Sales (Old Data)
Predicting first-year sales from advertisement; single-feature or multivariate regression.
Feature: Cost of advertisements, promotion expenditure and competitors' sales
Target: Sales
Car Sales (New Data)
Predicting sales of cars; multi-feature regression.
Feature: Supplier name, car model, etc.
Target: Sales
Monthly Steel Consumption (New Data)
Forecasting monthly steel consumption; time-series forecasting.
Feature: Month number, steel ids
Target: Monthly consumption
PERFORMING REGRESSION
Simply enter the option number to perform regression. Two more options are then displayed so that the user can choose the dataset and the model.
COMPARISON OF RMSE VALUES
Figure: RMSE of the Linear Regressor and the Decision Tree on the Advertisement, Car Sales and Steel Consumption datasets
Advertisement Dataset: Since the features are already highly correlated with the target, a simple linear regression model fits both the single-feature and the multivariate version of the dataset well.
Car Sales Dataset: The dataset contains many features, including discrete categorical variables. The decision tree, which splits the data on individual feature values, therefore gives better predictions.
Steel Consumption Dataset: This dataset is more complex, and its target values cannot be fitted by a single hyperplane. Hence the decision tree outperforms the linear regressor.
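The comparison above ranks the two regressors by RMSE. A minimal sketch of the metric, together with a single-feature ordinary least squares fit as the simplest form of the linear regressor, is shown below; the multivariate case follows the same idea with one coefficient per feature. Function names and the sample data are hypothetical. Because RMSE is expressed in the units of the target, it compares models on the same dataset, not one dataset against another.

```python
import math
from typing import Sequence, Tuple


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (intercept a, slope b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, expressed in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical single-feature data (advertising cost -> sales).
cost = [1.0, 2.0, 3.0, 4.0]
sales = [3.1, 4.9, 7.2, 8.8]
a, b = fit_simple_linear(cost, sales)
print(rmse(sales, [a + b * c for c in cost]))
```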
ANALYSIS OF CAR SALES DATA
Data Description
Displays Data
Encoding non-numerical values by the mean of the target values for each category
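Encoding a non-numerical column by the mean of the target can be sketched as follows: each category is replaced by the average target value of the training rows in that category. This is a minimal illustration with hypothetical names and values; the slides do not show how the notebook handles unseen categories, so the `default` fallback (for example, the overall training mean) is an assumption. Fitting the mapping on training rows only avoids leaking test targets into the features.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def fit_target_mean_encoding(categories: Sequence[Hashable],
                             target: Sequence[float]) -> Dict[Hashable, float]:
    """Map every category to the mean target value of its rows (fit on training data)."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for category, value in zip(categories, target):
        sums[category] += value
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


def apply_encoding(categories: Sequence[Hashable], mapping: Dict[Hashable, float],
                   default: float) -> List[float]:
    """Replace categories with their encoded value; unseen categories get `default`."""
    return [mapping.get(category, default) for category in categories]


# Hypothetical car-model column and sales target.
models = ["Sedan A", "Hatch B", "Sedan A"]
sales = [10.0, 4.0, 20.0]
encoding = fit_target_mean_encoding(models, sales)
print(encoding)  # {'Sedan A': 15.0, 'Hatch B': 4.0}
```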
Correlation Matrix
Plot of correlation Matrix
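A correlation matrix holds the correlation coefficient between every pair of columns, with 1.0 on the diagonal. Assuming the usual Pearson coefficient (the default of pandas `DataFrame.corr()`), it can be sketched without libraries as below; the names and sample columns are hypothetical. A constant column has zero variance, so its correlation is undefined and this sketch would raise a division error.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (undefined if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] is the Pearson correlation between column i and column j."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


print(correlation_matrix([[1, 2, 3], [2, 4, 6], [3, 2, 1]]))
# [[1.0, 1.0, -1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
```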
RESULTS FOR CAR SALES USING LINEAR REGRESSION
K-MEANS CLUSTERING
DATASET USED FOR CLUSTERING
Public Utilities (Old Data)
Data on 22 firms with 8 variables is given. We have to find clusters of similar public utilities.
Customer Segmentation (New Data)
A dataset consisting of 5 features. The goal is to divide the customers into groups based on common characteristics such as demographics or behaviors.
Public Utilities
PCA has been used for dimensionality reduction. The clustering visualizations for k = 2 and k = 3 are shown below.
Figures: k = 2 and k = 3
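The slides state only that PCA was used to reduce the data to a plottable number of dimensions. Assuming the standard formulation (centre the data, then project onto the leading eigenvectors of its covariance matrix), a dependency-free sketch is shown below; it finds the eigenvectors by power iteration with deflation rather than a library eigen-solver. All names are hypothetical, and `sklearn.decomposition.PCA(n_components=2).fit_transform(X)` is the usual library route. Features on very different scales are commonly standardised before PCA.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def covariance_matrix(X: Sequence[Sequence[float]]) -> Tuple[Matrix, List[float]]:
    """Sample covariance matrix of the columns of X, plus the column means."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        centred = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += centred[i] * centred[j] / (n - 1)
    return C, means


def top_components(C: Matrix, n_components: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Leading eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    d = len(C)
    A = [row[:] for row in C]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        start = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        size = math.sqrt(sum(x * x for x in start))
        v = [x / size for x in start]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(A[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        A = [[A[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


def pca_project(X: Sequence[Sequence[float]], n_components: int = 2) -> Matrix:
    """Centre X and project every row onto the leading principal components."""
    C, means = covariance_matrix(X)
    comps = top_components(C, n_components)
    return [[sum((x - m) * c for x, m, c in zip(row, means, comp)) for comp in comps] for row in X]
```

The sign of each component is arbitrary, so a plot may appear mirrored compared with another PCA implementation without any change in the clustering it shows.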
Changes in clustering of Public Utilities Dataset
Since the final cluster centroids in K-means depend on their initial positions, we obtained a different result from the one shown in the original presentation.
Our Result
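The different public utilities result is attributed to k-means' dependence on the initial centroid positions. The sketch below implements Lloyd's algorithm from randomly chosen data points, so different seeds can converge to different clusterings; keeping the run with the lowest inertia (sum of squared distances to the assigned centroids) is the usual remedy. This is a minimal illustration with hypothetical names and data, not the notebook's code. In scikit-learn's `KMeans`, the `n_init` parameter runs several initialisations and keeps the best, and `random_state` makes a run reproducible.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def assign(X: Sequence[Point], centroids: List[List[float]]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in X]


def kmeans(X: Sequence[Point], k: int, seed: int,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int], float]:
    """Lloyd's k-means started from k randomly chosen data points.

    Returns (centroids, labels, inertia), where inertia is the sum of squared
    distances from each point to its assigned centroid.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(X), k)]  # seed-dependent start
    for _ in range(max_iter):
        labels = assign(X, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(X, labels) if lab == c]
            # Move the centroid to its members' mean; an empty cluster keeps its old centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(X, centroids)
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(X, labels))
    return centroids, labels, inertia


# Different seeds may converge to different clusterings; keep the lowest-inertia run.
points = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 5]]  # hypothetical data
runs = [kmeans(points, 2, seed=s) for s in range(5)]
best = min(runs, key=lambda r: r[2])
```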
Mall Customer Segmentation
PCA has been used for dimensionality reduction. The clustering visualizations for k = 3 and k = 4 are shown below.
Figures: k = 3 and k = 4
THANK YOU
THE DATASETS CAN BE VIEWED HERE
THE NOTEBOOK CAN BE VIEWED HERE