Unit-4 Part 2 Modelling and Evaluation

The document outlines various types of analytics including descriptive, diagnostic, predictive, and prescriptive analytics, each serving different purposes in data analysis. It also details the steps involved in developing a machine learning application, from data collection to model training and evaluation, emphasizing the importance of model testing and periodic revisits. Additionally, it discusses ensemble learning methods such as bagging and boosting, highlighting their differences and applications in improving model performance.


Modelling and Evaluation
Prof. Atmiya Patel
Content
1: Selecting a Model: Predictive/Descriptive
2: Training a Model for supervised learning
3: Model representation and interpretability
4: Evaluating performance of a model
5: Improving performance of a model


1. Descriptive Analytics
 Descriptive analytics answers questions about events that have already occurred.
 The raw data is queried without adding any contextual information. This is the simplest form of analytics and typically answers questions such as:
 How many units of a particular item were sold in the last 6 months?
 How many patients died of a particular cancer type?
 How many calls did you receive for a particular issue?
 This kind of analytics is usually done using database queries or simple spreadsheet filters. You could have periodic dashboards and reports to visualise the results of descriptive analytics.
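As a minimal sketch of descriptive analytics, the first question above can be answered with a simple filtered aggregation. The records, field names, and dates here are hypothetical stand-ins for rows returned by a database query:

```python
from datetime import date, timedelta

# Hypothetical sales records standing in for a database query result.
sales = [
    {"item": "widget", "units": 30, "sold_on": date(2024, 1, 15)},
    {"item": "widget", "units": 45, "sold_on": date(2024, 4, 2)},
    {"item": "gadget", "units": 12, "sold_on": date(2024, 5, 20)},
]

def units_sold_last_n_days(records, item, today, n_days=182):
    """How many units of `item` were sold in roughly the last 6 months?"""
    cutoff = today - timedelta(days=n_days)
    return sum(r["units"] for r in records
               if r["item"] == item and r["sold_on"] >= cutoff)

print(units_sold_last_n_days(sales, "widget", today=date(2024, 6, 1)))  # → 75
```

In practice this aggregation would be a single SQL query or a spreadsheet filter, as the slide notes.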
2. Diagnostic Analytics
 Diagnostic analytics is done to find the cause of a phenomenon or to derive the reasoning behind events.
 This analytics goes a level deeper to provide information that can be used to fix a particular situation or event.
 Diagnostic analytics usually adds more context to the data to get information about a particular point of interest.
Cont…
 For example, the following are a few questions that can be answered using diagnostic analytics.
 Why were the sales in quarter 2 lower than in quarter 1?
 Why are people falling ill after eating a particular type of biscuit?
 Why is model X of the car preferable to model Y?
 Diagnostic analytics requires careful examination of data from multiple sources, and is a more involved and skilful exercise than descriptive analytics.
3. Predictive Analytics
 Predictive analytics is carried out to forecast and predict future events.
 The information is further enriched by adding meaning to it to derive knowledge. Predictive data models are carefully created so that they can make predictions about the future based on past events.
Cont…
 Predictive analytics could possibly answer questions such as:
 What would be the improved life expectancy if medicine A is chosen over medicine B?
 What would be the sales figure for model X of the car in the third quarter?
 Which team is likely to win the world cup this year?
 Predictive analytics assumes that a certain set of conditions is met or will exist. If those conditions change, the predictions may no longer be accurate.
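The sales-forecast question above can be illustrated with a tiny least-squares line fit over past quarters; all figures are made up for illustration:

```python
# Illustrative only: fit a straight line to past quarterly sales and
# extrapolate one quarter ahead. The sales figures are hypothetical.
quarters = [1, 2, 3, 4]
sales = [100.0, 110.0, 125.0, 135.0]

n = len(quarters)
mean_x = sum(quarters) / n
mean_y = sum(sales) / n
# Ordinary least-squares slope and intercept.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(quarters, sales))
         / sum((x - mean_x) ** 2 for x in quarters))
intercept = mean_y - slope * mean_x

forecast_q5 = intercept + slope * 5   # predict quarter 5 from the trend
print(round(forecast_q5, 1))          # → 147.5
```

The forecast is only as good as the assumption that the past trend continues, which is exactly the caveat stated above.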
4. Prescriptive Analytics
 Prescriptive analytics takes the results from predictive analytics and adds human judgment to prescribe or advise further actions.
 This reflects the wisdom level of the DIKW pyramid.
 Prescriptive analytics could answer questions such as:
 What should you do to delay cancer?
 What is the best time to leave home to reach the airport on time?
 Which medicine would give the patient a higher chance of survival?
Training a Model for supervised learning
 At a high level, developing a machine learning application involves the following steps. Each step can be a sequence of activities in itself.
 Steps in Developing a Machine Learning Application
 1. Collect data
 2. Prepare the input data
 3. Analyze the input data
 4. Train the algorithm
 5. Test the algorithm
 6. Use the algorithm
 7. Periodic revisit
1. Collect Data
 To build a machine learning model, you need a large volume of training data. You may have this training dataset available internally from your historical business operations, or you could engage with external agencies and websites that offer training datasets for various purposes, either freely or for a fee.
 Kaggle: 54,876 datasets
 Amazon Web Services (AWS): 188 datasets
 UCI repository
 Google TensorFlow
 Microsoft
 OpenML: 3,192 datasets
2. Preparing the Input Data
 You need to ensure that the data is in the right format so that it can be processed by the chosen algorithm and computer programs.
 Publicly available datasets are usually already provided in standard formats, so you may be able to skip this step and use the procured training dataset directly.
3. Analyze the Input Data
 This is a crucial step where you need to ensure that the input dataset can be parsed properly by your chosen computer program.
 You also need to ensure that the examples are complete (they are not missing values) and not skewed (too high or too low compared to the rest of the examples).
 If you trust the source of the dataset and are sure that it has accurate values, you may choose to skip this step. This step simply ensures that the dataset on which you are going to build your machine learning model meets the desired quality.
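The completeness and skew checks described above can be sketched as simple scans over the data. The rows, field names, and the 1.5-standard-deviation cutoff below are illustrative assumptions, not a prescribed quality rule:

```python
# Hypothetical rows: None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": 48000},
    {"age": 41, "income": 55000},
    {"age": 38, "income": 990000},   # suspiciously high income
]

def missing_count(rows, field):
    """Completeness check: how many examples lack this field?"""
    return sum(1 for r in rows if r[field] is None)

def outliers(rows, field, k=1.5):
    """Skew check: values more than k standard deviations from the mean."""
    vals = [r[field] for r in rows if r[field] is not None]
    mean = sum(vals) / len(vals)
    std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return [v for v in vals if abs(v - mean) > k * std]

print(missing_count(rows, "age"))    # → 1
print(outliers(rows, "income"))      # → [990000]
```

Real pipelines would use more robust checks (median-based cutoffs, per-feature rules), but the idea is the same: flag incomplete and extreme examples before training.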
4. Train the Algorithm
 This is the core step where you start to train your machine learning algorithm to build a model.
 Based on the chosen algorithm, this step could be simple or very complex.
 You feed the collected and analyzed training dataset to the chosen algorithm, check how it performs on the input data, and adjust or correct as required.
 Note that in the case of unsupervised learning, there is no explicit training step because you do not have a target value. Unsupervised learning algorithms work on the provided input to find patterns. However, you may have to choose which features to feed to the unsupervised learning algorithm so that the discovered patterns are meaningful for the purpose.
5. Test the Algorithm
 When you get a dataset, you typically partition it 80-20, where 80% of the examples are used to train the model and 20% are used to test it.
 When training a supervised learning algorithm, once you are sufficiently confident that the model is well trained, you put it to the test by feeding it new known inputs and confirming that it produces the desired output (the desired output is already known for the fed input data).
 In unsupervised learning, you may have to use various evaluation parameters, such as the number of clusters created and the distance between cluster objects, to ensure that the model is working as expected.
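The 80-20 partition described above can be sketched as a shuffled split. The dataset, labels, and seed here are hypothetical:

```python
import random

# Hypothetical labelled examples: (features, label).
data = [([i, i * 2], i % 2) for i in range(100)]

def train_test_split(examples, test_fraction=0.2, seed=42):
    """Shuffle and split into train/test sets (an 80/20 split by default)."""
    rng = random.Random(seed)
    shuffled = examples[:]          # copy, so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(data)
print(len(train), len(test))   # → 80 20
```

Shuffling before splitting matters: if the data is ordered (say, by date or class), taking the last 20% unshuffled would give a test set that does not represent the whole dataset.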
Cont…
 If the test results are promising, you move further with using the model. However, if the test results are not satisfactory, you need to find the root cause, and based on it you may have to:
 Re-train the model
 Make adjustments to the model or data
 Try a different algorithm
 Collect the dataset from a different source
 Testing the algorithm before use is a crucial step to ensure that your model does not produce false results. Do not skip it.
6. Use the Algorithm
 You have spent a lot of time collecting and cleaning the data and then building and testing the model. Once you are through these steps, you are ready to use the model.
 You may develop an application based on the model. For example, based on someone's health parameters, your machine learning model could deduce health-related problems that the person may face in the near future.
 Based on someone's credit history, your application may infer the chances of a new loan being approved.
7. Periodic Revisit
 Just as you revise periodically to ensure that your learning is still effective, you should periodically review the results that the model is producing and evaluate whether there are opportunities to improve it in light of new data.
 You may make minor adjustments to the model, or re-train it with the latest data to fine-tune it.
 This step is very similar to getting an annual master health check to ensure that your body's vital parameters are doing well. If any parameter indicates a potential problem, you either make lifestyle changes or seek medical advice.
ROC curve
 An ROC curve (receiver operating characteristic curve) is a graph
showing the performance of a classification model at all
classification thresholds. This curve plots two parameters:
 True Positive Rate
 False Positive Rate
 An ROC curve plots TPR vs. FPR at different classification
thresholds. Lowering the classification threshold classifies more
items as positive, thus increasing both False Positives and True
Positives.
Fig: TP vs. FP rate at different classification thresholds
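The threshold sweep described above can be sketched directly from the TPR and FPR definitions. The scores and labels are made up for illustration:

```python
# Hypothetical model scores with true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]

def roc_point(scores, labels, threshold):
    """TPR and FPR when everything scored >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

for t in (0.75, 0.5, 0.25):
    print(t, roc_point(scores, labels, t))
# → 0.75 (0.5, 0.0)
# → 0.5  (0.75, 0.25)
# → 0.25 (1.0, 0.5)
```

Note how lowering the threshold from 0.75 to 0.25 raises both TPR and FPR, exactly as the slide states.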
Cont…
 To compute the points in an ROC curve, we could evaluate a
logistic regression model many times with different classification
thresholds, but this would be inefficient. Fortunately, there's an
efficient, sorting-based algorithm that can provide this information
for us, called AUC.
AUC: Area Under the ROC Curve
 AUC stands for "Area under the ROC Curve." That is, AUC
measures the entire two-dimensional area underneath the entire
ROC curve (think integral calculus) from (0,0) to (1,1).
Fig: AUC (Area under the ROC Curve)
Cont…
 AUC provides an aggregate measure of performance across all
possible classification thresholds. One way of interpreting AUC is
as the probability that the model ranks a random positive example
more highly than a random negative example. For example, given
the following examples, which are arranged from left to right in
ascending order of logistic regression predictions:
 AUC represents the probability that a random positive (green)
example is positioned to the right of a random negative (red)
example.

Fig: Predictions ranked in ascending order of logistic regression score
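That probabilistic interpretation gives a direct (if quadratic-time) way to compute AUC: count positive-negative pairs that are ranked correctly. The scores and labels below are illustrative:

```python
# AUC as the probability that a random positive scores above a random negative.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]

def auc_by_pairs(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Correctly ranked pairs score 1; tied scores count half.
    wins = (sum(1 for p in pos for n in neg if p > n)
            + 0.5 * sum(1 for p in pos for n in neg if p == n))
    return wins / (len(pos) * len(neg))

print(auc_by_pairs(scores, labels))   # → 0.8125
```

The efficient sorting-based algorithm mentioned above computes the same quantity without enumerating all pairs.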


Cont…
 AUC ranges in value from 0 to 1. A model whose predictions are
100% wrong has an AUC of 0.0; one whose predictions are 100%
correct has an AUC of 1.0.
 AUC is desirable for the following two reasons:
 AUC is scale-invariant. It measures how well predictions are ranked,
rather than their absolute values.
 AUC is classification-threshold-invariant. It measures the quality of the
model's predictions irrespective of what classification threshold is
chosen.
Cont…
 However, both these reasons come with caveats, which may limit the
usefulness of AUC in certain use cases:
 Scale invariance is not always desirable. For example, sometimes we really
do need well calibrated probability outputs, and AUC won’t tell us about that.
 Classification-threshold invariance is not always desirable. In cases where
there are wide disparities in the cost of false negatives vs. false positives, it
may be critical to minimize one type of classification error. For example,
when doing email spam detection, you likely want to prioritize minimizing
false positives (even if that results in a significant increase of false
negatives). AUC isn't a useful metric for this type of optimization.
Additional Classification Methods
 As we know, ensemble learning helps improve machine learning results by combining several models. This approach produces better predictive performance than a single model.
 The basic idea is to learn a set of classifiers (experts) and allow them to vote. Bagging and Boosting are two types of ensemble learning.
 Both decrease the variance of a single estimate, as they combine several estimates from different models. The result may therefore be a model with higher stability.
Cont…
 Bagging: A homogeneous weak learners' model in which the learners are trained independently of each other in parallel, and their outputs are combined to determine the model average.
 Boosting: Also a homogeneous weak learners' model, but it works differently from bagging. Here the learners are trained sequentially and adaptively to improve the predictions of the learning algorithm.
Bagging
 Bootstrap Aggregating, also known as bagging, is a machine
learning ensemble meta-algorithm designed to improve the
stability and accuracy of machine learning algorithms used in
statistical classification and regression. It decreases the
variance and helps to avoid overfitting. It is usually applied to
decision tree methods. Bagging is a special case of the model
averaging approach.
 Implementation Steps of Bagging
• Step 1: Multiple subsets are created from the original dataset, each with the same number of tuples, by selecting observations with replacement.
• Step 2: A base model is created on each of these subsets.
• Step 3: Each model is trained in parallel on its own training set, independently of the others.
• Step 4: The final prediction is determined by combining the predictions from all the models.
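The four steps above can be sketched as follows. The base learner here is a deliberately trivial threshold rule and the dataset is hypothetical; the point is the bootstrap-resample-then-vote structure, not the base model:

```python
import random

# Step 1: bootstrap sampling — draw with replacement from the original data.
def bootstrap(data, rng):
    return [rng.choice(data) for _ in data]

# Step 2/3: a toy base learner, trained independently on one subset.
def train_stump(sample):
    """Predict 1 if x exceeds the mean x of the training sample."""
    mean_x = sum(x for x, _ in sample) / len(sample)
    return lambda x: 1 if x > mean_x else 0

# Step 4: combine the models' predictions by majority vote.
def bagged_predict(models, x):
    votes = sum(m(x) for m in models)
    return 1 if votes > len(models) / 2 else 0

rng = random.Random(0)
data = [(x, 1 if x > 5 else 0) for x in range(11)]   # hypothetical dataset
models = [train_stump(bootstrap(data, rng)) for _ in range(25)]
print(bagged_predict(models, 9), bagged_predict(models, 1))
```

Each stump sees a slightly different resample, so its threshold varies; averaging the votes smooths out that variance, which is exactly why bagging stabilises high-variance learners such as decision trees.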
Boosting
 Boosting is an ensemble modeling technique that attempts to build a strong classifier from a number of weak classifiers. This is done by building weak models in series. First, a model is built from the training data. Then a second model is built which tries to correct the errors of the first model. This procedure continues, and models are added until either the complete training dataset is predicted correctly or the maximum number of models has been added.
Cont…
 Algorithm:
1. Initialize the dataset and assign an equal weight to each data point.
2. Provide this as input to the model and identify the wrongly classified data points.
3. Increase the weights of the wrongly classified data points and decrease the weights of the correctly classified data points. Then normalize the weights of all data points.
4. If the required results are achieved, go to step 5; otherwise, go to step 2.
5. End
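One round of the weight update in step 3 can be sketched in the AdaBoost style, where a weak model's weighted error determines how strongly the weights shift. The labels and predictions below are made up for illustration:

```python
import math

# One round of the boosting weight update (AdaBoost-style).
labels = [1, 1, -1, -1, 1]                 # true classes
predictions = [1, -1, -1, -1, -1]          # a weak model's guesses (2 wrong)
weights = [1 / len(labels)] * len(labels)  # step 1: equal weights

# Weighted error of this weak model, and its "say" in the final vote.
err = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)
alpha = 0.5 * math.log((1 - err) / err)

# Step 3: raise weights of misclassified points, lower the rest, normalize.
weights = [w * math.exp(alpha if y != p else -alpha)
           for w, y, p in zip(weights, labels, predictions)]
total = sum(weights)
weights = [w / total for w in weights]
print([round(w, 3) for w in weights])   # → [0.167, 0.25, 0.167, 0.167, 0.25]
```

The two misclassified points (indices 1 and 4) now carry more weight, so the next weak model in the sequence is pushed to get them right.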
Differences Between Bagging and Boosting

No. | Bagging | Boosting
1. | The simplest way of combining predictions that belong to the same type. | A way of combining predictions that belong to different types.
2. | Aims to decrease variance, not bias. | Aims to decrease bias, not variance.
3. | Each model receives equal weight. | Models are weighted according to their performance.
4. | Each model is built independently. | New models are influenced by the performance of previously built models.
5. | Different training data subsets are selected from the entire training dataset using row sampling with replacement (random sampling). | Every new subset contains the elements that were misclassified by previous models.
6. | Bagging tries to solve the over-fitting problem. | Boosting tries to reduce bias.
7. | If the classifier is unstable (high variance), apply bagging. | If the classifier is stable and simple (high bias), apply boosting.
Thank you…
