UNIT-III
Mapping problems to machine learning tasks
•As a data scientist, your task is to map a business problem to a
good machine learning method.
•Let’s look at a real-world situation. Suppose that you’re a data
scientist at an online retail company.
•There are many business problems that you may need to address.
Predicting customers
Identifying fraudulent transactions
Determining price
Grouping customers with similar purchasing behaviour
Marketing campaigns
Seminar on
•Supervised Learning
•Unsupervised Learning
•Reinforcement Learning
For this purpose, we will group the different kinds of problems that a data scientist typically solves into these categories:
Classification—Assigning labels to data
Scoring—Assigning numerical values to data
Grouping—Discovering commonalities in data
Classification problems
•Product categorization based on product attributes and/or text descriptions of the product is an
example of classification.
•Suppose your task is to automate the assignment of new products to your company’s product
categories.
Scoring problems
Example 1:
Predicting the increase in sales from a particular marketing campaign, based on factors such as
the communication channel (ads on websites, YouTube videos, print media, email, and so on)
and the traffic source (Facebook, Google, radio stations, and so on), is an example of scoring.
Example 2:
Supervised methods work with known targets; unsupervised methods (grouping) work without known targets.
EX: discovering that a mobile phone, a cover, and tempered glass are frequently purchased together.
Problem-to-method mapping
•You can’t simply fit a model on a set of
training data and declare, ‘Yes, this
will work.’
•To ensure that the model is correctly
trained on the data provided, without
fitting too much noise, you need to use
cross-validation techniques.
Evaluating Models
•When building a model, you must be able to estimate model quality in order
to ensure that your model will perform well in the real world.
•One of the things that helps you identify whether your model is
performing well or not is checking for overfitting.
• An undesirable behaviour that occurs when the model gives accurate
predictions for training data but not for new data is overfitting.
•A model’s prediction error on the data that it trained from is called the training
error, i.e., the average loss that occurred during the training process.
• A model’s prediction error on new data is called the generalization error, i.e.,
how accurately your algorithm will predict values it has not seen before.
•In order to evaluate a model’s performance and to detect overfitting, we
have two categories:
Hold-out method
K-fold cross-validation
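Since k-fold cross-validation is only listed here and not expanded later, here is a minimal sketch in base R, reusing the mtcars data and the same glm formula that appears later in these notes (the choice of 5 folds and the 0.5 cut-off are illustrative assumptions):

# Minimal 5-fold cross-validation sketch on mtcars
set.seed(42)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # assign each row to a fold
acc <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]                      # train on k-1 folds
  test  <- mtcars[folds == i, ]                      # validate on the held-out fold
  m <- glm(am ~ mpg + drat + gear, data = train, family = "binomial")
  p <- predict(m, newdata = test, type = "response")
  acc[i] <- mean((p > 0.5) == (test$am == 1))        # accuracy on this fold
}
mean(acc)  # cross-validated accuracy estimate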
Hold Out method
The hold-out method for training machine learning models is a technique that involves splitting the data into different sets: one set for training, and other sets for validation and testing.
The hold-out method is used to check how well a machine learning model will perform on new data. Instead of using the entire dataset for training, separate sets called the validation set and the test set are set aside (hence the name "hold-out") from the dataset, and the model is trained only on what is termed the training dataset.
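A minimal sketch of such a three-way hold-out split in base R (the 60/20/20 proportions and the data frame name df are illustrative assumptions, not from the slides):

# Three-way hold-out split: ~60% training, ~20% validation, ~20% test
# 'df' is a placeholder for your own data frame
set.seed(42)
idx <- sample(c("train", "valid", "test"), size = nrow(df),
              replace = TRUE, prob = c(0.6, 0.2, 0.2))
train_set <- df[idx == "train", ]  # used to fit the model
valid_set <- df[idx == "valid", ]  # used to tune hyperparameters
test_set  <- df[idx == "test", ]   # used only for the final evaluation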
Validation means tuning of hyperparameters.
In machine learning, a hyperparameter is a
parameter whose value is used to control the
learning process.
Hyperparameters for a decision tree:
max depth
max leaf nodes
For a random forest:
min samples split
max features
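A minimal sketch of setting such hyperparameters in R, assuming the rpart and randomForest packages; note that in randomForest the closest equivalents are mtry (number of features tried at each split) and nodesize (minimum node size), so the names below follow those packages rather than the slide wording:

library(rpart)          # decision trees
library(randomForest)   # random forests

# Decision tree on mtcars: control depth and leaf growth via rpart.control()
tree_model <- rpart(am ~ mpg + drat + gear, data = mtcars, method = "class",
                    control = rpart.control(maxdepth = 3, minsplit = 5))

# Random forest: mtry plays the role of "max features", nodesize limits splitting
rf_model <- randomForest(factor(am) ~ mpg + drat + gear, data = mtcars,
                         ntree = 100, mtry = 2, nodesize = 3)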
Evaluating models
• To decide if a given score is high or low, we generally
compare our model’s performance to a few baseline models.
• THE NULL MODEL
• SINGLE-VARIABLE MODELS
• GENERAL MODEL /MULTIPLE VARIABLE MODEL
The null Model
• The most typical null model is a model that returns the same
answer for all situations .
• We use null models as a lower bound on desired performance.
• For example, in a categorical problem, the null model would
always return the most popular category, as this is the easy
guess that is least often wrong.
• For a score model, the null model is often the average of all
the outcomes, as this has the least square deviation from all
the outcomes.
• The idea is that if you’re not outperforming the null model,
you’re not delivering value.
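A minimal sketch of both kinds of null model in R, using mtcars purely for illustration (am as the categorical outcome, mpg as the score outcome):

# Null model for a categorical outcome: always predict the most common class
most_common <- names(which.max(table(mtcars$am)))
mean(most_common == mtcars$am)                 # accuracy of the null model

# Null model for a score outcome: always predict the overall mean
null_pred <- mean(mtcars$mpg)
sqrt(mean((null_pred - mtcars$mpg)^2))         # RMSE of the null model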
SINGLE-VARIABLE MODELS
• Single-variable models are simply models built using only one
variable at a time.
• Single variables can be categorical or numerical.
• A single-variable model based on categorical features is easiest
to describe as a table. Business analysts use a pivot table (which
promotes values or levels of a feature to be families of new
columns) and statisticians use what’s called a contingency
table (where each possibility is given a column name).
• There are a number of ways to use a numeric feature to
make predictions. A common method is to bin the numeric
feature into a number of ranges and then use the range
labels as a new categorical variable.
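A minimal sketch in R of the two approaches above, again using mtcars for illustration (gear as the categorical feature, mpg as the numeric feature to be binned; the bin boundaries are arbitrary):

# Categorical feature: a contingency table of the outcome (am) by the feature (gear);
# the single-variable model predicts, per gear level, the most common am value
table(gear = mtcars$gear, am = mtcars$am)

# Numeric feature: bin mpg into ranges, then treat the bin label as categorical
mtcars$mpg_bin <- cut(mtcars$mpg, breaks = c(0, 15, 20, 25, 35))
table(mpg_bin = mtcars$mpg_bin, am = mtcars$am)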
MULTIPLE VARIABLE MODELS
• Models that combine the effects of many
variables tend to be much more powerful than
models that use only a single variable.
• Variable selection: a key part of building multiple-variable models is
selecting which variables to use and how the variables are to be
transformed or treated.
For example:
Positive linear relationship: in general, the
income of a person increases as his/her age increases.
Negative linear relationship: If the vehicle increases its
speed, the time taken to travel decreases, and vice versa.
(Worked example: for N = 7 paired observations, substituting the values into the
correlation formula gives a high positive correlation.)
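For reference, the Pearson correlation formula that the worked example substitutes the values into is:

r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^{2} - (\sum x)^{2}\right]\left[N\sum y^{2} - (\sum y)^{2}\right]}}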
cor(cars,method = "pearson")
cor(cars,method ="spearman")
P-VALUE
The p-value is a measure of the evidence against a null hypothesis.
---The Pr(>|z|) column represents the p-value associated with the value in the z
value column.
---If the p-value is less than a certain significance level (e.g. α = .05) then this
indicates that the predictor variable has a statistically significant relationship with
the response variable in the model.
---In simple terms: it tells you how effective and how helpful the selected
attributes are in fitting your model.
• The p-value (also called the significance) is one of
the most important diagnostic columns in the
coefficient summary.
• The p-value estimates the probability of seeing a
coefficient with a magnitude as large as you
observed if the true coefficient is really zero (if the
variable has no effect on the outcome).
• So don’t trust the estimate of any coefficient with a
large p-value.
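A minimal sketch of where these p-values live in R, assuming a fitted logistic regression such as the logistic_model built later in these notes:

# The coefficient table of a glm summary contains the Pr(>|z|) column
summary(logistic_model)$coefficients
# Extract only the p-values for the intercept and each predictor
summary(logistic_model)$coefficients[, "Pr(>|z|)"]
# Flag the coefficients that are significant at the 0.05 level
summary(logistic_model)$coefficients[, "Pr(>|z|)"] < 0.05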
PERFORMANCE METRICS
• Confusion Matrix
• Accuracy
• Precision
• Recall
• F1 Score
Confusion Matrix
The confusion matrix is a table counting how often each combination of
known outcomes (the truth) occurred in combination with each prediction
type.
Accuracy
• For a classifier, accuracy is defined as the number of items
categorized correctly divided by the total number of items.
• Out of 100 predictions, our model predicted the correct element 73 times.
Therefore, the accuracy of our model is 73%; 73 correct predictions out of 100
predictions made.
• Out of all the predictions made, how many are correct.
• It tells how often model predictions match the actual labels
of the data
• For example, let’s say we have a machine that classifies if a
fruit is an apple or not. In a sample of hundreds of apples and
oranges, the accuracy of the machine will be how many apples
it classified correctly as apples and how many oranges it
classified as not apples divided by the total number of apples
and oranges.
Accuracy: (TP + TN) / (TP + FP + TN + FN)
Precision
• Precision is defined as the ratio of true positives to predicted
positives.
• Ex: Measure of patients that actually have heart disease out of all the
patients that we identify (predict) as having it.
• Out of all the patients predicted to be heart patients, how many truly are.
• Out of all positive predictions, how many are correctly
predicted.
• precision: TP/(TP + FP)
OR
• TP/predicted positives
Recall
• Recall is the ratio of true positives over all
actual positives,
• Recall=TP/(TP + FN)
or
• TP/all positives
F1 Score
• F1 score is a machine learning evaluation metric that measures
a model’s accuracy. It combines the precision and recall scores
of a model.
• Precision measures how many of the “positive” predictions
made by the model were correct.
• Recall measures how many of the positive class samples
present in the dataset were correctly identified by the model.
• The F1 score combines precision and recall using their
harmonic mean, and maximizing the F1 score implies
simultaneously maximizing both precision and recall.
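A small numeric illustration of these metrics in R; the counts TP = 70, FP = 10, FN = 20, TN = 100 are made up purely for illustration:

TP <- 70; FP <- 10; FN <- 20; TN <- 100       # hypothetical confusion-matrix counts
accuracy  <- (TP + TN) / (TP + FP + TN + FN)  # 170/200 = 0.85
precision <- TP / (TP + FP)                   # 70/80  = 0.875
recall    <- TP / (TP + FN)                   # 70/90  = 0.778 (approx.)
F1        <- 2 * precision * recall / (precision + recall)  # approx. 0.824
c(accuracy = accuracy, precision = precision, recall = recall, F1 = F1)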
Evaluating Classification Model
library('caTools') #for splitting (sample.split)
cars<-mtcars
summary(mtcars)
#Min – Minimum value in the given data
#1st Quartile – first quartile in the data
#Median – Median of the data
#Mean – Mean of the data
#3rd Quartile – third quartile in the data
#Max – Maximum value in the given data
cor(cars,method = "pearson")
#split the dataset 50/50 (sample.split expects the outcome vector, e.g. mtcars$am)
split <- sample.split(mtcars$am, SplitRatio = 0.5)
#take the rows where 'split' is TRUE as the training set (about 50% of the data),
#using the subset() function
train_reg <- subset(mtcars, split == TRUE)
dim(train_reg)
#then take the remaining rows (split is FALSE) as the test set
test_reg <- subset(mtcars, split == FALSE)
dim(test_reg)
# Training model--build a model to predict "am" from mpg, drat and gear using the training set
#glm--Generalized Linear Model--the outcome is 0/1, so family = binomial
logistic_model <- glm(am ~ mpg + drat + gear, data = train_reg,
                      family = "binomial")
logistic_model
#view model summary
summary(logistic_model)
#Predicting the probability of training and testing
#type= response gives the predicted probability
train_reg$pred <-predict(logistic_model,newdata=train_reg,
type = 'response')
train_reg$pred
test_reg$pred <- predict(logistic_model,newdata=test_reg,
type = "response")
test_reg$pred
confmat <- table(truth = test_reg$am ,
prediction = ifelse(test_reg$pred > 0.5,"manual", "auto"))
print(confmat)
#accuracy
(confmat[1,1] + confmat [2,2]) / sum(confmat)
precision <- confmat [1,1] / (confmat [1,1]+ confmat [2,1])
print(precision)
recall <- confmat [1,1]/(confmat [1,1]+ confmat [1,2])
print(recall)
F1 <- 2 * precision * recall / (precision + recall)
print(F1)
Evaluating Scoring Model
fit a model that predicts temperature (in Fahrenheit) from the chirp rate (chirps/sec)
crickets <- read.csv("CricketChirps1.csv")
#The lm() function creates a linear regression model in R.
#lm( fitting_formula, dataframe )
#dataframe: the name of the data frame that contains the data.
#This function takes an R formula Y ~ X, where Y is the outcome
#variable and X is the predictor variable.
cricket_model <- lm(temperatureF ~ chirp_rate, data=crickets)
crickets$temp_pred <- predict(cricket_model, newdata=crickets)
crickets$temp_pred
#RMSE is the square root of the mean of the squared errors.
error_sq <- (crickets$temp_pred - crickets$temperatureF)^2
RMSE <- sqrt(mean(error_sq))
#Formula for R-SQUARED
#R^2 = 1 - RSS/TSS
#R^2 = coefficient of determination
#RSS = sum of squares of residuals
#TSS = total sum of squares
error_sq <- (crickets$temp_pred - crickets$temperatureF)^2
numerator <- sum(error_sq)
delta_sq <- (crickets$temperatureF - mean(crickets$temperatureF))^2
denominator = sum(delta_sq)
(R2 <- 1 - numerator/denominator)
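As a sanity check (assuming the cricket_model fitted above), the manually computed value should agree with the R-squared reported by the linear model's summary:

summary(cricket_model)$r.squared   # should match the R2 computed above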
#plot(cricket_model)
library('lattice')
xyplot(temperatureF ~ chirp_rate, data=crickets)
#If you want to draw a regression line along with your scatterplot,
#use the argument type with points ("p") and a regression line ("r").
xyplot(temperatureF ~ chirp_rate, data=crickets, type=c("p","r"))
The differences between the actual temperatureF values and the predicted temp_pred values are called
the residuals or the error of the model on the data. We will use the residuals to calculate
some common performance metrics for scoring models.
Evaluating Probability Model
• Probability models are useful for both
classification and scoring tasks.
•Probability models are models that both decide if
an item is in a given class and return an estimated
probability (or confidence) of the item being in
the class.
ROC (Receiver Operating Characteristic) Curve
&
AUC (Area Under the Curve)
Terms used in the ROC curve and AUC:
Sensitivity / True Positive Rate
Specificity / True Negative Rate
False Positive Rate
The higher the AUC, the better the model's performance.
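A minimal sketch of these terms in R, reusing the confmat table built in the classification example earlier and treating "manual" (am = 1) as the positive class; this class choice is an assumption made for illustration:

# Rows of confmat are the truth (0/1), columns the prediction ("auto"/"manual")
TP <- confmat[2, 2]; FN <- confmat[2, 1]      # actual manual cars
TN <- confmat[1, 1]; FP <- confmat[1, 2]      # actual automatic cars
sensitivity <- TP / (TP + FN)                 # true positive rate (recall)
specificity <- TN / (TN + FP)                 # true negative rate
FPR <- 1 - specificity                        # false positive rate
c(sensitivity = sensitivity, specificity = specificity, FPR = FPR)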
library('WVPlots')
ROCPlot(test_reg,xvar = 'pred',truthVar = 'am',truthTarget = TRUE ,title =
'mtcars')
LOG LIKELIHOOD
An important evaluation of an estimated probability is
the log likelihood.
The log likelihood is the logarithm of the product of the probabilities the model
assigned to the observed outcomes (equivalently, the sum of their logarithms).
For a spam email with an estimated likelihood of 0.9 of
being spam, the log likelihood is log(0.9)
For a non-spam email, the same score of 0.9 is a log
likelihood of log(1-0.9) (or just the log of 0.1, which was
the estimated probability of not being spam).
The closer to 0 the log likelihood is, the better the prediction.
For a spam email, the probability assigned to the observed outcome is the predicted probability of spam, e.g. 0.98.
For a non-spam email with the same score, it is the probability of not being spam, i.e. (1 - 0.98).
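A minimal sketch of the calculation in R; the label vector y (1 = spam) and the predicted probabilities p are made-up values for illustration:

y <- c(1, 1, 0, 0, 1)                  # true labels: 1 = spam, 0 = not spam
p <- c(0.90, 0.98, 0.20, 0.60, 0.70)   # model's predicted probability of spam
log_lik <- sum(log(ifelse(y == 1, p, 1 - p)))  # log of the probability assigned to each outcome
log_lik                                # closer to 0 means better predictions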
Probability is used to calculate the chance of an event happening before
it occurs.
Likelihood is used to evaluate how well observed data fits a particular
hypothesis or explanation. It is about assessing the probability of an event
after it has already occurred, based on the evidence or data you have.
Example:
Imagine you are a farmer, and you want to determine the likelihood of
rain tomorrow to decide whether or not to water your crops.
Probability: if you found that it rained on 30 out of the last 100 days, then
the probability of rain tomorrow can be estimated as 30% (30 days with
rain out of 100 days total).
Likelihood: if you wake up in the morning and notice dark clouds in the
sky, a drop in temperature, and strong winds, you might say that the
likelihood of rain tomorrow is high.
Another Example for Probability and likelihood
Imagine you and your friend, let's call him John, are going on a picnic, and
you have a weather app that predicts the chance of rain.
Probability: Probability is about predicting the likelihood of an event
before it happens. For example, your weather app says, "There is a 30%
chance of rain today." This means that out of 100 days with similar
weather conditions as today, it's likely to rain on approximately 30 of those
days. Probability deals with predictions or forecasts of future events.
Likelihood: Likelihood, on the other hand, is about assessing the
probability of an event after it has already occurred, based on the evidence
or data you have. So, let's say you and John went on the picnic, and when
you came back home, you noticed that it did rain. Now, you might say, "It
looks like the weather app's prediction was correct; the likelihood of rain
was high today." The likelihood is a measure of how well the observed
data fits a particular probability prediction.
AIC
Akaike information criterion (AIC) is a way to measure the quality of a
statistical model in a simple and understandable manner. It helps us choose the
best model among several competing models.
When comparing models (on the same test set), you will generally prefer the
model with the smaller AIC. The AIC is useful for comparing models with
different measures of complexity and modeling variables with differing
numbers of levels.
We want models that fit the data well but aren't too complicated, as overly
complex models can lead to overfitting.
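For reference, the standard definition is AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood of the model; in R the built-in AIC() function reports this for fitted models (the two candidate models below are illustrative):

# Compare two candidate logistic models fitted to the same data; prefer the smaller AIC
model_a <- glm(am ~ mpg, data = mtcars, family = "binomial")
model_b <- glm(am ~ mpg + drat + gear, data = mtcars, family = "binomial")
AIC(model_a, model_b)   # returns the df and AIC of each model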
Example:
Buying a new smartphone for your grandpa. Imagine you're comparing three
different smartphones (models) based on their features and performance (fitting
the data). The goal is to find the best smartphone that suits your needs.
Simple Model (Model A): This phone has the basic
features you need, such as calling, texting, and a high-quality
camera, but it lacks some advanced features like internet
browsing or a large storage capacity.
Intermediate Model (Model B): This phone has more
features, including better internet browsing and a larger
storage capacity, but it's also more expensive than Model
A.
Complex Model (Model C): This phone is a high-end
model with all the latest features, including a top-notch
camera, large storage capacity, virtual reality support, and
more. However, it's the most expensive of the three.
If Model A fits your basic needs and has a lower AIC value,
it would be the preferred choice as it provides a good
balance of utility and affordability.
If Model B has slightly better features and performance, but
the AIC is not significantly lower than Model A, you might
consider whether the extra features are worth the increased
cost.
If Model C has the highest AIC, it may be too complex and
expensive for your needs, and you might decide it's not
worth the extra cost for features you won't use.