Unit IV Classification DataScience
1
CLASSIFICATION
•Classification
•Nearest Neighbours
•Rows of Tables
•Performance Measures
•Updating Predictions
•Binary Classifier
•Making Decisions
2
What is Classification in Data Science?
Classification:
•Understanding the behavior of the data and assigning each record to one of the resulting groups (classes)
4
CLASSIFICATION
• Classification is a fundamental concept in data science and machine learning.
In simple terms, classification involves assigning input data to one of several
predefined categories or classes.
• In other words, a classification model sorts data points into predefined groups
called classes. Think of it like a mail sorter that learns to put each piece of mail
into the right mailbox (spam or inbox, say) based on its features.
5
CLASSIFICATION- A TWO STEP PROCESS
6
Process (1) - Model Construction (Training Phase)
7
Process (2) - Using the Model for Prediction (Testing Phase)
8
Types of Classification Problems
9
Types of Classification Problems
Binary Classification: This is the simplest case, where each input is assigned to one of two
classes. For example, predicting whether an email is spam or not spam, or whether a patient has
a disease (yes/no). In binary classification, the data is labeled in a binary way (e.g., 0/1,
true/false, positive/negative).
Multi-Class Classification: Here, there are more than two possible classes, but still exactly one
label per example. For example, an image classifier might label photos as cat, dog, or rabbit. The
model must pick one class out of many.
Multi-Label Classification: In some tasks, each instance can belong to multiple classes
simultaneously. For example, a photo might contain both a “bicycle” and an “apple,” so it has
two labels. In multi-label classification, a model predicts a set of classes for each example. This is
different from multi-class, since examples are not exclusive to one class.
Imbalanced Classification: Many real-world datasets are imbalanced, meaning some classes
have many more examples than others. Examples include fraud detection or rare disease
diagnosis.
10
Common Classification
Algorithms in Data Science
11
Algorithms in Classification in Data Science
Logistic Regression:
Logistic Regression is a classification algorithm used to predict a binary outcome (e.g. yes/no, 0/1,
true/false) based on independent variables. It uses an equation to determine the probability of an
event occurring, and then uses a threshold value to determine the outcome.
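As a minimal sketch of this idea (the hours-studied data and the scikit-learn usage below are illustrative assumptions, not from the slides):

```python
# Logistic regression on a tiny made-up dataset: hours studied -> pass/fail.
# The model estimates P(pass) and applies a 0.5 threshold by default.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])  # hours studied
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                  # 0 = fail, 1 = pass

model = LogisticRegression().fit(X, y)

prob = model.predict_proba([[4.5]])[0, 1]  # probability of the event (pass)
label = model.predict([[4.5]])[0]          # outcome after thresholding
```

Changing the threshold (e.g., requiring `prob > 0.8` before predicting 1) trades false positives against false negatives.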
12
Algorithms in Classification in Data Science
Decision Tree:
Decision Tree is a supervised machine learning algorithm used for both classification and
regression. It works by constructing a decision tree from the training data, which is then used to
make predictions on unseen data points.
Random Forest:
Random Forest is an ensemble machine-learning algorithm used for both classification and
regression. It works by randomly selecting a subset of features, and then building multiple
decision trees from the dataset.
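A hedged sketch of both algorithms on the same data (the iris dataset and scikit-learn classes below are illustrative choices, not part of the slides):

```python
# Fit a single decision tree and a random forest on the same dataset.
# The forest builds many trees on random feature/data subsets and votes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(tree.predict(X[:1]), forest.predict(X[:1]))
```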
13
CLASSIFICATION
•Classification
•1Nearest Neighbors
•Rows of Tables
•Performance Measures
•Updating Predictions
•Binary Classifier
• Making Decisions.
14
Nearest Neighbors
•"Nearest neighbors" refers to the concept of finding the closest data points to a given point,
often used in machine learning for classification (K-Nearest Neighbors or KNN) and in optimization
problems.
•In KNN, an unknown data point is assigned to a class based on a majority vote of its k nearest
neighbors in a training set, using distance metrics like Euclidean distance to determine closeness.
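The majority-vote idea can be sketched with scikit-learn (the two toy clusters below are an assumption for illustration):

```python
# KNN with k = 3: a new point takes the majority class of its 3 nearest
# training points under Euclidean distance.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 1], [1, 2], [2, 1],   # class 0 cluster
           [8, 8], [8, 9], [9, 8]]   # class 1 cluster
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3
knn.fit(X_train, y_train)

print(knn.predict([[2, 2]]))  # near the first cluster -> class 0
```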
15
Nearest Neighbors
Example 1
A simple nearest neighbors example involves classifying a new data point by finding its k closest neighbors in a dataset and assigning it the majority class of those neighbors. For instance, if you are classifying a new fruit based on its shape and size, and you choose k=3, the algorithm looks at the three closest known fruits. If two of those three neighbors are apples and one is a banana, the new fruit is classified as an apple.
Example 2
16
Nearest Neighbors
17
Understanding Decision Boundaries in K-Nearest Neighbours
18
Understanding Decision Boundaries in K-Nearest Neighbours
• A decision boundary is a line or surface that divides different groups in a
classification task.
• It shows which areas belong to which class based on what the model decides. The K-Nearest Neighbors (KNN) algorithm operates on the principle that similar data points exist in close proximity within a feature space.
• The shape of this boundary depends on:
•The value of K (how many neighbors are considered).
•The distance metric used (e.g., Euclidean or Manhattan).
• For example, given a dataset with two classes the decision boundary can be
visualized as the line or curve dividing the two regions where each class is
predicted.
19
How KNN creates decision boundaries
In KNN, decision boundaries are influenced by the choice of k and the distance metric
used:
1. Impact of 'K' on Decision Boundaries: The number of neighbors (k) affects the
shape and smoothness of the decision boundary.
•Small k: When k is small the decision boundary can become very complex, closely
following the training data. This can lead to overfitting.
•Large k: When k is large the decision boundary smooths out and becomes less sensitive to
individual data points, potentially leading to underfitting.
2. Distance Metric: The decision boundary is also affected by the distance metric used
like Euclidean, Manhattan. Different metrics can lead to different boundary shapes.
•Euclidean Distance: Commonly used leading to circular or elliptical decision
boundaries in two-dimensional space.
20
Decision Boundaries in K-Nearest Neighbours For different ‘k’
21
Factors That Affect KNN Decision
Boundaries
Feature Scaling: KNN is sensitive to the scale of data. Features with
larger ranges can dominate distance calculations, affecting the boundary
shape.
Noise in Data: Outliers and noisy data points can shift or distort decision
boundaries, leading to incorrect classifications.
Data Distribution: How data points are spread across the feature space
influences how KNN separates classes.
22
How Does the K-Nearest
Neighbors Algorithm Work?
• The K-NN algorithm compares a new data entry to the values in a given data set (with
different classes or categories).
• Based on its closeness or similarities in a given range (K) of neighbors, the algorithm assigns
the new data to a class or category in the data set (training data).
Step #1 - Assign a value to K (the number of neighbors to consider).
Step #2 - Calculate the distance between the new data entry and all other existing data entries (you'll learn how to do this shortly). Arrange them in ascending order.
Step #3 - Find the K nearest neighbors to the new entry based on the calculated distances.
Step #4 - Assign the new data entry to the majority class in the nearest neighbors.
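The steps above can be sketched in a few lines (the toy training set and the helper name `knn_classify` are assumptions for illustration):

```python
# From-scratch KNN following the steps: distance, sort ascending,
# keep the k nearest, assign the majority class.
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """train: list of (features, label) pairs; new_point: feature tuple."""
    # Step 2: distance from the new entry to every existing entry.
    dists = [(math.dist(x, new_point), label) for x, label in train]
    # Steps 2-3: arrange ascending and keep the k nearest neighbors.
    nearest = sorted(dists)[:k]
    # Step 4: majority class among those neighbors.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify(train, (1.5, 1.5), k=3))  # -> "A"
```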
23
Training and testing
How good is our nearest neighbor classifier?
• To answer this, we’ll need to find out how frequently our
classifications are correct. If a patient has chronic kidney disease, how
likely is our classifier to pick that up?
25
Training and testing
How to find out whether the prediction is correct?
❖One way is to wait for further medical tests on the patient and then
check whether or not our prediction agrees with the test results.
With that approach, by the time we can say how likely our prediction
is to be accurate, it is no longer useful for helping the patient.
❖Instead, we will try our classifier on some patients whose true classes
are known. Then, we will compute the proportion of the time our
classifier was correct. This proportion will serve as an estimate of the
proportion of all new patients whose class our classifier will
accurately predict. This is called testing.
26
Training and testing
Overly Optimistic “Testing”:
• Overly optimistic testing in nearest neighbor (k-NN) classification
occurs when the same data used to train the model is also used to test
its performance.
• Because the k-NN model simply memorizes the training data, this
practice will produce a misleadingly high accuracy score, often 100%,
that does not reflect how the model will perform on new, unseen
data.
The core reason for this bias
• The k-NN algorithm is an instance-based or "lazy" learner. Instead of
building a generalized model during a training phase, it stores the
entire training dataset.
• For any given point in the training set, its nearest neighbor will always
be itself, at a distance of zero. Therefore, a 1-nearest neighbor
classifier will always correctly classify every point in the training set.
28
Training and testing
How to avoid overly optimistic testing:
To get a more realistic and unbiased evaluation of a k-NN model, you must
measure its performance on a separate, unseen dataset. The standard
practice is to split the original dataset into two or three parts:
•Training set: A portion of the data used to "train" the model (i.e.,
to have it memorize the data points).
•Testing set: A separate portion of the data used to evaluate the
model's performance. The model has never seen this data
before.
•Validation set (optional): A third set used for fine-tuning
hyperparameters, like the value of k.
This process is known as a train-test split and can be extended with methods
like cross-validation, which repeatedly splits the data and averages the
performance scores to produce an even more robust evaluation.
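The hold-out idea can be sketched as follows (the iris dataset and 50/50 split are illustrative assumptions):

```python
# Hold-out evaluation of a 1-NN model. Scoring on the training set itself
# is overly optimistic (~100%); the held-out score is the honest estimate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("train accuracy:", knn.score(X_tr, y_tr))  # each point is its own neighbor
print("test accuracy:", knn.score(X_te, y_te))   # realistic estimate
```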
29
rows of tables
• In nearest neighbor classification, rows of a table represent the
individual data points or observations in the training dataset.
• Each row contains all the information about the feature values
and the known class label for a single instance.
How rows function in nearest neighbor classification
1.A row is a single data point: Each row is a complete set of attributes for one
observation. For example, in a medical diagnosis problem, a row might represent one
patient and include their age, blood pressure, and other lab results.
2.Rows are compared: When you want to classify a new, unlabeled data point, the algorithm
compares the features of this new point to the feature values in every row of your training
table. The comparison is done using a distance metric, like Euclidean distance, to determine
how "close" each training row is to the new point.
3.Rows identify neighbors: The distances are then sorted, and the top k rows (the "neighbors") with
the smallest distances are selected.
4.Rows determine the class: The class labels of the k nearest-neighbor rows are examined to make a prediction
for the new data point. For a classification problem, the new point is assigned the class that is most common
among its nearest neighbors (a process known as "majority voting").
31
rows of tables
Example using a "chronic kidney disease" table
Imagine a training table for predicting chronic kidney disease (CKD), with each
row representing a different patient.
Patient ID | Hemoglobin | Glucose | White Blood Cell Count | Class
P101 | 11.2 | 117 | 6700 | CKD
P102 | 9.5 | 70 | 12100 | CKD
P103 | 12.5 | 264 | 9600 | Not CKD
P104 | 10.0 | 70 | 18900 | CKD
... | ... | ... | ... | ...
•Training data: The entire table, including all the rows, is the training data for
the classifier.
•A new patient: A new patient, Alice, comes in with a Hemoglobin level of
10.5, a Glucose level of 120, and a White Blood Cell Count of 8000.
32
rows of tables
The process:
• The algorithm calculates the distance between Alice's data and the
data in each row of the table.
• If k=3, the algorithm finds the three rows in the table that are
"closest" to Alice based on her Hemoglobin, Glucose, and White Blood
Cell Count values.
• It then looks at the "Class" column for those three nearest rows to see
if the majority of them are "CKD" or "Not CKD".
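This process can be sketched directly from the four table rows shown (raw, unscaled features are assumed here; note how the large White Blood Cell values dominate the distance, which is the feature-scaling caveat mentioned earlier):

```python
# Find Alice's 3 nearest rows and take the majority vote of their classes.
import math
from collections import Counter

table = [  # (hemoglobin, glucose, wbc), class
    ((11.2, 117, 6700),  "CKD"),
    ((9.5,  70,  12100), "CKD"),
    ((12.5, 264, 9600),  "Not CKD"),
    ((10.0, 70,  18900), "CKD"),
]
alice = (10.5, 120, 8000)

nearest = sorted(table, key=lambda row: math.dist(row[0], alice))[:3]
vote = Counter(label for _, label in nearest).most_common(1)[0][0]
print(vote)  # majority class of Alice's 3 nearest rows
```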
33
Implementing the classifier
• We are now ready to implement a nearest neighbor classifier based on multiple attributes.
• We have used only two attributes so far, for ease of visualization. But usually, predictions will
be based on many attributes.
• Here is an example that shows how multiple attributes can be better than pairs.
This time we’ll look at predicting whether a banknote (e.g., a $20 bill) is counterfeit or legitimate.
Researchers have put together a data set for us, based on photographs of many individual banknotes:
some counterfeit, some legitimate. They computed a few numbers from each image, using techniques
that we won’t worry about for this course. So, for each banknote, we know a few numbers that were
computed from a photograph of it as well as its class (whether it is counterfeit or not). Let’s load it
into a table and take a look.
35
Implementing the classifier
Let’s look at whether the first two numbers tell us anything about whether the banknote is counterfeit or not, using the scatterplot below, which considers only the two attributes WaveletCurt and WaveletVar.
37
Implementing the classifier
Observation: There is some overlap between the blue cluster and the gold cluster.
Inference: This indicates that there will be some images where it’s hard to tell whether the banknote is
legitimate based on just these two numbers. Still, the legitimacy of a banknote could be predicted using a
nearest neighbor classifier.
38
Implementing the classifier
The scatterplot obtained for a different pair of attributes (Entropy & WaveletSkew):
Observation: Here again, overlap between the blue and gold clusters results in a complex structure for these two attributes.
Inference: It is difficult to differentiate between counterfeit and legitimate banknotes using this pair.
39
Implementing the classifier
Multiple attributes
• So far, exactly two attributes were used to make our prediction.
40
Implementing the classifier
Try to predict whether a banknote is counterfeit or not using 3 of the measurements, instead of just
2. The Scatterplot is as follows:
• Observation: There is no overlap between the classes
counterfeit and legitimate.
41
Implementing the classifier
How to use k-nearest neighbor classification to predict the answer to a yes/no question, based on the values of some attributes, assuming you have a training set with examples where the correct prediction is known:
1.Identify some attributes that you think might help you predict the answer to the question.
2.Gather a training set of examples where you know the values of the attributes as well as the correct prediction.
3.To make predictions in the future, measure the value of the attributes and then use k-nearest neighbor classification to predict the answer to the question.
42
Implementing the classifier
Distance in Multiple Dimensions
Euclidean distance in two dimensions:
D = √((x₀ − x₁)² + (y₀ − y₁)²)
Generalized to any number of attributes:
D = √((x₀ − x₁)² + (y₀ − y₁)² + … + (n₀ − n₁)²)
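The multi-dimensional Euclidean distance can be checked in code (the 3-4-5 example points are an assumption for illustration):

```python
# Euclidean distance for any number of attributes, by hand and via numpy.
import numpy as np

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

p = (0.0, 3.0, 4.0)
q = (0.0, 0.0, 0.0)
print(euclidean(p, q))                            # 5.0: a 3-4-5 triangle in 3-D
print(np.linalg.norm(np.array(p) - np.array(q)))  # same result with numpy
```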
43
K-Nearest Neighbors Algorithm -
Example
Brightness Saturation Class
40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
The table above represents our data set. We have two columns Brightness and Saturation.
Each row in the table has a class of either Red or Blue.
44
K-Nearest Neighbors Algorithm -
Example
Before we introduce a new data entry, let's assume the value of K is 5.
We have a new entry but it doesn't have a class yet. To know its class, we have to
calculate the distance from the new entry to other entries in the data set using
the Euclidean distance formula.
45
K-Nearest Neighbors Algorithm -
Example
Where:
•X₂ = New entry's brightness (20).
•X₁= Existing entry's brightness.
•Y₂ = New entry's saturation (35).
•Y₁ = Existing entry's saturation.
Let's do the calculation together. I'll calculate the first three.
Distance #1
For the first row, d1:
46
K-Nearest Neighbors Algorithm -
Example
We now know the distance from the new data entry to the first entry in the table. Let's
update the table.
47
K-Nearest Neighbors Algorithm -Example
Distance #2
For the second row, d2:
Brightness Saturation Class Distance
50 50 Blue ?
d2 = √((20 - 50)² + (35 - 50)²)
= √(900 + 225)
= √1125
= 33.54
Here's the table with the updated distance:
Brightness Saturation Class Distance
40 20 Red 25
50 50 Blue 33.54
60 90 Blue ?
10 25 Red ?
70 70 Blue ?
60 10 Red ?
25 80 Blue ?
48
K-Nearest Neighbors Algorithm -Example
Distance #3
For the third row, d3:
Brightness Saturation Class Distance
60 90 Blue ?
d3 = √((20 - 60)² + (35 - 90)²)
= √(1600 + 3025)
= √4625
= 68.01
Updated table:
49
K-Nearest Neighbors Algorithm -Example
Here's what the table will look like after all the distances have been calculated:
Updated table:
Brightness Saturation Class Distance
40 20 Red 25
50 50 Blue 33.54
60 90 Blue 68.01
10 25 Red 14.14
70 70 Blue 61.03
60 10 Red 47.17
25 80 Blue 45.28
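A short script can recompute every distance from the new entry (20, 35) and take the majority vote over the 5 nearest, confirming the worked example:

```python
# Recompute the example's distances, sort ascending, vote over the 5 nearest.
import math
from collections import Counter

rows = [((40, 20), "Red"), ((50, 50), "Blue"), ((60, 90), "Blue"),
        ((10, 25), "Red"), ((70, 70), "Blue"), ((60, 10), "Red"),
        ((25, 80), "Blue")]
new = (20, 35)

dists = sorted((round(math.dist(p, new), 2), label) for p, label in rows)
majority = Counter(label for _, label in dists[:5]).most_common(1)[0][0]
print(dists[:5])   # the five nearest neighbors
print(majority)    # -> "Red"
```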
50
K-Nearest Neighbors Algorithm -Example
Let's rearrange the distances in ascending order:
51
K-Nearest Neighbors Algorithm -Example
Since we chose 5 as the value of K, we'll only consider the first five rows. That is:
52
K-Nearest Neighbors Algorithm -Example
As you can see above, the majority class within the 5 nearest neighbors to the new entry
is Red. Therefore, we'll classify the new entry as Red.
Here's the updated table:
53
How to Choose the Value of K in the K-NN
Algorithm
There is no particular way of choosing the value of K, but here are some common conventions to keep in mind:
•Choosing a very low value will most likely lead to inaccurate predictions.
•Choosing an odd value of K helps avoid ties when voting between two classes.
54
K-NN Algorithm
55
PERFORMANCE MEASURES
56
accuracy of the classifier
• To see how well our classifier does, we might put 50% of the data
into the training set and the other 50% into the test set.
• Basically, we are setting aside some data for later use, so we can use
it to measure the accuracy of our classifier.
• We’ve been calling that the test set. Sometimes people will call the
data that you set aside for testing a hold-out set, and they’ll call this
strategy for estimating accuracy the hold-out method.
57
accuracy of the classifier
Cancer Diagnosis
• If a patient has a lump in a region, the doctors may want to take a biopsy to see if it is cancerous.
• The doctor gets a sample of the mass, puts it under a microscope, takes a picture, and a trained lab
tech analyzes the picture to determine whether it is cancer or not. We get a picture like one of the
following:
58
accuracy of the classifier
• Unfortunately, distinguishing between benign vs malignant can be tricky. So, researchers have
studied the use of machine learning to help with this task.
• The idea is that we’ll ask the lab tech to analyze the image and compute various attributes:
things like the typical size of a cell, how much variation there is among the cell sizes, and so on.
• Then, we’ll try to use this information to predict (classify) whether the sample is malignant or
not. We have a training set of past samples from patients where the correct diagnosis is known,
and we’ll hope that our machine learning algorithm can use those to learn how to predict the
diagnosis for future samples.
59
Accuracy of the classifier
For improved visibility, only two parameters were considered for the scatterplot.
Observation: That plot is utterly misleading, because there are a bunch of points that have identical values for both the x- and y-coordinates, so they are drawn on top of one another.
Action: Incorporating a confidence score into the results helped the algorithm produce a 99% accurate result.
60
UPDATING PREDICTIONS
• Classification is just a prediction of the class, based on the most common class among
the training points that are nearest our new point.
• Suppose that we eventually find out the true class of our new point. Then we will
know whether we got the classification right. Also, we will have a new point that we
can add to our training set, because we know its class. This updates our training set.
So, naturally, we will want to update our classifier based on the new training set.
• Let us look at some simple scenarios where new data leads us to update our
predictions. While the examples in the chapter are simple in terms of calculation, the
method of updating can be generalized to work in complex settings and is one of the
most powerful tools used for machine learning.
62
A “More Likely Than Not” Binary Classifier
Let’s try to use data to classify a point into one of two categories, choosing the category that we think is
more likely than not. To do this, we not only need the data but also a clear description of how chances are
involved.
Suppose there is a university class with the following composition:
•60% - Second Years
•40% - Third Years
•50% of Second Years have declared their major
•80% of Third Years have declared their major
Pick a student at random from the class.
Can he/she be classified as a Second Year or a Third Year using the “more likely than not” criterion?
Irrespective of the majors, it is easy to predict the year of the student based on the given proportions of
Second and Third Years in the class.
64
A “More Likely Than Not” Binary Classifier
Year Declared Undeclared
Second 30 30
Third 32 8
• The total count is 100 students, of whom 60 are Second Years and 40 are Third Years.
• Among the Second Years, 50% are in each of the Major categories.
• Among the 40 Third Years, 20% are Undeclared and 80% Declared.
• So, this population of 100 students has the same proportions as the class in our problem, and we can
assume that our student has been picked at random from among all 100 students.
65
A “More Likely Than Not” Binary Classifier
Updating the Prediction Based on New Information
Now in addition to the above scenario, the student has declared a major.
• It becomes important to look at the relation between year and major declaration.
More students are Second Years than Third Years. But it’s also true that among the Third Years, a much higher percent have declared their major than among the Second Years.
66
A “More Likely Than Not” Binary Classifier
Present case:(After adding information that major is declared)
Prediction falls into any of the two categories: second year/third year
There are 62 students in those cells, and 32 out of the 62 are Third Years. That’s
than half, even though not by much.
So, inclusion of new information about the student’s major results in updation of
our prediction and now we classify the student as a Third Year(since majority of students
who declared their major is in third year).
In other words, the chance that we are correct is the proportion of Third Years among the
students who have Declared
.
67
A “More Likely Than Not” Binary Classifier
Tree Diagram
The previous calculation depends only on the proportions in the different categories, not
on the counts. The proportions can be visualized in a tree diagram, shown directly below
the pivot table for ease of comparison.
Tree Diagram
Pivot Table
68
A “More Likely Than Not” Binary Classifier
Note:
• The “Third Year, Declared” branch contains the proportion 0.4 x 0.8 =0.32 of the
students, corresponding to the 32 students in the “Third Year, Declared” cell of the
pivot table.
• The “Second Year, Declared” branch contains 0.6 x 0.5 = 0.3 of the students,
corresponding to the 30 in the “Second Year, Declared” cell of the pivot table.
• We know that the student who was picked belongs to a “Declared” branch; that is, the
student is either in the top branch or the third from top. Those two branches now form
our reduced space of possibilities, and all chances have to be calculated relative to the
total chance of this reduced space.
• So, given that the student is Declared, the chance of them being a Third Year can be
calculated directly from the tree. The answer is the proportion in the “Third Year,
Declared” branch relative to the total proportion in the two “Declared” branches.
69
A “More Likely Than Not” Binary Classifier
Bayes’ Rule
Bayes’ Rule solved what was called an “inverse probability” problem: given new data, how can you update chances you had found earlier? It is widely used now in machine learning.
Terminologies:
Prior probabilities. Before we knew the chosen student’s major declaration status, the
chance that the student was a Second Year was 60% and the chance that the student
was a Third Year was 40%. These are the prior probabilities of the two categories.
Likelihoods. These are the chances of the Major status, given the category of student;
thus they can be read off the tree diagram. For example, the likelihood of Declared
status given that the student is a Second Year is 0.5.
Posterior probabilities. These are the chances of the two Year categories, after we have
taken into account information about the Major declaration status. We computed one
of these:
The posterior probability that the student is a Third Year, given that the student has Declared, is denoted P(Third Year ∣ Declared) and is calculated as follows.
70
A “More Likely Than Not” Binary Classifier
The posterior probability that the student is a Third Year, given that the student has Declared, is denoted P(Third Year ∣ Declared) and is calculated as follows:
P(Third Year ∣ Declared) = (0.4 × 0.8) / (0.6 × 0.5 + 0.4 × 0.8) = 0.32/0.62 ≈ 0.516
71
A “More Likely Than Not” Binary Classifier
• That’s about 0.484, which is less than half, consistent with our classification of Third Year.
• Notice that both the posterior probabilities have the same denominator: the chance of the new
information, which is that the student has Declared.
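The posterior calculation can be checked directly from the tree proportions:

```python
# Bayes' Rule for the class example: priors from the year composition,
# likelihoods of Declared given the year, read off the tree diagram.
p_second, p_third = 0.6, 0.4   # prior probabilities
l_second, l_third = 0.5, 0.8   # likelihood of Declared given each year

declared = p_second * l_second + p_third * l_third  # total chance of Declared
post_third = p_third * l_third / declared           # ~0.516
post_second = p_second * l_second / declared        # ~0.484
print(round(post_third, 3), round(post_second, 3))
```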
72
A “More Likely Than Not” Binary Classifier
73
A “More Likely Than Not” Binary Classifier
How Bayes’ Rule and probability help us make rational decisions when information is
incomplete?
•Bayesian inference allows us to update our beliefs when new evidence arrives.
• Example: An online retailer updates its belief about product demand after new sales
data comes in.
74
Making decisions
• A primary use of Bayes’ Rule is to make decisions based on incomplete information,
incorporating new information as it comes in. Many medical tests for diseases return
Positive or Negative results.
• A Positive result means that according to the test, the patient has the disease. A
Negative result means the test concludes that the patient doesn’t have the disease.
• Medical tests are carefully designed to be very accurate. But few tests are accurate 100% of the time. Almost all tests make errors of two kinds:
• A false positive is an error in which the test concludes Positive but the patient doesn’t have the disease.
• A false negative is an error in which the test concludes Negative but the patient does have the disease.
These errors can affect people’s decisions
• False positives can cause anxiety and unnecessary treatment (which in some cases is
expensive or dangerous).
• False negatives can have even more serious consequences if the patient doesn’t receive
treatment because of their Negative test result.
76
Making decisions
A Test for a Rare Disease
Suppose there is a large population and a disease that strikes a tiny proportion of the
population. The tree diagram below summarizes information about such a disease and about a
medical test for it.
Given:
4 in 1000 people have the disease (P(Disease) = 0.004)
Test accuracy:
P(Positive|Disease) = 0.99 (true positive rate)
P(Positive|No Disease) = 0.005 (false positive rate)
Question: If a person tests positive, what is P(Disease|Positive)?
77
Making decisions
Suppose a person is picked at random from the population and tested. If the test result is
Positive, how would you classify them: Disease, or No disease?
• We can answer this by applying Bayes’ Rule and using our “more likely than not” classifier.
• Given that the person has tested Positive, the chance that he or she has the disease is the
proportion in the top branch, relative to the total proportion in the Test Positive branches
(0.004 × 0.99) / (0.004 × 0.99 + 0.996 × 0.005) ≈ 0.443
• Interpretation: Even after a positive test, it’s still more likely (about 55.7%) that the person does not
have the disease.
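The same arithmetic in code, using the stated rates:

```python
# Bayes' Rule for the rare-disease test: prior 0.004, true positive
# rate 0.99, false positive rate 0.005.
p_disease = 0.004
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.005

p_pos = (p_disease * p_pos_given_disease
         + (1 - p_disease) * p_pos_given_healthy)
post = p_disease * p_pos_given_disease / p_pos
print(round(post, 3))  # ~0.443: still more likely No Disease
```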
78
Making decisions
Explaining the Counterintuitive Result
79
Making decisions
• The cells of the table have the right counts. For example, according to the description
of the population, 4 in 1000 people have the disease.
• There are 100,000 people in the table, so 400 should have the disease. That’s what the
table shows: 4 + 396 = 400. Of these 400, 99% get a Positive test result: 0.99 x 400 =
396.
• Among the Positives, the proportion that have the disease is:
396/(396 + 498) ≈ 0.443
• That’s the answer we got by using Bayes’ Rule. The counts in the Positives column show
why it is less than 1/2. Among the Positives, more people don’t have the disease than
do have the disease.
• The reason is that a huge fraction of the population doesn’t have the disease in the first
place. The tiny fraction of those that falsely test Positive are still greater in number than
the people who correctly test Positive.
80
Making decisions
This is easier to visualize in the tree diagram:
• The proportion of true Positives is a large fraction (0.99) of a tiny fraction (0.004) of
the population.
• The proportion of false Positives is a tiny fraction (0.005) of a large fraction (0.996) of
the population.
• These two proportions are comparable; the second is a little larger. So, given that
the randomly chosen person tested positive, we were right to classify them as more
likely than not to not have the disease.
81
Making Decisions
A Subjective Prior
Focus: Understanding how subjective beliefs
(priors) affect Bayesian decision-making
outcomes.
83
Making Decisions
A Subjective Prior
When Being Right Feels Wrong
• Our earlier decision classified a Positive patient as “No
Disease.”
Key Idea:
People are not tested at random — they get tested because
they or their doctor suspect illness.
84
Making Decisions
A Subjective Prior
The Problem with Random Sampling Assumption
The previous calculation assumed a randomly chosen person
from the population.
85
Making Decisions
A Subjective Prior
Introducing the Subjective Prior
A subjective prior represents personal or expert opinion rather
than population frequency.
Here: “The doctor thinks there’s a 5% chance the patient has
the disease.”
86
Making Decisions
A Subjective Prior
Here: subjective prior = belief-based probability reflecting expert judgment.
87
Making Decisions
A Subjective Prior
Prior: P(Disease) = 0.05
P(Positive | Disease) = 0.99
P(Positive | No Disease) = 0.005
Apply Bayes’ Rule:
P(Disease ∣ Positive) = (0.05 × 0.99) / (0.05 × 0.99 + 0.95 × 0.005) ≈ 0.912
Interpretation:
The posterior probability of disease is now 91.2%.
A positive test almost certainly means the patient has the disease.
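The effect of the subjective prior can be verified with the same Bayes computation, swapping in the doctor's 5% belief:

```python
# Same test as before, but with the doctor's subjective prior of 5%.
prior = 0.05
p_pos = prior * 0.99 + (1 - prior) * 0.005  # total chance of Positive
post = prior * 0.99 / p_pos
print(round(post, 3))  # ~0.912: the posterior jumps from 0.443 to 0.912
```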
88
Making Decisions
A Subjective Prior
Tree Structure:
Start with 100,000 patients.
5% (5000) believed to have disease.
Apply test accuracy and false positive rates.
89
Making Decisions
Confirming the Answer
Creating an artificial population:
• Though the doctor’s opinion is subjective, we can generate an
artificial population in which 5% of the people have the disease
and are tested using the same test.
90
Making Decisions
Confirming the Answer
• In this artificially created population of 100,000 people, 5000
people (5%) have the disease, and 99% of them test Positive,
leading to 4950 true Positives.
91
MAKING DECISIONS
92