
Tutorial 7

This document discusses training a naive Bayes classifier to predict survival on the Titanic using passenger data. It loads and explores the Titanic dataset, calculates probabilities of survival for each variable, makes a prediction for an example passenger, and compares the prediction to one made using the e1071 naiveBayes function.

Uploaded by

Low Jia Hui

Tutorial 7

DSA1101
Introduction to Data Science
October 19, 2018

Exercise 1. The Naïve Bayes Classifier


This week, we will look at the CSV dataset “[Link]” which provides
information on the fate of passengers on the fatal maiden voyage of the ocean
liner Titanic, and includes the variables economic status (class), sex, age and
survival. We will train a naïve Bayes classifier using this dataset, and predict
survival.

(a) Load the dataset “[Link]” which has been posted under the folder
for Tutorial 7.

    Titanic_dataset = read.csv("Titanic.csv")
    dim(Titanic_dataset)
    head(Titanic_dataset)
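If the course file is not at hand, R's built-in `Titanic` contingency table may serve as a stand-in. The sketch below expands it to one row per passenger. It is an assumption, not something the tutorial states, that the resulting columns (Class, Sex, Age, Survived) and their levels match those in the CSV; the name `titanic_rows` is hypothetical.

```r
# Assumption: the built-in `Titanic` table (datasets package) is used as a
# stand-in for the course CSV, which may differ in layout or level names.
counts <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq

# Repeat each combination of levels Freq times to get one row per passenger.
titanic_rows <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                       c("Class", "Sex", "Age", "Survived")]
rownames(titanic_rows) <- NULL

dim(titanic_rows)
head(titanic_rows)
```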

(b) Compute the probabilities P(Y = 1) (survived) and P(Y = 0) (did not survive).

    # Counts of survivors and non-survivors
    tprior <- table(Titanic_dataset$Survived)
    tprior
    # Divide by the total to turn the counts into prior probabilities
    tprior <- tprior / sum(tprior)
    tprior
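The prior is simply the share of each class among all passengers. As a minimal sketch on a hypothetical five-passenger vector (not the tutorial's data), `prop.table` performs the same normalization as dividing by `sum`:

```r
# Hypothetical toy outcomes, used only to illustrate the normalization.
survived <- c("No", "Yes", "Yes", "No", "No")

counts <- table(survived)          # No = 3, Yes = 2
prior_manual <- counts / sum(counts)
prior_prop   <- prop.table(counts)

prior_manual
prior_prop
```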

(c) Compute the conditional probabilities P(Xi = xi | Y = 1) and P(Xi = xi | Y = 0),
where i = 1, 2, 3 for the feature variables X = {class, sex, age}.

    # Each table has Survived in the rows; dividing by the row sums
    # gives P(feature value | Survived) within each row.
    classCounts <- table(Titanic_dataset[, c("Survived", "Class")])
    classCounts <- classCounts / rowSums(classCounts)
    classCounts

    genderCounts <- table(Titanic_dataset[, c("Survived", "Sex")])
    genderCounts <- genderCounts / rowSums(genderCounts)
    genderCounts

    ageCounts <- table(Titanic_dataset[, c("Survived", "Age")])
    ageCounts <- ageCounts / rowSums(ageCounts)
    ageCounts

(d) Predict survival for an adult female passenger travelling in 2nd class.

    # Unnormalized score for Y = Yes: prior times the three conditionals
    prob_survived <-
      classCounts["Yes", "2nd"] *
      genderCounts["Yes", "Female"] *
      ageCounts["Yes", "Adult"] *
      tprior["Yes"]

    # Unnormalized score for Y = No
    prob_not_survived <-
      classCounts["No", "2nd"] *
      genderCounts["No", "Female"] *
      ageCounts["No", "Adult"] *
      tprior["No"]

    # Predict the class with the larger score
    prob_survived
    prob_not_survived
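The two products are scores proportional to the posterior probabilities, not probabilities themselves; dividing each by their sum gives values that add up to 1. The sketch below illustrates this under the assumption that R's built-in `Titanic` table is an acceptable stand-in for the course CSV. The names `prior`, `cond_class`, `cond_sex`, `cond_age`, `scores` and `posterior` are hypothetical.

```r
# Assumption: the built-in `Titanic` table (dimensions Class, Sex, Age,
# Survived) stands in for the course CSV.
prior      <- prop.table(margin.table(Titanic, 4))            # P(Y)
cond_class <- prop.table(margin.table(Titanic, c(4, 1)), 1)   # P(Class | Y)
cond_sex   <- prop.table(margin.table(Titanic, c(4, 2)), 1)   # P(Sex | Y)
cond_age   <- prop.table(margin.table(Titanic, c(4, 3)), 1)   # P(Age | Y)

# Unnormalized naive Bayes score for an adult female in 2nd class.
scores <- sapply(c("No", "Yes"), function(y) {
  as.numeric(prior[y]) *
    cond_class[y, "2nd"] *
    cond_sex[y, "Female"] *
    cond_age[y, "Adult"]
})

# Normalize so the two values sum to 1, then pick the larger one.
posterior  <- scores / sum(scores)
prediction <- names(which.max(posterior))

posterior
prediction
```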

(e) Compare your prediction in (d) with the one produced by the naiveBayes
function in the package ‘e1071’.

    library(e1071)

    # Fit naive Bayes with Survived as the response and all other columns as features
    model <- naiveBayes(Survived ~ .,
                        Titanic_dataset)

    # The passenger from part (d)
    test <- data.frame(Class = "2nd", Sex = "Female",
                       Age = "Adult")

    # Predicted class label
    results <- predict(model, test)
    results
    # Posterior probabilities for each class
    results <- predict(model, test, type = "raw")
    results

    # ratio of probability scores
    prob_survived / prob_not_survived
    # ratio of actual probabilities
    results[1, "Yes"] / results[1, "No"]
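The two ratios are expected to agree because the scores from (d) and the posterior probabilities differ only by the common factor 1 / P(x), which cancels in the ratio. The derivation below sketches this; it assumes the model is fitted with default settings (no Laplace smoothing) and that no conditional probability is zero.

```latex
\frac{P(Y = 1 \mid x)}{P(Y = 0 \mid x)}
  = \frac{P(Y = 1)\,\prod_{i=1}^{3} P(X_i = x_i \mid Y = 1) \,/\, P(x)}
         {P(Y = 0)\,\prod_{i=1}^{3} P(X_i = x_i \mid Y = 0) \,/\, P(x)}
  = \frac{P(Y = 1)\,\prod_{i=1}^{3} P(X_i = x_i \mid Y = 1)}
         {P(Y = 0)\,\prod_{i=1}^{3} P(X_i = x_i \mid Y = 0)}
```

The right-hand side is the ratio of the two scores computed by hand, and the left-hand side is the ratio of the two normalized probabilities.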
