Assignment #2 Introduction To Classification

This document contains an assignment for a data mining course. The assignment includes four problems on classification techniques: naive Bayes classification, k-nearest neighbors (KNN) classification, and the evaluation of classification models. Specifically, it asks students to:

1) Build a naive Bayes classifier and use it to make predictions on new data.
2) Perform KNN classification on 2D data points using the 1-nearest neighbor and the 3-nearest neighbors.
3) Predict the gender of a customer using 3-nearest neighbors classification.
4) Find the k nearest neighbors of different records in a sample dataset using KNN with Euclidean and Minkowski distances.

Uploaded by

Rania Saoud

Assignment #2

Introduction to classification (part 1)

Course Title: Data Mining

Instructor: Dr. Amor Messaoud

Questions

1. Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of naïve Bayesian classification.

Questions

Use the three-class confusion matrix below to answer questions 1 through 3.

1. What percent of the instances were correctly classified?

2. How many class 2 instances are in the dataset?
3. How many instances were incorrectly classified with class 3?
4. Sometimes a data set is partitioned such that a validation set is provided. What is the purpose of the validation set?
5. If we build a classifier and evaluate it on both the training set and the test set:
   a. Which data set would we expect to have the higher accuracy: the training set or the test set?
   b. Which data set provides the best estimate of accuracy on new data: the training set or the test set?
6. Consider the one-dimensional data shown in the following table. Classify the data point x = 5.0 according to its 1-, 3-, and 5-nearest neighbors (using majority vote).
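
As a rough illustration of questions 1 through 3, the sketch below reads the three quantities off a confusion matrix. The matrix values are hypothetical (the matrix from the assignment is not reproduced here), and the sketch assumes that rows are actual classes and columns are predicted classes. It also assumes that "incorrectly classified with class 3" means instances predicted as class 3 whose true class is different.

```python
# Hypothetical 3x3 confusion matrix, NOT the one from the assignment.
# Assumed convention: rows = actual class, columns = predicted class.
confusion = [
    [10, 2, 1],
    [3, 12, 0],
    [1, 4, 7],
]

def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def actual_count(matrix, cls):
    """Number of instances whose true class is `cls` (row sum, 0-based index)."""
    return sum(matrix[cls])

def wrongly_predicted_as(matrix, cls):
    """Instances predicted as `cls` but belonging to another class
    (column sum minus the diagonal entry)."""
    return sum(row[cls] for row in matrix) - matrix[cls][cls]

print(f"accuracy: {accuracy(confusion):.1%}")
print("class 2 instances:", actual_count(confusion, 1))
print("wrongly predicted as class 3:", wrongly_predicted_as(confusion, 2))
```

If the matrix in the assignment uses the opposite convention (columns = actual, rows = predicted), the row and column sums swap roles.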

ASSIGNMENT #1 (FEBRUARY 2019) 1

Problem #1

Consider the following dataset of a credit card promotion database. The credit card
company has authorized a new life insurance promotion similar to the existing one. We are
interested in building a classification data mining model for deciding whether to send the
customer promotional material.

1. Build a Naive Bayes classifier for this dataset, by filling in the following with counts
and probabilities.
                               Life insurance promotion
                               Y          N
   Magazine promotion     Y
                          N

                               Life insurance promotion
                               Y          N
   Watch promotion        Y
                          N

                               Life insurance promotion
                               Y          N
   Credit card insurance  Y
                          N

                               Life insurance promotion
                               Y          N
   Sex                    M
                          F

2. Use the Naive Bayes classifier obtained in question 1 to determine the value of Life Insurance Promotion for the following instance:
   Magazine Promotion = Y; Watch Promotion = Y; Credit Card Insurance = N; Sex = F; Life Insurance Promotion = ?
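
The counting and scoring steps behind a Naive Bayes classifier can be sketched as follows. This is a minimal illustration, not the solution to the problem: the records and the attribute names (`magazine`, `sex`, `life`) are hypothetical stand-ins for the credit card promotion dataset, which is not reproduced here. For each class, the score is the class prior multiplied by the conditional probability of each attribute value given that class. No smoothing is applied, so a single zero count drives the whole score to zero.

```python
from collections import Counter

# Hypothetical toy records, NOT the credit card promotion dataset.
records = [
    {"magazine": "Y", "sex": "F", "life": "Y"},
    {"magazine": "Y", "sex": "M", "life": "Y"},
    {"magazine": "N", "sex": "F", "life": "Y"},
    {"magazine": "N", "sex": "M", "life": "N"},
    {"magazine": "Y", "sex": "M", "life": "N"},
]

def train_naive_bayes(records, target):
    """Count class frequencies and (attribute, value, class) co-occurrences."""
    class_counts = Counter(r[target] for r in records)
    cond_counts = Counter()
    for r in records:
        for attr, value in r.items():
            if attr != target:
                cond_counts[(attr, value, r[target])] += 1
    return class_counts, cond_counts

def score(instance, class_counts, cond_counts):
    """Return P(class) * product of P(attr = value | class) for each class."""
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior P(class)
        for attr, value in instance.items():
            p *= cond_counts[(attr, value, cls)] / n_cls  # P(attr = value | class)
        scores[cls] = p
    return scores

class_counts, cond_counts = train_naive_bayes(records, target="life")
scores = score({"magazine": "Y", "sex": "F"}, class_counts, cond_counts)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The scores are unnormalized; dividing each by their sum gives the posterior probabilities, but the class with the largest score is the prediction either way.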

Problem #2

Consider the set of training examples in the diagram below. A plus indicates a positive
example and a star indicates a negative example. Use the Euclidean distance to answer the
following questions:
1. How will the point (8, 1) be classified by the 1-nearest neighbor classifier?
2. How will the point (8, 8) be classified by the 3-nearest neighbors?

Problem #3

Lisa has lost the gender information for one of her customers and does not know whether to
make a skirt or trousers. She is planning to toss a coin. Can you help her make a better
decision using a KNN classifier (K = 3)? Use the Euclidean distance. The customer whose
gender information is missing is shown in the first row:

Gender    Waist    Hip
?         28       34
Male      28       32
Male      33       35
Female    27       33
Female    31       36
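
One way to check a hand calculation for this problem is the short sketch below. It uses the four labeled rows of the table as training data and the (28, 34) row as the query. The function name `knn_predict` is an illustrative choice, and the measurements are used unscaled, as the problem statement does not ask for normalization.

```python
import math
from collections import Counter

# Labeled rows from the table above: (gender, waist, hip).
training = [
    ("Male", 28, 32),
    ("Male", 33, 35),
    ("Female", 27, 33),
    ("Female", 31, 36),
]
query = (28, 34)  # the customer with missing gender

def knn_predict(training, query, k):
    """Rank rows by Euclidean distance to the query and take a majority vote."""
    ranked = sorted(training, key=lambda row: math.dist(row[1:], query))
    neighbors = ranked[:k]
    votes = Counter(label for label, *_ in neighbors)
    return votes.most_common(1)[0][0], neighbors

label, neighbors = knn_predict(training, query, k=3)
for gender, waist, hip in neighbors:
    print(gender, round(math.dist((waist, hip), query), 3))
print("prediction:", label)  # prediction: Female
```

The three nearest rows are Female (27, 33) at distance √2, Male (28, 32) at distance 2, and Female (31, 36) at distance √13, so the majority vote is Female.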

Problem #4 (Larose and Larose, 2015, p. 312)

The following table contains a small data set of 10 records excerpted from the ClassifyRisk
data set, with predictors age, marital status, and income, and target variable risk.

1. Using R, find the k nearest neighbors of Record #10, using k = 3.
2. Using the ClassifyRisk data set with predictors age, marital status, and income, and
target variable risk, find the k nearest neighbors of Record #1, using k = 2 and
Euclidean distance.
3. Using the ClassifyRisk data set with predictors age, marital status, and income, and
target variable risk, find the k nearest neighbors of Record #1, using k = 2 and
Minkowski distance.
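
The assignment asks for R; the Python sketch below is only meant to illustrate how the two distances relate. The Minkowski distance of order p reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2. The points are hypothetical, not ClassifyRisk records, and the sketch assumes all predictors are already numeric, so a categorical predictor such as marital status would first need to be encoded, and predictors on very different scales (age versus income) would normally be standardized.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def k_nearest(points, query, k, p=2):
    """Return the indices of the k points closest to the query."""
    order = sorted(range(len(points)), key=lambda i: minkowski(points[i], query, p))
    return order[:k]

# Hypothetical numeric points, NOT records from the ClassifyRisk data set.
points = [(3, 4), (1, 1), (6, 8)]
query = (0, 0)

print(minkowski(query, points[0], p=2))  # Euclidean: 5.0
print(minkowski(query, points[0], p=1))  # Manhattan: 7.0
print(k_nearest(points, query, k=2))     # [1, 0]
```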
