Assignment3 Practice Classification

Uploaded by

nawaljaveria07

Assignment 3 Data Mining

Submission is not required. Practice these questions; they will help you in the exam.
1. Suppose the proportions of classes 1, 2, and 3 in dataset D1 are 0.4, 0.5, and 0.1, respectively, and
those in dataset D2 are 0.7, 0.2, and 0.1, respectively. Compute the entropy of each dataset. Which
dataset is less impure?
2. Suppose we have a dataset D with 32 instances and 4 classes. What are the minimum and
maximum possible values of Info(D), i.e., the uncertainty (entropy) of D? Explain. Can you
generalize the range of entropy from this example, i.e., what is the range of entropy for a
dataset with n classes?
3. Given the data in Table 1, induce a decision tree classifier using the ID3 algorithm. Performance
is the class attribute. Show the computation steps.
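
For questions 1 and 2, a minimal Python sketch of the entropy computation may help in checking hand calculations. It assumes the usual base-2 definition Info(D) = -sum(p_i * log2(p_i)). The function name `entropy` and the two class-count splits used for question 2 are illustrative choices, not part of the assignment.

```python
import math

def entropy(proportions):
    """Base-2 entropy of a class distribution; zero proportions contribute nothing."""
    return sum(p * math.log2(1 / p) for p in proportions if p > 0)

# Question 1: class proportions given in the assignment.
d1 = [0.4, 0.5, 0.1]
d2 = [0.7, 0.2, 0.1]
print(f"Info(D1) = {entropy(d1):.4f}")
print(f"Info(D2) = {entropy(d2):.4f}")

# Question 2: 32 instances, 4 classes. Two extreme (assumed) class-count splits.
pure = [32, 0, 0, 0]      # every instance in one class
uniform = [8, 8, 8, 8]    # instances spread evenly over the classes
for counts in (pure, uniform):
    total = sum(counts)
    print(counts, entropy(c / total for c in counts))
```

The dataset with the lower entropy is the less impure one. The two extreme splits show the ends of the range for 4 classes; with n classes the evenly spread case gives log2(n).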

Table 1
Hostel  Regularity  Punctuality  Smoker  Performance
Yes     Low         Low          Yes     poor
No      Low         Medium       No      poor
Yes     Medium      High         No      medium
Yes     Medium      Low          Yes     poor
No      High        Medium       Yes     poor
No      High        High         No      excellent
No      Medium      High         No      excellent
No      Medium      High         No      medium
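
As a way to check the hand computation for question 3, the first ID3 step (choosing the root attribute) can be sketched in Python. This is only a sketch under the standard ID3 definitions of Info(D) and information gain; the names `ROWS`, `info`, and `gain` are illustrative, and the rows are copied from Table 1.

```python
import math
from collections import Counter

# Rows of Table 1: (Hostel, Regularity, Punctuality, Smoker, Performance).
ATTRIBUTES = ["Hostel", "Regularity", "Punctuality", "Smoker"]
ROWS = [
    ("Yes", "Low", "Low", "Yes", "poor"),
    ("No", "Low", "Medium", "No", "poor"),
    ("Yes", "Medium", "High", "No", "medium"),
    ("Yes", "Medium", "Low", "Yes", "poor"),
    ("No", "High", "Medium", "Yes", "poor"),
    ("No", "High", "High", "No", "excellent"),
    ("No", "Medium", "High", "No", "excellent"),
    ("No", "Medium", "High", "No", "medium"),
]

def info(rows):
    """Info(D): base-2 entropy of the class labels (last column)."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def gain(rows, attr_index):
    """Information gain from splitting rows on the attribute at attr_index."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row)
    remainder = sum(len(part) / len(rows) * info(part) for part in partitions.values())
    return info(rows) - remainder

print(f"Info(D) = {info(ROWS):.4f}")
for i, name in enumerate(ATTRIBUTES):
    print(f"Gain({name}) = {gain(ROWS, i):.4f}")
```

ID3 takes the attribute with the largest gain as the root, then repeats the same computation on each partition until a partition is pure or no attributes remain.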

4. What is the accuracy of your decision tree classifier on the training data? Make a confusion
matrix.
5. Use your decision tree to predict the labels of the following test examples.

yes,low,low,yes,?
yes,high,low,no,?
no,medium,high,no,?
no,medium,medium,no,?
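
For question 4, the confusion matrix and accuracy can be tabulated along the following lines. The `actual` labels come from Table 1; the `predicted` list is only a placeholder (it always predicts "poor") and should be replaced with the outputs of your own tree.

```python
from collections import Counter

# Actual Performance labels from Table 1, in row order.
actual = ["poor", "poor", "medium", "poor", "poor", "excellent", "excellent", "medium"]

# PLACEHOLDER predictions (always "poor"); replace with your tree's outputs.
predicted = ["poor"] * len(actual)

labels = ["poor", "medium", "excellent"]
pair_counts = Counter(zip(actual, predicted))

# Rows are actual classes, columns are predicted classes.
matrix = [[pair_counts[(a, p)] for p in labels] for a in labels]
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)

print("actual \\ predicted", labels)
for label, row in zip(labels, matrix):
    print(f"{label:>10}", row)
print(f"accuracy = {accuracy:.3f}")
```

Accuracy is the sum of the diagonal entries divided by the number of instances.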

6. Load the data in Table 1 into Weka and train a decision tree (ID3 or J48 classifier). Prepare
separate ARFF files for the train and test sets.
a. Report the accuracy on the training data, including the output predictions of the
classifier.
b. Supply the test set from question 5 to your classifier and generate output labels. Show
the predicted values obtained.
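
One possible way to prepare the two ARFF files for question 6 is sketched below in Python. The relation and attribute names are assumptions chosen for illustration. All nominal values are written in lowercase so that the train and test files use identical value sets, and the unknown class of each test example is written as `?`, the ARFF marker for a missing value.

```python
# Attribute names and nominal value sets (names are illustrative assumptions).
ATTRIBUTES = [
    ("hostel", ["yes", "no"]),
    ("regularity", ["low", "medium", "high"]),
    ("punctuality", ["low", "medium", "high"]),
    ("smoker", ["yes", "no"]),
    ("performance", ["poor", "medium", "excellent"]),
]

# Table 1, lowercased.
TRAIN_ROWS = [
    ("yes", "low", "low", "yes", "poor"),
    ("no", "low", "medium", "no", "poor"),
    ("yes", "medium", "high", "no", "medium"),
    ("yes", "medium", "low", "yes", "poor"),
    ("no", "high", "medium", "yes", "poor"),
    ("no", "high", "high", "no", "excellent"),
    ("no", "medium", "high", "no", "excellent"),
    ("no", "medium", "high", "no", "medium"),
]

# Test examples from question 5; "?" marks the unknown class.
TEST_ROWS = [
    ("yes", "low", "low", "yes", "?"),
    ("yes", "high", "low", "no", "?"),
    ("no", "medium", "high", "no", "?"),
    ("no", "medium", "medium", "no", "?"),
]

def to_arff(relation, rows):
    """Build ARFF text with a shared header so both files stay compatible."""
    lines = [f"@relation {relation}", ""]
    for name, values in ATTRIBUTES:
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"

train_arff = to_arff("performance_train", TRAIN_ROWS)
test_arff = to_arff("performance_test", TEST_ROWS)
print(train_arff)
print(test_arff)
```

Saving the two strings as, for example, train.arff and test.arff gives files that can be opened in Weka, with the test file supplied as a separate test set.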

7. Load the Iris dataset in Weka and compare the performance of different classifiers on this
dataset. Try to find the best classifier for this dataset. Report your results in tabular form. Show
results for at least 3 classifiers and 4 test settings (training data as test, 5-fold cross validation,
10-fold cross validation, 70:30 train/test split).
8. Reduce the dimensionality of the Iris dataset and compare classification results on the original
and reduced datasets. Do you observe any difference in classification accuracy?
9. What is the relationship between a decision tree classifier and a random forest classifier?
10. Explore SMOTE, a data imbalance handling method.
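
As a starting point for question 10, the core idea of SMOTE (creating synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in plain Python. This is a simplified illustration, not a reference implementation; the function name `smote`, its parameters, and the sample points are all hypothetical.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base (excluding base itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D minority-class points, made up for illustration only.
minority_points = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8), (2.0, 1.5)]
new_points = smote(minority_points, n_new=4)
for point in new_points:
    print(point)
```

Each synthetic point lies on the line segment between two existing minority points, so the minority class is enlarged without simply duplicating instances.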
