IT132 – Introduction to Data Mining
Week 1 – Tutorial
Class 1: Introduction to Weka
1.1. Introduction
Weka is open-source software available at www.cs.waikato.ac.nz/ml/weka. Weka stands for the
Waikato Environment for Knowledge Analysis. It offers clean, spare implementations of the simplest
techniques, designed to aid understanding of data mining methods. It also provides a workbench
that includes full, working, state-of-the-art implementations of many popular learning schemes that can
be used for practical data mining or for research.
In the first class, we are going to get started with Weka: exploring the “Explorer” interface, exploring
some datasets, building a classifier, using filters, and visualizing your dataset (see the Class 1 lecture
by Ian H. Witten [1]).
Task: Take notes on how you find the Explorer, and answer the questions in the following sections.
1.2. Exploring the Explorer
Follow the instructions in [1]
1.3. Exploring datasets
Follow the instructions in [1]
In the dataset weather.nominal.arff, how many attributes are there in the relation? What are their values?
What is the class and what are its values? Count the instances for each attribute value.
Answer: There are 5 attributes in the relation; their values, the class, and its values are described in the
following table. Instances are counted per value for nominal attributes; numeric attributes are summarized
in terms of min, max, mean, and standard deviation.
weather.nominal.arff (relation: weather.symbolic, 14 instances, 5 attributes)

Attribute     Distinct   Value      #Instances
outlook       3          sunny      5
                         overcast   4
                         rainy      5
temperature   3          hot        4
                         mild       6
                         cool       4
humidity      2          high       7
                         normal     7
windy         2          TRUE       6
                         FALSE      8
play (class)  2          yes        9
                         no         5
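These counts and statistics can also be read programmatically through Weka's Java API. The sketch
below is a minimal example, assuming weka.jar is on the classpath; the file path is an assumption,
so point it at your own copy of the dataset.

    import weka.core.AttributeStats;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ExploreAttributes {
        public static void main(String[] args) throws Exception {
            // Path is an example; adjust it to your Weka data directory
            Instances data = DataSource.read("data/weather.nominal.arff");
            for (int i = 0; i < data.numAttributes(); i++) {
                AttributeStats stats = data.attributeStats(i);
                System.out.println(data.attribute(i).name()
                        + " (distinct: " + stats.distinctCount + ")");
                if (data.attribute(i).isNominal()) {
                    // One count per label, in declaration order
                    for (int v = 0; v < stats.nominalCounts.length; v++) {
                        System.out.println("  " + data.attribute(i).value(v)
                                + ": " + stats.nominalCounts[v]);
                    }
                } else {
                    // Numeric attributes: min, max, mean, standard deviation
                    System.out.println("  min=" + stats.numericStats.min
                            + " max=" + stats.numericStats.max
                            + " mean=" + stats.numericStats.mean
                            + " stddev=" + stats.numericStats.stdDev);
                }
            }
        }
    }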
Similarly, examine datasets: weather.numeric.arff and glass.arff.
weather.numeric.arff (relation: weather, 14 instances, 5 attributes)

Attribute     Distinct   Values / statistics
outlook       3          sunny 5, overcast 4, rainy 5
temperature   12         min 64, max 85, mean 73.571, stddev 6.572
humidity      10         min 65, max 96, mean 81.643, stddev 10.285
windy         2          TRUE 6, FALSE 8
play (class)  2          yes 9, no 5
glass.arff (relation: Glass, 214 instances, 10 attributes)

Attribute     Distinct   Values / statistics
RI            178        min 1.511, max 1.534, mean 1.518, stddev 0.003
Na            142        min 10.73, max 17.38, mean 13.408, stddev 0.817
Mg            94         min 0, max 4.49, mean 2.685, stddev 1.442
Al            118        min 0.29, max 3.5, mean 1.445, stddev 0.499
Si            133        min 69.81, max 75.41, mean 72.651, stddev 0.775
K             65         min 0, max 6.21, mean 0.497, stddev 0.652
Ca            143        min 5.43, max 16.19, mean 8.957, stddev 1.423
Ba            34         min 0, max 3.15, mean 0.175, stddev 0.497
Fe            32         min 0, max 0.51, mean 0.057, stddev 0.097
Type (class)  6          build wind float 70, build wind non-float 76,
                         vehic wind float 17, vehic wind non-float 0,
                         containers 13, tableware 9, headlamps 29

(Type declares 7 labels but has only 6 distinct values, since vehic wind non-float never occurs.)
Create a file of ARFF format and examine it.
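For reference, ARFF is a plain-text format: a @relation line, one @attribute declaration per attribute,
and a @data section with one comma-separated instance per line. A minimal file with the same
structure as weather.nominal.arff could look like the sketch below (the data rows are illustrative,
not the full 14 instances):

    @relation weather.symbolic

    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature {hot, mild, cool}
    @attribute humidity {high, normal}
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}

    @data
    sunny,hot,high,FALSE,no
    overcast,hot,high,FALSE,yes
    rainy,mild,high,FALSE,yes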
1.4. Building a classifier
Follow the instructions in [1]
Examine the output of J48 vs. RandomTree applied to the dataset glass.arff:
Algorithm     Pruned/unpruned   minNumObj   Leaves   Correctly classified instances
J48           Unpruned          15          8        131
RandomTree    N/A               N/A         N/A      150
Examine the confusion matrix each time you run an algorithm.
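The same experiment can be scripted against the Weka Java API instead of the Explorer. The following
is a minimal sketch, assuming weka.jar on the classpath; the file path and the 10-fold cross-validation
setup are assumptions (the Explorer's default test option), so adjust them to match your own run.

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.classifiers.trees.RandomTree;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CompareTrees {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/glass.arff"); // path is an example
            data.setClassIndex(data.numAttributes() - 1);        // Type is the last attribute

            // J48, unpruned, minimum 15 instances per leaf
            J48 j48 = new J48();
            j48.setUnpruned(true);
            j48.setMinNumObj(15);

            RandomTree rt = new RandomTree();

            for (Classifier c : new Classifier[] { j48, rt }) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(c, data, 10, new Random(1)); // 10-fold CV, fixed seed
                System.out.println(c.getClass().getSimpleName());
                System.out.println(eval.toSummaryString());
                System.out.println(eval.toMatrixString()); // the confusion matrix
            }
        }
    }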
J48 – unpruned – minNumObj=15:
=== Confusion Matrix ===
a b c d e f g <-- classified as
50 15 3 0 0 1 1 | a = build wind float
16 47 6 0 2 3 2 | b = build wind non-float
5 5 6 0 0 1 0 | c = vehic wind float
0 0 0 0 0 0 0 | d = vehic wind non-float
0 2 0 0 10 0 1 | e = containers
1 1 0 0 0 7 0 | f = tableware
3 2 0 0 0 1 23 | g = headlamps
The classifier is skewed towards predicting a = build wind float and b = build wind non-float.
RandomTree:
=== Confusion Matrix ===
a b c d e f g <-- classified as
53 11 6 0 0 0 0 | a = build wind float
13 53 4 0 2 2 2 | b = build wind non-float
5 4 8 0 0 0 0 | c = vehic wind float
0 0 0 0 0 0 0 | d = vehic wind non-float
0 1 0 0 11 0 1 | e = containers
0 4 0 0 1 4 0 | f = tableware
2 2 0 0 2 2 21 | g = headlamps
RandomTree is similarly skewed towards a = build wind float and b = build wind non-float, but it
achieves better results than J48 (150 vs. 131 correctly classified instances).
1.5. Using a filter
Follow the instructions in [1], and note the following.
Use a filter to remove an attribute:
- What are attributeIndices?
Answer: The range of attributes to be acted upon by the filter.
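As a code-level illustration of the same step, here is a minimal sketch using
weka.filters.unsupervised.attribute.Remove (the attribute index and file path are examples):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class RemoveAttribute {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/weather.nominal.arff");

            Remove remove = new Remove();
            remove.setAttributeIndices("3"); // attributeIndices: humidity (3rd attribute, 1-based)
            remove.setInputFormat(data);     // must be called before filtering

            Instances filtered = Filter.useFilter(data, remove);
            System.out.println(filtered.numAttributes() + " attributes remain");
        }
    }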
Remove instances where humidity is high:
- What are nominalIndices?
Answer: The range of label indices used for selection on a nominal attribute.
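A corresponding sketch for the instance filter, weka.filters.unsupervised.instance.RemoveWithValues
(indices are 1-based and assume the attribute order shown in section 1.3; invertSelection flips the
matching sense if the wrong side of the data ends up removed):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.instance.RemoveWithValues;

    public class RemoveHighHumidity {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/weather.nominal.arff");

            RemoveWithValues rwv = new RemoveWithValues();
            rwv.setAttributeIndex("3");  // humidity is the 3rd attribute (1-based)
            rwv.setNominalIndices("1");  // nominalIndices: 1st label of humidity, i.e. "high"
            // rwv.setInvertSelection(true) would flip which instances are matched
            rwv.setInputFormat(data);

            Instances filtered = Filter.useFilter(data, rwv);
            System.out.println(filtered.numInstances() + " instances remain");
        }
    }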
Fewer attributes, better classification:
Answer:
This is not true in all cases. When it does hold, the removed attributes were likely no more than
unnecessary complications for the model, or the model could not find the global optimum while they
were included. However, when important attributes are removed (for example, size measures when
classifying cats versus tigers), classification results deteriorate badly. Either way, the notion that
fewer attributes can lead to better classification must be confirmed by observation and experiment;
it depends on both the model and the set of attributes.
Follow the instructions in [1] and review the outputs of J48 applied to glass.arff:
Filter                    Leaves   Correctly classified   Remark
                                   instances
Original                  8        131                    The classifier built in section 1.4.
Remove Fe                 8        133                    Incremental improvement over the
                                                          previous model: higher true positives,
                                                          lower false positives.
Remove all attributes     7        142                    Good improvement over the previous
except RI and Mg                                          model: higher true positives, lower
                                                          false positives. Because the number of
                                                          attributes has been greatly reduced,
                                                          the number of leaves decreased.
                                                          Interestingly, the model works better
                                                          with only a few attributes; the larger
                                                          set must have complicated the model.
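The last row of the table can be reproduced in code with a Remove filter whose selection is inverted,
so that only RI, Mg, and the class attribute survive. A minimal sketch (indices assume the attribute
order shown in section 1.3; the path is an example):

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class FewAttributes {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/glass.arff");

            Remove remove = new Remove();
            remove.setAttributeIndices("1,3,last"); // RI, Mg, and the class attribute
            remove.setInvertSelection(true);        // keep these, remove everything else
            remove.setInputFormat(data);
            Instances reduced = Filter.useFilter(data, remove);
            reduced.setClassIndex(reduced.numAttributes() - 1);

            J48 j48 = new J48();
            j48.setUnpruned(true);
            j48.setMinNumObj(15);
            j48.buildClassifier(reduced);
            System.out.println(j48); // prints the tree, including the number of leaves
        }
    }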
1.6. Visualizing your data
Follow the instructions in [1]. How do you find “Visualize classifier errors”?
- Answer: By right-clicking the desired entry in the Result list.
After running J48 on iris.arff, determine:
- How many instances are predicted wrong?
Answer: 9 (given J48 classifier – unpruned – minNumObj=15).
- What are they?
Answer:
Instance   Predicted class    Actual class
63         Iris-versicolor    Iris-virginica
80         Iris-versicolor    Iris-virginica
92         Iris-versicolor    Iris-virginica
109        Iris-versicolor    Iris-virginica
123        Iris-versicolor    Iris-virginica
98         Iris-versicolor    Iris-setosa
73         Iris-virginica     Iris-versicolor
105        Iris-virginica     Iris-versicolor
119        Iris-virginica     Iris-versicolor
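The table above can also be produced without the visualizer by iterating over the evaluation's
predictions. A minimal sketch, assuming evaluation on the training set (instance numbers here are
0-based and will differ under cross-validation or a different numbering in the Explorer):

    import weka.classifiers.Evaluation;
    import weka.classifiers.evaluation.Prediction;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ListErrors {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/iris.arff");
            data.setClassIndex(data.numAttributes() - 1);

            J48 j48 = new J48();
            j48.setUnpruned(true);
            j48.setMinNumObj(15);
            j48.buildClassifier(data);

            Evaluation eval = new Evaluation(data);
            eval.evaluateModel(j48, data); // evaluate on the training set

            // predictions() follows data order, so the index is the instance number
            int i = 0;
            for (Prediction p : eval.predictions()) {
                if (p.actual() != p.predicted()) {
                    System.out.println("instance " + i
                            + ": predicted " + data.classAttribute().value((int) p.predicted())
                            + ", actual " + data.classAttribute().value((int) p.actual()));
                }
                i++;
            }
        }
    }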