WEKA Machine Learning Tutorials

The WEKA tutorials introduce machine learning techniques including decision trees, nearest neighbor classification, Naive Bayes, support vector machines, and association rule learning. Various datasets are used to apply and compare different classifiers, explore preprocessing methods like discretization and feature selection, and analyze text and market basket data with association rule mining. The tutorials provide hands-on exercises to help users learn how to use WEKA's interface and understand the outcomes of various machine learning algorithms.


WEKA tutorial exercises

These tutorial exercises introduce WEKA and ask you to try out several machine
learning, visualization, and preprocessing methods using a wide variety of datasets:
• Learners: decision tree learner (J48), instance-based learner (IBk), Naïve
Bayes (NB), Naïve Bayes Multinomial (NBM), support vector machine (SMO),
association rule learner (Apriori)
• Meta-learners: filtered classifier, attribute selected classifiers (CfsSubsetEval
and WrapperSubsetEval)
• Visualization: visualize datasets, decision trees, decision boundaries,
classification errors
• Preprocessing: remove attributes and instances, use supervised and
unsupervised discretization, select features, convert strings to word vectors
• Testing: on training set, on supplied test set, using cross-validation, using TP
and FP rates, ROC area, confidence and support of association rules
• Datasets: weather.nominal, iris, glass (with variants), vehicle (with variants),
kr-vs-kp, waveform-5000, generated, sick, vote, mushroom, letter, ReutersCorn-
Train and ReutersGrain-Train, supermarket
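
All of these exercises use the Explorer GUI, but the same workflow can also be scripted against WEKA's Java API. The sketch below shows the basic pattern (load an ARFF file, choose the class attribute, cross-validate a learner); the data/ path and the assumption that the class is the last attribute are illustrative, not part of the tutorials.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class WekaQuickStart {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/weather.nominal.arff"); // path is an assumption
            data.setClassIndex(data.numAttributes() - 1);                  // class = last attribute

            J48 tree = new J48();                                          // decision tree learner
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));        // 10-fold cross-validation

            System.out.println(eval.toSummaryString("\n=== J48, 10-fold CV ===\n", false));
            System.out.println(eval.toMatrixString());                     // confusion matrix
        }
    }
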

Tutorial 1: Introduction to the WEKA Explorer


Set up your environment and start the Explorer
Look at the Preprocess, Classify, and Visualize panels
In Preprocess:
• load a dataset (weather.nominal) and look at it
• use the Data Set Editor
• apply a filter (to remove attributes and instances).
In Visualize:
• load a dataset (iris) and visualize it
• examine instance info
• (note the discrepancy in instance numbering between the instance info window and the dataset viewer)
• select instances using the Rectangle selection tool and save the new dataset to a file.
In Classify:
• load a dataset (weather.nominal) and classify it with the J48 decision tree
learner (test on training set)
• examine the tree in the Classifier output panel
• visualize the tree (by right-clicking the entry in the result list)
• interpret classification accuracy and confusion matrix
• test the classifier on a supplied test set
• visualize classifier errors (by right-clicking the entry in the result list)
Answers to this tutorial are given.
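
A rough Java equivalent of the Classify steps above, assuming weather.nominal ships as an ARFF file; the name of the supplied test file is made up for illustration.

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class Tutorial1Classify {
        public static void main(String[] args) throws Exception {
            Instances train = DataSource.read("data/weather.nominal.arff");   // assumed path
            train.setClassIndex(train.numAttributes() - 1);

            J48 tree = new J48();
            tree.buildClassifier(train);
            System.out.println(tree);                        // textual form of the decision tree

            // "Test on training set": evaluate on the same data the tree was built from.
            Evaluation onTrain = new Evaluation(train);
            onTrain.evaluateModel(tree, train);
            System.out.println(onTrain.toSummaryString());
            System.out.println(onTrain.toMatrixString());    // confusion matrix

            // "Supplied test set": a separate file with the same attribute structure (assumed name).
            Instances test = DataSource.read("data/weather.nominal.test.arff");
            test.setClassIndex(test.numAttributes() - 1);
            Evaluation onTest = new Evaluation(train);
            onTest.evaluateModel(tree, test);
            System.out.println(onTest.toSummaryString());
        }
    }
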
Tutorial 2: Nearest neighbour learning and decision trees
Introduce the glass dataset, plus variants glass-minusatt, glass-withnoise, glass-mini-
normalized, glass-mini-train and glass-mini-test
• Explain how classifier accuracy is measured, and what is meant by class noise
and irrelevant attributes
Experiment with the IBk classifier for nearest neighbour learning:
• load glass data; list attribute names and identify the class attribute
• classify using IBk, testing with cross-validation
• repeat using 10 and then 20 nearest neighbours
• repeat all this for the glass-minusatt dataset
• repeat all this for the glass-withnoise dataset
• interpret the results and draw conclusions about IBk.
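
A sketch of the IBk experiment above, assuming the glass ARFF files are available locally and the class (glass type) is the last attribute; swap in glass-minusatt and glass-withnoise to repeat the comparison.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class IBkExperiment {
        public static void main(String[] args) throws Exception {
            // Repeat with "glass-minusatt.arff" and "glass-withnoise.arff" (assumed file names).
            Instances data = DataSource.read("data/glass.arff");
            data.setClassIndex(data.numAttributes() - 1);    // class: the glass type

            for (int k : new int[] {1, 10, 20}) {
                IBk knn = new IBk();
                knn.setKNN(k);                               // number of nearest neighbours
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(knn, data, 10, new Random(1));
                System.out.printf("k = %2d  accuracy = %.2f%%%n", k, eval.pctCorrect());
            }
        }
    }
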
Perform nearest neighbour classification yourself:
• load glass-mini-normalized and view the data
• pretend that the last instance is a test instance and classify it (use the Visualize
panel to help)
• verify your answer by running IBk on glass-mini-train and glass-mini-test
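
Your hand classification can also be checked programmatically: 1-NN simply picks the training instance with the smallest (squared) Euclidean distance over the non-class attributes. A minimal sketch, assuming the normalized file name used above:

    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ManualNearestNeighbour {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/glass-mini-normalized.arff");  // assumed name
            data.setClassIndex(data.numAttributes() - 1);

            Instance test = data.lastInstance();       // pretend the last instance is the test instance
            int best = -1;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < data.numInstances() - 1; i++) {
                Instance train = data.instance(i);
                double d = 0;
                for (int a = 0; a < data.numAttributes(); a++) {
                    if (a == data.classIndex()) continue;              // skip the class attribute
                    double diff = test.value(a) - train.value(a);
                    d += diff * diff;                                  // squared Euclidean distance
                }
                if (d < bestDist) { bestDist = d; best = i; }
            }
            System.out.println("Nearest neighbour: instance " + (best + 1));
            System.out.println("Predicted class:   " + data.instance(best).stringValue(data.classIndex()));
        }
    }
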
Experiment with the J48 decision tree learner:
• load glass data and classify using J48
• visualize the tree and simulate its effect on a particular test instance
• visualize the classifier errors and interpret one of them
• note J48 classification accuracy on glass, glass-minusatt and glass-withnoise.
• interpret the results and draw conclusions about J48.
Compare nearest neighbour to decision tree learning:
• draw conclusions about the relative performance of IBk and J48 on glass,
glass-minusatt and glass-withnoise.

Tutorial 3: Naïve Bayes and support vector machines


Introduce the boundary visualizer tool
Introduce the datasets vehicle, kr-vs-kp, glass, waveform-5000 and generated.
Apply Naïve Bayes (NB) and J48 on several datasets:
• apply NB to vehicle, kr-vs-kp, glass, waveform-5000 and generated, using 10-
fold cross-validation.
• apply J48 to the same datasets.
• summarize the results
• draw an inference about the datasets where NB outperformed J48.
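
One way to run the NB-versus-J48 comparison systematically is a small loop over the datasets; the file names below are assumptions based on the dataset names given above.

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CompareNBandJ48 {
        public static void main(String[] args) throws Exception {
            String[] files = {"vehicle.arff", "kr-vs-kp.arff", "glass.arff",
                              "waveform-5000.arff", "generated.arff"};       // assumed file names
            for (String f : files) {
                Instances data = DataSource.read("data/" + f);
                data.setClassIndex(data.numAttributes() - 1);
                for (Classifier c : new Classifier[] {new NaiveBayes(), new J48()}) {
                    Evaluation eval = new Evaluation(data);
                    eval.crossValidateModel(c, data, 10, new Random(1));
                    System.out.printf("%-20s %-12s %.2f%%%n",
                            f, c.getClass().getSimpleName(), eval.pctCorrect());
                }
            }
        }
    }
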
Investigate linear support vector machines:
• introduce the datasets glass, glass-RINa, vehicle and vehicle-sub
• apply a support vector machine learner (SMO) to glass-RINa, evaluating on the
training set
• apply the classification boundary visualizer, and also visualize the classification
errors (separately)
• describe the model built and explain the classification errors
• change SMO’s complexity parameter (the c option) and repeat
• comment on the difference c makes.
Investigate linear and non-linear support vector machines:
• apply SMO to vehicle-sub, again evaluating on the training set
• apply the classification boundary visualizer, and visualize the classifier errors
• change the “exponent” option of the kernel “PolyKernel” from 1 to 2 and repeat
• explain the differences in the test results
• add/remove points in the boundary visualizer to change the decision boundary’s
shape.
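
A sketch of the SMO experiments, assuming a recent WEKA release in which the kernel is set as an object (older versions set the exponent directly on SMO); the glass-RINa file name is taken from the exercise.

    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.functions.supportVector.PolyKernel;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SMOExperiment {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/glass-RINa.arff");   // assumed file name
            data.setClassIndex(data.numAttributes() - 1);

            SMO smo = new SMO();
            smo.setC(1.0);                          // complexity parameter c; try larger values too

            PolyKernel kernel = new PolyKernel();
            kernel.setExponent(1.0);                // 1 = linear; 2 = quadratic decision boundary
            smo.setKernel(kernel);

            smo.buildClassifier(data);
            Evaluation eval = new Evaluation(data);
            eval.evaluateModel(smo, data);          // "evaluate on the training set"
            System.out.println(smo);                // the support vector model (one machine per class pair)
            System.out.println(eval.toSummaryString());
        }
    }
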

Tutorial 4: Preprocessing
Introduce the datasets sick, vote, mushroom and letter.
Apply discretization:
• explain what discretization is
• load the sick dataset and look at the attributes
• classify using NB, evaluating with cross-validation
• apply the supervised discretization filter and look at the effect (in the Preprocess
panel)
• apply unsupervised discretization with different numbers of bins and look at the
effect
• use the FilteredClassifier with NB and supervised discretization, evaluating with
cross-validation
• repeat using unsupervised discretization with different numbers of bins
• compare and interpret the results.
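
The FilteredClassifier step can be sketched as below. The point of wrapping the filter and the learner together is that the discretization intervals are re-learned from the training folds of each cross-validation run, so no information leaks from the test folds. The file name is assumed.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.supervised.attribute.Discretize;

    public class DiscretizeThenNB {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/sick.arff");         // assumed file name
            data.setClassIndex(data.numAttributes() - 1);

            // The filter is re-applied inside each cross-validation fold.
            FilteredClassifier fc = new FilteredClassifier();
            fc.setFilter(new Discretize());                             // supervised (MDL-based) discretization
            fc.setClassifier(new NaiveBayes());

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(fc, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());

            // For the unsupervised variant, use weka.filters.unsupervised.attribute.Discretize
            // and call setBins(n) to vary the number of bins.
        }
    }
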
Apply feature selection using CfsSubsetEval:
• explain what feature selection is
• load the mushroom dataset and apply J48, IBk and NB, evaluating with cross-
validation
• select attributes using CfsSubsetEval and GreedyStepwise search
• interpret the results
• use AttributeSelectedClassifier (with CfsSubsetEval and GreedyStepwise
search) for classifiers J48, IBk and NB, evaluating with cross-validation
• interpret the results.
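
A sketch of the AttributeSelectedClassifier setup with CfsSubsetEval and GreedyStepwise, assuming the mushroom ARFF is available; substitute IBk or NaiveBayes for J48 to complete the comparison.

    import java.util.Random;
    import weka.attributeSelection.CfsSubsetEval;
    import weka.attributeSelection.GreedyStepwise;
    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.AttributeSelectedClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CfsFeatureSelection {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/mushroom.arff");     // assumed file name
            data.setClassIndex(data.numAttributes() - 1);

            AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
            asc.setEvaluator(new CfsSubsetEval());     // correlation-based subset evaluation
            asc.setSearch(new GreedyStepwise());       // greedy forward search through subsets
            asc.setClassifier(new J48());              // also try IBk and NaiveBayes here

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(asc, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }
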
Apply feature selection using WrapperSubsetEval:
• load the vote dataset and apply J48, IBk and NB, evaluating with cross-
validation
• select attributes using WrapperSubsetEval with InfoGainAttributeEval and
RankSearch, with the J48 classifier
• interpret the results
• use AttributeSelectedClassifier (with WrapperSubsetEval,
InfoGainAttributeEval and RankSearch) with classifiers J48, IBk and NB,
evaluating with cross-validation
• interpret the results.
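
The wrapper-based variant differs only in the evaluator and search objects; a sketch, with the same assumptions about file locations.

    import java.util.Random;
    import weka.attributeSelection.InfoGainAttributeEval;
    import weka.attributeSelection.RankSearch;
    import weka.attributeSelection.WrapperSubsetEval;
    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.AttributeSelectedClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class WrapperFeatureSelection {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/vote.arff");         // assumed file name
            data.setClassIndex(data.numAttributes() - 1);

            WrapperSubsetEval wrapper = new WrapperSubsetEval();
            wrapper.setClassifier(new J48());          // subsets scored by J48's internal CV accuracy

            RankSearch search = new RankSearch();
            search.setAttributeEvaluator(new InfoGainAttributeEval());  // ranking used to order subsets

            AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
            asc.setEvaluator(wrapper);
            asc.setSearch(search);
            asc.setClassifier(new J48());              // also try IBk and NaiveBayes

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(asc, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }
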
Sampling a dataset:
• load the letter dataset and examine a particular (numeric) attribute
• apply the Resample filter to select half the dataset
• examine the same attribute and comment on the results.
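
A sketch of the resampling step, assuming the letter ARFF is available. Note that the unsupervised Resample filter samples with replacement by default (setNoReplacement(true) switches to a plain subsample).

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.instance.Resample;

    public class ResampleLetter {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/letter.arff");       // assumed file name
            data.setClassIndex(data.numAttributes() - 1);

            Resample resample = new Resample();
            resample.setSampleSizePercent(50.0);       // keep roughly half the instances
            resample.setInputFormat(data);
            Instances half = Filter.useFilter(data, resample);

            int att = 0;
            while (!data.attribute(att).isNumeric()) att++;             // first numeric attribute
            System.out.println("Full set : " + data.numInstances() + " instances, mean "
                    + data.attributeStats(att).numericStats.mean);
            System.out.println("Half set : " + half.numInstances() + " instances, mean "
                    + half.attributeStats(att).numericStats.mean);
        }
    }
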

Tutorial 5: Text mining


Explain how to increase the amount of memory available to WEKA.
Introduce the datasets ReutersCorn-Train and ReutersGrain-Train.
Classify articles using binary attributes:
• load ReutersCorn-train
• apply StringToWordVector, with lower case tokens, alphabetic tokenizer, 2500
words to keep
• examine and interpret the result
• classify using NB and SMO, recording the TP and FP rates for positive
instances, and the ROC area
• interpret the results to compare the classifiers
• discuss whether TP or FP is likely to be more important for this problem
• use AttributeSelectedClassifier (with InfoGain and Ranker search, selecting 100
attributes) with the same classifiers
• look at the words that have been retained, and comment
• compare the results for classification with and without attribute selection
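
A sketch of the binary-attribute experiment. The file name, the class index, and the choice of which class value counts as "positive" are assumptions; wrapping StringToWordVector in a FilteredClassifier keeps the word dictionary inside each cross-validation fold.

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.tokenizers.AlphabeticTokenizer;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class ReutersBinary {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/ReutersCorn-train.arff");  // assumed path
            data.setClassIndex(data.numAttributes() - 1);

            for (Classifier base : new Classifier[] {new NaiveBayes(), new SMO()}) {
                StringToWordVector s2wv = new StringToWordVector();
                s2wv.setLowerCaseTokens(true);
                s2wv.setTokenizer(new AlphabeticTokenizer());
                s2wv.setWordsToKeep(2500);              // outputWordCounts left false: binary attributes

                FilteredClassifier fc = new FilteredClassifier();
                fc.setFilter(s2wv);                     // filter is re-learned inside each CV fold
                fc.setClassifier(base);

                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(fc, data, 10, new Random(1));
                int pos = 0;                            // index of the "positive" class value (assumed)
                System.out.printf("%-10s TP=%.3f FP=%.3f ROC=%.3f%n",
                        base.getClass().getSimpleName(),
                        eval.truePositiveRate(pos), eval.falsePositiveRate(pos), eval.areaUnderROC(pos));
            }
        }
    }
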
Classify articles using word count attributes:
• load ReutersCorn-train
• apply StringToWordVector, with lower case tokens, alphabetic tokenizer, 2500
words to keep, and outputWordCounts set to true
• examine and interpret the results
• classify using Naïve Bayes Multinomial (NBM) and SMO, recording the same
figures as above
• compare the results with those above for binary attributes
• undo StringToWordVector and reapply with outputWordCounts set to false
• reclassify with AttributeSelectedClassifier (with InfoGain and Ranker search)
using NB and SMO, with 100, 50, 25 attributes
• compare NB with and without attribute selection, and the same for SMO
• compare NB with binary attributes against NBM with word count attributes, and
the same for SMO
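
The word-count variant only changes the filter configuration and the learner; a compact sketch with the same assumptions as above.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayesMultinomial;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.tokenizers.AlphabeticTokenizer;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class ReutersWordCounts {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/ReutersCorn-train.arff");  // assumed path
            data.setClassIndex(data.numAttributes() - 1);

            StringToWordVector s2wv = new StringToWordVector();
            s2wv.setLowerCaseTokens(true);
            s2wv.setTokenizer(new AlphabeticTokenizer());
            s2wv.setWordsToKeep(2500);
            s2wv.setOutputWordCounts(true);                   // word frequencies, not just presence

            FilteredClassifier fc = new FilteredClassifier();
            fc.setFilter(s2wv);
            fc.setClassifier(new NaiveBayesMultinomial());    // NBM models word counts directly; also try SMO

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(fc, data, 10, new Random(1));
            System.out.printf("TP=%.3f FP=%.3f ROC=%.3f%n",
                    eval.truePositiveRate(0), eval.falsePositiveRate(0), eval.areaUnderROC(0));
        }
    }
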
Classify unknown instances:
• use NBM models built from ReutersCorn-train and ReutersGrain-train to
classify a mystery instance (Mystery1)
• repeat using SMO models
• comment on the findings
• use the same NBM models to classify a second mystery instance (Mystery2).
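
Classifying an unseen article follows the usual pattern: build (or load) a model, then call classifyInstance on the new instance. The Mystery1.arff file name and its structure (same header as the training data, with '?' in the class position) are assumptions made for this sketch.

    import weka.classifiers.bayes.NaiveBayesMultinomial;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.tokenizers.AlphabeticTokenizer;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class ClassifyMystery {
        public static void main(String[] args) throws Exception {
            Instances train = DataSource.read("data/ReutersCorn-train.arff"); // assumed path
            train.setClassIndex(train.numAttributes() - 1);

            StringToWordVector s2wv = new StringToWordVector();
            s2wv.setLowerCaseTokens(true);
            s2wv.setTokenizer(new AlphabeticTokenizer());
            s2wv.setWordsToKeep(2500);
            s2wv.setOutputWordCounts(true);

            FilteredClassifier nbm = new FilteredClassifier();
            nbm.setFilter(s2wv);
            nbm.setClassifier(new NaiveBayesMultinomial());
            nbm.buildClassifier(train);

            // Assumed: a one-instance ARFF with the same attribute declarations as the training file.
            Instances mystery = DataSource.read("data/Mystery1.arff");
            mystery.setClassIndex(mystery.numAttributes() - 1);
            double pred = nbm.classifyInstance(mystery.instance(0));            // index of predicted class
            System.out.println("Predicted class: " + train.classAttribute().value((int) pred));
        }
    }

Repeating the same steps with a model built from ReutersGrain-train, or with SMO as the classifier, reproduces the rest of the exercise.
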
Tutorial 6: Association rules
Introduce the datasets vote, weather.nominal and supermarket.
Apply an association rule learner (Apriori):
• load vote, go to the Associate panel, and apply the Apriori learner
• discuss the meaning of the rules
• find out how a rule’s confidence is computed
• identify the “support” and “number of instances predicted correctly” of certain
rules
• change the number of rules in the output
• what is the criterion for “best rules”?
• find rules that mean certain things
Finding association rules manually:
• load weather.nominal and look at the data
• find the support and confidence for a certain rule
• consider rules with multiple parts in the consequent
Make association rules for the supermarket dataset:
• load supermarket
• generate 30 association rules and discuss some inferences you would make from
them
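
A sketch of the Apriori run on supermarket, assuming the usual ARFF file name. Apriori keeps lowering the minimum support until the requested number of rules is found; each printed rule shows its support counts and confidence, where confidence = support(antecedent and consequent) / support(antecedent).

    import weka.associations.Apriori;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SupermarketRules {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/supermarket.arff");  // assumed file name
            // Association rule mining uses no class attribute, so none is set here.

            Apriori apriori = new Apriori();
            apriori.setNumRules(30);              // report the 30 best rules
            apriori.buildAssociations(data);      // minimum support is lowered until 30 rules are found

            System.out.println(apriori);          // rules with their support counts and confidence
        }
    }
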
