
DATA WAREHOUSING

&
DATA MINING
(01CE0723)

Lab Manual

A.Y. 2025-26

Name : Komal Buddhdev


Er. No. : 92310103021
Semester: 7
Class : TC2
Batch : A
INDEX
(Columns for Plan Date, Actual Date, Marks, and Signature are left blank to be filled in during the lab sessions.)

Sr. No. | Experiment
1. Explore data mining and data warehousing tools.
2. Explore Weka modules: Explorer, Experimenter, KnowledgeFlow, Workbench, Simple CLI. Exploring Explorer module with .csv and .arff files.
3. Prepare and analyse “student” dataset, also analyse “student”, “weather.nominal” and “iris” dataset along with editing and visualization.
4. Apply Preprocessing techniques on dataset using filters: Remove, ReplaceMissingValues, ReplaceMissingWithUserConstant, ReplaceWithMissingValue, Discretize. Also do the result analysis before and after preprocessing.
5. Apply Preprocessing techniques on dataset using filters: NumericToNominal, StringToNominal, NominalToBinary, Normalize. Also do the result analysis before and after preprocessing.
6. Demonstration on APRIORI algorithm along with frequent item sets, non-frequent item sets and strong & weak association rules.
7. Apply APRIORI algorithm on “weather.nominal” dataset and analyze the results.
8. Demonstration on “J48”, “RandomForest” and “NaiveBayes” classification algorithms using test options.
9. Apply and analyze “J48”, “RandomForest” and “NaiveBayes” classification algorithms on “weather.nominal” dataset and compare the results.
10. Demonstration on prediction algorithms “NaiveBayes” and “Logistic” by creating classification model and “Supplied Test Set” options.
11. Apply prediction “NaiveBayes” and “Logistic” by creating classification model and “Supplied Test Set” options on any suitable dataset and compare the results.
12. Demonstration on “SimpleKMeans” clustering algorithm using “EuclideanDistance”.
13. Apply and analyze “SimpleKMeans” clustering algorithm on suitable dataset, with the observation of “maxIterations” and “numClusters” parameters along with visualization.
14. Case study on applications of Data Mining tools and techniques used for Business Intelligence.
Experiment List

Sr. No. | Title | CO
1. Explore data mining and data warehousing tools. | CO1, CO2
2. Explore Weka modules: Explorer, Experimenter, KnowledgeFlow, Workbench, Simple CLI. Exploring Explorer module with .csv and .arff files. | CO2, CO3, CO4, CO5
3. Prepare and analyse “student” dataset, also analyse “student”, “weather.nominal” and “iris” dataset along with editing and visualization. | CO2
4. Apply Preprocessing techniques on dataset using filters: Remove, ReplaceMissingValues, ReplaceMissingWithUserConstant, ReplaceWithMissingValue, Discretize. Also do the result analysis before and after preprocessing. | CO3
5. Apply Preprocessing techniques on dataset using filters: NumericToNominal, StringToNominal, NominalToBinary, Normalize. Also do the result analysis before and after preprocessing. | CO3
6. Demonstration on APRIORI algorithm along with frequent item sets, non-frequent item sets and strong & weak association rules. | CO4
7. Apply APRIORI algorithm on “weather.nominal” dataset and analyze the results. | CO4
8. Demonstration on “J48”, “RandomForest” and “NaiveBayes” classification algorithms using test options. | CO5
9. Apply and analyze “J48”, “RandomForest” and “NaiveBayes” classification algorithms on “weather.nominal” dataset and compare the results. | CO5
10. Demonstration on prediction algorithms “NaiveBayes” and “Logistic” by creating classification model and “Supplied Test Set” options. | CO5
11. Apply prediction “NaiveBayes” and “Logistic” by creating classification model and “Supplied Test Set” options on any suitable dataset and compare the results. | CO5
12. Demonstration on “SimpleKMeans” clustering algorithm using “EuclideanDistance”. | CO5
13. Apply and analyze “SimpleKMeans” clustering algorithm on suitable dataset, with the observation of “maxIterations” and “numClusters” parameters along with visualization. | CO5
14. Case study on applications of Data Mining tools and techniques used for Business Intelligence. | CO2, CO3, CO4, CO5
Experiment 1

Title: Explore data mining and data warehousing tools.

List of Tools Explored for Data Mining & Data Warehousing:


1) WEKA (Data Mining)
2) RapidMiner (Data Mining)
3) Talend (Data Warehousing + Data Integration)

1. WEKA
∙ Introduction:
WEKA (Waikato Environment for Knowledge Analysis) is an open-source software for data mining
tasks. It contains a collection of visualization tools and algorithms for data analysis and predictive
modeling, together with graphical user interfaces for easy access to these functions.
∙ Detailed Description with Screenshots of Various Features:



Preprocessing:
WEKA allows importing datasets in ARFF, CSV, or other formats. Data can be cleaned, filtered, and
normalized before applying algorithms.
Classification:
Offers several machine learning algorithms such as J48 (C4.5), Naive Bayes, Random Forest, etc.,
which can be easily applied to your dataset.
Clustering:
Includes algorithms like k-means and EM clustering. Clustering results can be visualized graphically.
Visualization:
WEKA includes built-in visualization tools to analyze attributes and outputs graphically.
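The same preprocessing-classification workflow can also be driven from WEKA's Java API rather than the GUI. The sketch below is only a minimal illustration, assuming weka.jar is on the classpath and a local copy of the standard iris.arff sample file; it loads the data, builds a J48 (C4.5) tree, and reports 10-fold cross-validation results.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaQuickStart {
    public static void main(String[] args) throws Exception {
        // Load an ARFF file (CSV works the same way through DataSource).
        Instances data = DataSource.read("iris.arff");
        // The last attribute is the class label in this dataset.
        data.setClassIndex(data.numAttributes() - 1);

        // Build a J48 (C4.5) decision tree on the full dataset.
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Evaluate a fresh J48 with 10-fold cross-validation and print a summary.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(tree);
        System.out.println(eval.toSummaryString("\n=== 10-fold CV results ===\n", false));
    }
}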

∙ Official Website of Tool:


https://www.cs.waikato.ac.nz/ml/weka/

Tool 2: RapidMiner
● Introduction:
RapidMiner is a powerful, open-source data science platform designed for data preparation,
machine learning, deep learning, and text mining. It has a drag-and-drop interface and
supports a wide range of data science tasks.

● Detailed Description with Screenshots of Various Features:

Drag & Drop Workflow Designer:


Create models visually by connecting blocks.
Built-in Machine Learning Algorithms:
Over 1500 functions and algorithms for classification, clustering, and regression.
Data Preprocessing Tools:
Data cleaning, normalization, handling missing values, etc.



● Official Website of Tool:
https://www.rapidminer.com

Tool 3: Talend Open Studio

● Introduction:
Talend is an open-source data integration and data warehousing tool that enables data
transformation, migration, and synchronization. It is widely used for big data integration and
ETL processes.
● Detailed Description with Screenshots of Various Features:

ETL Job Design:


Talend provides a graphical interface for designing ETL jobs using components from a palette.
Big Data Integration:
Integrates with Hadoop, Spark, Hive, etc.
Connectors:
Hundreds of connectors to connect databases, cloud platforms, APIs, etc.



● Official Website of Tool:
https://www.talend.com

Comparison of all the tools:

Feature | WEKA | RapidMiner | Talend Open Studio
Primary Purpose | Data Mining & Machine Learning | End-to-End Data Science (incl. ML & Deep Learning) | Data Integration, ETL, and Warehousing
User Interface | GUI + Command Line | Graphical (Drag & Drop Workflow Designer) | Graphical (ETL Job Designer with Drag & Drop)
Ease of Use | Beginner-friendly for ML models | Very easy to use with intuitive design | Moderate; suitable for technical users
Supported Algorithms | Many classic ML algorithms: J48, Naive Bayes, Random Forest | 1500+ operators: ML, DL, text mining, sentiment analysis | No ML algorithms; focuses on data flows and transformations
Preprocessing Tools | Basic filters and transformations | Advanced preprocessing (missing values, outliers, etc.) | Data transformation, mapping, cleansing via components
Visualization | Basic graphs, attribute plots | Interactive charts, correlation matrices, model insights | Limited (mainly data flow and schema-based visualization)
Data Format Support | ARFF, CSV, C4.5 | CSV, Excel, DB, JSON, XML, BigQuery, Hadoop, etc. | Hundreds of formats; strong database and API support
Data Integration | Limited | Moderate (via extensions or scripting) | Extensive (many connectors and native integration options)
Scripting Language | Java | Java, Python, R support | Java-based logic, custom scripts possible
Platform | Desktop only | Desktop & Cloud options | Desktop & Cloud-enabled (via Talend Cloud)
Open Source | Yes | Community Edition is free, Enterprise is paid | Yes, Open Studio is free; Enterprise version available
Big Data Support | No | Limited in free version | Yes (integrates with Hadoop, Spark, Hive, etc.)
Ideal For | Students, researchers learning ML | Analysts, data scientists building complex ML pipelines | Data engineers performing large-scale ETL & integration

Experiment Outcome:

This experiment provided hands-on exposure to three powerful tools—WEKA, RapidMiner, and
Talend—highlighting their strengths in data mining and warehousing. WEKA and RapidMiner
proved effective for data analysis and predictive modeling, while Talend showcased robust
capabilities for data integration and ETL tasks. Each tool serves a unique purpose, and
understanding their features enables more informed tool selection for real-world data science
applications.



Experiment 2

Title: Explore Weka modules: Explorer, Experimenter, KnowledgeFlow, Workbench, Simple CLI.

WEKA History & Introduction:

Introduction to WEKA
WEKA stands for Waikato Environment for Knowledge Analysis. It is a powerful, open-source
suite of machine learning software developed to facilitate data mining and analysis tasks.
WEKA provides tools for:
• Data pre-processing
• Classification
• Regression
• Clustering
• Association rules
• Visualization
WEKA is written in Java and offers a Graphical User Interface (GUI) as well as a command-line
interface. It is widely used for teaching, research, and practical machine learning applications.

History of WEKA
1. 1992 – Project Initiation
WEKA was initiated at the University of Waikato in Hamilton, New Zealand. The original
aim was to create a tool that supported machine learning algorithms and made them easily
accessible to non-programmers.
2. Early Development
Initially, WEKA was developed as a closed-source project that focused on algorithms for
analyzing agricultural data.
3. 1997 – Open Source Release
The project was restarted from scratch in 1997 and released as open-source software under
the GNU General Public License (GPL). This made it freely available for public use and
significantly increased its popularity.
4. Rapid Growth and Popularity
After becoming open source, WEKA grew rapidly as a popular tool in the data mining and
machine learning community. It became one of the primary educational tools for students
and researchers to learn and experiment with machine learning concepts.
5. Development Contributions
WEKA was primarily developed by the Machine Learning Group at the University of
Waikato, but it also attracted contributions from the global research community.
6. Widespread Adoption
WEKA gained widespread adoption due to its:
o Simple user interface
o Easy accessibility of machine learning algorithms

o Compatibility with multiple data formats (like ARFF, CSV, etc.)
7. Awards and Recognition
WEKA has won several awards, including the SIGKDD Data Mining and Knowledge Discovery Service Award in 2005 for its significant contribution to the data mining community.

Key Features of WEKA


• GUI for easy interaction
• Collection of machine learning algorithms
• Data visualization and exploration tools
• Support for batch processing and scripting
• Platform-independent (runs on any system with Java)
• Integration capability with other applications via Java API

WEKA Applications:

1) Disease Prediction
2) Market Basket Analysis
3) Credit Scoring and Risk Assessment
4) Spam Detection
5) Crop Yield Prediction

Modules in WEKA:
1. Explorer
2. Experimenter
3. KnowledgeFlow
4. Workbench
5. Simple CLI

Module 1: Explorer

● Purpose of the Module


The Explorer module in WEKA is the main user interface that provides a graphical
environment to apply data preprocessing, visualization, classification, clustering,
association, and evaluation of machine learning algorithms.
It is designed for easy exploration and experimentation with datasets without
needing to write code.
● Screenshots with description
This is the Preprocess Tab of the WEKA Explorer, which is the first interface you
interact with after launching WEKA. It allows users to:
• Load datasets using options like Open file..., Open URL..., or Open DB....
• Apply filters for data preprocessing.
• View dataset information including relation name, number of instances, and
number of attributes.
• Select or remove attributes using selection buttons like All, None, Invert, and
Pattern.



• Visualize the dataset using the Visualize All button.
The Preprocess tab is essential for data preparation, cleaning, and selection before
applying machine learning algorithms.

Figure 1: WEKA Explorer



Figure 1.1 : File Selection Window in WEKA Explorer

Description:
This figure shows the file selection dialog box that appears when the user clicks on the
Open file... button in the Preprocess tab of the WEKA Explorer.
It allows the user to:
• Browse the local system to locate and select dataset files (typically in .arff format).
• The dialog box displays folders and files from the system’s directories.
• The bottom panel shows the File Name input field and file type filter, which is set to
accept only ARFF data files by default.
This step is essential for loading datasets into WEKA for further analysis and processing.

• Applications of the module


• Loading Datasets
• Data Cleaning
• Data Transformation
• Dataset Visualization
• Preparing Data for Modeling

● Applications of the module

∙ Importing datasets (ARFF, CSV, DB)

∙ Clean/filter/transform data



∙ Run classifiers like J48, NB, RF

∙ Perform clustering (k-means, EM)

∙ Visual plot analysis in Visualize tab

Module 2: Experimenter

● Purpose of the Module


The Experimenter module in WEKA is used to run multiple experiments
systematically. It helps compare different machine learning algorithms on various
datasets by automating tests and producing detailed performance results. This makes
it easier to evaluate which algorithm works best for a specific problem.

● Screenshots with description


Description:
The image shows the WEKA Experiment Environment, specifically the Setup tab, which is
used to configure machine learning experiments. Below are the key components:
1. Experiment Configuration Mode
Drop-down to choose between Simple or Advanced configuration.
2. File Operations
Open, Save, and New: Manage experiment files.
3. Results Destination
Select where to save the results (ARFF, CSV, etc.), and browse to select the filename.
4. Experiment Type
Choose the evaluation method (e.g., Cross-validation).
Enter the Number of folds (e.g., 10-fold CV).
Choose between Classification or Regression.
5. Iteration Control
Enter Number of repetitions.
Choose whether to iterate with Datasets first or Algorithms first.
6. Datasets Section
Add, edit, or delete datasets for the experiment.
Option to use relative file paths.
7. Algorithms Section
Add, edit, or delete machine learning algorithms to be tested.



Figure 2 : WEKA Experiment Environment – Setup Tab

● Applications of the module


1. Algorithm Evaluation and Comparison
2. Performance Benchmarking
3. Machine Learning Research
4. Educational and Teaching Tool
5. Automated Experimentation on Multiple Datasets
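Outside the GUI, the kind of pairwise comparison the Experimenter automates can be approximated with a few lines of the WEKA Java API. The sketch below is only an illustration of that idea, not the Experimenter's own mechanism; it assumes weka.jar is on the classpath and a local weather.nominal.arff file, and compares J48 and NaiveBayes using 10-fold cross-validation with a fixed seed.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareClassifiers {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] candidates = { new J48(), new NaiveBayes() };
        for (Classifier c : candidates) {
            // Repeatable evaluation: 10-fold cross-validation with a fixed random seed.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1));
            System.out.printf("%-12s accuracy: %.2f%%%n",
                    c.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}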

Module 3: KnowledgeFlow

● Purpose of the Module


Visual workflow interface for batch and incremental data processing—think of it as data
"pipelines" with drag-and-drop components.
● Screenshots with description



Figure 3: WEKA KnowledgeFlow Environment

Description:
This figure shows the WEKA KnowledgeFlow Environment, a graphical interface that
allows users to build and visualize machine learning workflows. Unlike the command-line
or Explorer interface, KnowledgeFlow uses a drag-and-drop design where components
such as data sources, filters, classifiers, and evaluators are added and connected visually.
Key elements include:
• Design Panel (Left): Contains various components organized into categories (e.g.,
DataSources, Filters, Classifiers).
• Workflow Area (Center): Users can design experiments by placing and linking
components.
• Toolbar (Top): Provides tools for saving, running, and editing workflows.
• Status/Log Panel (Bottom): Displays messages, logs, and execution status.

• Applications of the module


• Visual Workflow Design for Machine Learning
• Real-time Data Processing and Monitoring
• Customizable Machine Learning Pipelines
• Teaching and Demonstration of Data Flow Concepts
• Batch Execution of Data Mining Tasks

● Applications of the module


● Create pipelines by connecting sources, filters, classifiers, evaluators
● Handle real-time (incremental) data using updateable classifiers
● Visualize streaming results through plotting components

Module 4: Workbench

● Purpose of the Module


The WEKA Workbench is designed to provide a user-friendly platform for performing
various machine learning and data mining tasks. It integrates tools for data preprocessing,
classification, clustering, regression, and visualization. With its graphical interface, users
can easily load datasets, apply algorithms, evaluate model performance, and visualize
results. It supports both beginners and advanced users through multiple interfaces like
Explorer, Experimenter, KnowledgeFlow, and Simple CLI, making it a complete environment
for end-to-end data analysis.

● Screenshots with description


Description:
This figure shows the Preprocess Panel of the WEKA Workbench, which is the initial step in
the data mining process. It is used to load, explore, and preprocess datasets before applying
any machine learning algorithms. Key features include:
• File Loading Options: Load datasets from files, URLs, or databases.
• Filter Section: Apply filters to preprocess data (e.g., normalization, missing value
handling).
• Attribute Summary: Displays metadata of selected attributes such as name, type,
weight, and missing values.
• Attribute Selection Buttons: Select all, none, invert selection, or use pattern matching.
• Visualization Option: Allows visualization of selected or all attributes for better
understanding of data.

Figure 4: WEKA Workbench – Preprocess Panel



● Applications of the module
1) Data Loading and Exploration
2) Data Cleaning and Transformation
3) Attribute Selection and Filtering
4) Handling Missing or Noisy Data
5) Data Visualization for Analysis

Module 5: Simple CLI

● Purpose of the Module


The Simple CLI in WEKA allows users to interact with the software using text-based
commands instead of the graphical interface. Its main purpose is to provide a flexible
and powerful way to execute tasks quickly, automate processes, and run batch
experiments. It is especially useful for advanced users who want more control over
model building, evaluation, or scripting without relying on the GUI.

● Screenshots with description

Figure 5: WEKA Simple CLI Interface

This screenshot shows the Simple CLI window of WEKA. Key features include:
• A command input line to enter WEKA class and method calls
• Commands follow Java class names for classifiers, filters, and other functions
• Tab-completion support for easier typing of class names
• Real-time output of model training, testing, or file processing
• Access to all core WEKA functionality including preprocessing, model training, evaluation, and saving results, all via commands, without using the GUI.



● Applications of the module
1) Automating Data Mining Tasks
2) Batch Processing of Large Datasets
3) Script-Based Experimentation
4) Quick Execution of Classifiers and Filters
5) Integration with External Tools or Shell Scripts
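As an illustration of the command style used in the Simple CLI (the dataset paths here are placeholders and must point to local files), a classifier can be trained and evaluated, and a filter's options listed, with commands such as:

java weka.classifiers.trees.J48 -t data/weather.nominal.arff
java weka.filters.unsupervised.attribute.Remove -h

The first command builds a J48 tree on the supplied training file and prints its evaluation output; the second prints the help text (available options) for the Remove filter.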

Experiment Outcome:

By exploring the five main WEKA modules (Explorer, Experimenter, KnowledgeFlow, Workbench, and Simple CLI), users gain a comprehensive understanding of data mining
and machine learning workflows. They develop practical skills in data preprocessing, model
training and evaluation, experimental comparison of algorithms, graphical workflow
design, and command-line operations. This hands-on experience enhances their ability to
select suitable tools for different data analysis tasks and prepares them for real-world data
science projects using WEKA.



Experiment 3

Title: Prepare and analyse “student” dataset, also analyse “student”, “weather.nominal” and “iris” dataset along with editing and visualization.

• File formats and data types supported by WEKA

1. File formats supported by WEKA: ARFF (the native format), CSV, C4.5 (.names/.data), JSON, XRFF, and data read directly from databases via JDBC.

2. Data types supported by WEKA: numeric, nominal, string, date, and relational (for multi-instance data).

• Preparation and analysis of “student.arff” dataset

Dataset Code

@relation student
@attribute Name string
@attribute Age numeric
@attribute Gender {Male, Female}
@attribute Bdate date "yyyy-MM-dd"
@attribute Email string
@attribute City {Rajkot, Ahmedabad, Jamnagar, Gondal}
@attribute Married {Yes, No}
@attribute Address string
@attribute Mobile numeric
@attribute Backlog {Yes , No}
@data
"Win",21,Female,"2003-05-31","[email protected]","Rajkot",No,"Marwadi
University",7042159221,No
"Yadanar",22,Female,"2002-05-31","[email protected]","Ahmedabad",No,"Parul
University",7042161665,Yes
"Phyo",20,Male,"2001-03-31","[email protected]","Rajkot",Yes,"Taungoo
University",250400742,No
"Kaung",23,Male,"2000-02-14","[email protected]","Jamnagar",No,"Yangon
University",7042159222,Yes
"Si",25,Female,"2003-12-31","[email protected]","Rajkot",No,"Marwadi
University",7042159221,No
"Yoon",14,Female,"2012-05-31","[email protected]","Gondal",Yes,"Yeni
University",263836707, Yes
"Yu",26,Male,"2005-04-22","[email protected]","Ahmedabad",No,"Marwadi
University",7042159121,No
"WinWin",21,Female,"2003-05-31","[email protected]","Rajkot",No,"Dagon
University",250400742 ,Yes
"Chit",22,Male,"2003-05-31","[email protected]","Rajkot",No,"Marwadi
University",7042159221,No
Analysis of “student.arff” with weka

1) Age Attribute

Figure 3.1.1: Age Attribute


Description:
The Age attribute in the student dataset is a numeric type with no missing values. It contains
data for 9 instances, with 7 distinct values, and 5 unique values (56% uniqueness). The
minimum age recorded is 14, while the maximum is 26. The mean (average) age of students is
21.556, and the standard deviation is 3.432, indicating a moderate spread of age values
around the mean. The histogram at the bottom visually displays how age is distributed across
the two classes of the Backlog attribute (shown in red and blue). Most students are
concentrated in the older age group, and the class distribution suggests that both categories
(Backlog = Yes/No) appear more frequently among the higher age values.

2) Gender Attribute

Description:
The Gender attribute is of nominal type with two distinct values: Male and Female. There are
no missing values, and the dataset contains 4 male and 5 female instances. The histogram
shows that both genders are almost equally represented. The class distribution (Backlog =
Yes/No) appears in both groups, indicating that gender does not show a strong imbalance or
clear pattern related to backlog status in this small sample.



Figure 3.1.2: Gender Attribute

3) Bdate Attribute

Figure 3.1.3: Bdate Attribute


Description:
The Bdate attribute is of date type and contains no missing values. There are 7 distinct
values out of 9 instances, with 67% uniqueness. The earliest birthdate is 2000-02-14 and the
latest is 2012-05-31. The mean birthdate is around 2003-12-21, and the standard deviation
indicates a wide spread in birth years.The histogram shows that most students are
concentrated in a similar age group, as 8 out of 9 instances share a closely grouped birthdate
range. There is one outlier, suggesting a much older or younger student. The Backlog class
distribution (red and blue) is mixed across the date range.
4) Email Attribute

Figure 3.1.4: Email Attribute

Description:

The Email attribute is of string type, with no missing values. Out of 9 instances, there are 8
distinct entries, and 7 of them are unique, making up 78% uniqueness. Since it is a string
attribute, Weka does not perform standard numeric or nominal statistical analysis. Also, it
cannot be visualized in the standard plot window, as shown in the message: "Attribute is neither
numeric nor nominal."This attribute is typically used for identification or reference purposes
and doesn't contribute directly to model training unless processed or encoded into a usable
form.

5) City Attribute



Figure 3.1.5: City Attribute
Description:
The City attribute is of nominal type with 4 distinct values: Rajkot, Ahmedabad, Jamnagar,
and Gondal. There are no missing values, and the dataset shows a clear majority from Rajkot
(5 instances). Ahmedabad follows with 2, while Jamnagar and Gondal have 1 each.The
histogram shows how backlog classes are distributed across cities. Rajkot has the highest
number of students and also contributes to both backlog and non-backlog cases. The visual
suggests some regional variation in backlog status, though the small dataset limits strong
conclusions.

1) Visualization for class:Backlog(Nom)


This window displays the distribution of all attributes in the dataset with respect to the class
variable Backlog. Each chart shows how instances are split between students with and
without a backlog (red and blue bars). Key observations:
• Age and Bdate histograms indicate a wider range, with certain ages or birth dates linked
more frequently to backlogs.
• Gender shows an almost equal split, with no major class imbalance.
• City clearly highlights Rajkot as the most represented city, which also has a higher count of
backlogs.
• Married status shows most students are not married, and among them, backlog variation
exists.
• Email, Address, and Name are marked as "neither numeric nor nominal", hence not
visualized.
• Backlog plot confirms the count split (4 vs. 5) in the dataset.



Figure 3.1.6: Visualization for class:Backlog(Nom)

2) Visualization for class:Name(str)



Figure 3.1.7: Visualization for class:Name(str)

Description:
This “Visualize All” panel in WEKA displays attribute distributions in black and white. Each
histogram shows the frequency of attribute values across all 9 instances.
• Age, Gender, Bdate, City, Married, Mobile, Backlog: Visualized with bar charts. For example,
most students are aged 20+, live in Rajkot, and are unmarried.
• Name, Email, Address: Not visualized – labeled as “neither numeric nor nominal”.
• Backlog: Final target attribute showing 4 students with no backlog and 5 with backlog.

• Analysis of “weather.nominal.arff” dataset

Dataset Code
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no

Analysis of “weather.nominal.arff” with weka

1) Outlook Attribute
This WEKA Explorer window shows the Preprocess tab for the dataset weather.symbolic,
which has 5 attributes and 14 instances.
• The selected attribute is play (target class).
• It is a nominal attribute with two values:
o yes (9 instances)
o no (5 instances)
• The bar chart below visualizes the class distribution, with more instances labeled yes.



Figure 3.2.1: Outlook Attribute

2) Temperature Attribute:

Figure 3.2.2: Temperature Attribute



Description:
This figure shows the distribution of the temperature attribute from the weather.symbolic
dataset in WEKA Explorer. The attribute is nominal with three distinct values: hot, mild, and
cool, each appearing 4 times. The bar chart at the bottom shows how these values relate to the
class play — with blue representing “yes” and red representing “no.” The mild temperature has
the highest number of “yes” outcomes. This view helps analyze the impact of temperature on the
decision to play.

3) Humidity Attribute:

Figure 3.2.3: Humidity Attribute

Description:
This figure displays the humidity attribute from the weather.symbolic dataset in WEKA
Explorer. The attribute is nominal with two distinct values: high and normal, each
occurring 7 times. The bar chart below shows how each humidity level relates to the
target class play (blue for "yes", red for "no"). The data is evenly split, indicating no strong
preference toward playing based on humidity alone, which may affect its usefulness in
classification.

4) Windy Attribute
Description:
This is the Preprocess tab of WEKA Explorer showing the dataset weather.symbolic with
5 attributes and 14 instances. The selected attribute is windy, which has two values:
TRUE (6 times) and FALSE (8 times). A bar chart below shows how the windy attribute
relates to the target class play. The left chart represents windy = TRUE, and the right
chart represents windy = FALSE. The top menu has options like Open file, Save, Edit, and
Generate for managing the dataset.
Figure 3.2.4: Windy Attribute

5) Play Attribute:
Description:
This is the Preprocess tab of WEKA Explorer showing the dataset weather.symbolic with
5 attributes and 14 instances. The selected attribute is play, which has two values: yes (9
times) and no (5 times). The bar chart below shows the distribution: the blue bar
represents "yes" and the red bar represents "no". The dataset is ready for further
processing like classification or visualization.

Figure 3.2.5: Play Attribute


1) Visualization for class Play(nom)

Figure 3.2.6: Visualization for class Play(nom)


Description:
This image shows the Visualize All Attributes view in WEKA Explorer. It displays bar
charts for all the attributes: outlook, temperature, humidity, windy, and play. Each chart
shows the distribution of attribute values and how they relate to the class labels, typically
shown in red and blue colors. The play attribute is the target class, with 9 instances of
'yes' (blue) and 5 instances of 'no' (red). This visualization helps to quickly understand
the data patterns and class distribution across all features.

2) Visualization for class:temperature(Nom)

Figure 3.2.7: Visualization for class:temperature(Nom)



Description:
This is the Visualize All Attributes view in WEKA Explorer with an updated display
showing multi-colored bar charts for all attributes: outlook, temperature, humidity,
windy, and play. Each bar now has three color segments (blue, red, cyan) representing
different class combinations or attribute splits. The charts show the frequency
distribution of each attribute and their relation to the class labels. This multi-color
view provides a more detailed breakdown of how data instances are spread across
multiple categories.

• Analysis of “iris.arff” dataset

Dataset Code
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
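The per-attribute figures quoted in the analysis below (min, max, mean, standard deviation, and the 50/50/50 class split) can be reproduced programmatically. A minimal sketch, assuming the full 150-instance iris.arff that ships with WEKA and weka.jar on the classpath:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisStats {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Mean and standard deviation of each numeric attribute.
        for (int i = 0; i < data.numAttributes() - 1; i++) {
            double mean = data.meanOrMode(i);
            double stdDev = Math.sqrt(data.variance(i));
            System.out.printf("%-12s mean=%.3f stdDev=%.3f%n",
                    data.attribute(i).name(), mean, stdDev);
        }

        // Class distribution (expected: 50 instances per species).
        int[] counts = data.attributeStats(data.classIndex()).nominalCounts;
        for (int i = 0; i < counts.length; i++) {
            System.out.println(data.classAttribute().value(i) + ": " + counts[i]);
        }
    }
}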

Analysis of “iris.arff” with weka

1) Sepallength Attribute

Description:
This image shows the Preprocess tab of WEKA Explorer with the Iris dataset loaded. The
dataset contains 150 instances and 5 attributes: sepallength, sepalwidth, petallength,
petalwidth, and class. The class attribute is selected, which is a nominal type with three
distinct classes: Iris-setosa, Iris-versicolor, and Iris-virginica, each having 50 instances.
The bar chart at the bottom visually represents the class distribution. The interface also
provides options to open files, apply filters, and manage attributes.

Figure 3.3.1: Sepallength Attribute

2) Sepalwidth Attribute

Figure 3.3.2: Sepalwidth Attribute



Description:
This image shows the Preprocess tab of WEKA Explorer with the Iris dataset loaded and
the attribute sepalwidth selected. The dataset contains 150 instances and 5 attributes.
The sepalwidth attribute is numeric with 23 distinct values, ranging from 2 to 4.4, with a
mean of 3.054 and a standard deviation of 0.434. The histogram below shows the
distribution of sepalwidth across the three iris classes, each represented in different
colors, indicating how the values are spread among the classes.

3) Petallength Attribute

Figure 3.3.3: Petallength Attribute

Description:
This image shows the Preprocess tab of WEKA Explorer with the Iris dataset loaded and the
attribute petallength selected. The petallength attribute is numeric with 43 distinct values,
ranging from 1 to 6.9. The mean is 3.759 and the standard deviation is 1.764. The histogram at
the bottom displays the distribution of petal lengths across the three iris classes, with each class
shown in a different color, illustrating clear separation among the classes based on petal length.

4) Petalwidth Attribute

Description:
This image shows the Preprocess tab of WEKA Explorer with the Iris dataset loaded and the
attribute petalwidth selected. The petalwidth attribute is numeric with 22 distinct values,
ranging from 0.1 to 2.5. The mean value is 1.199 and the standard deviation is 0.763. The
histogram at the bottom represents the distribution of petal width across the three iris
classes, each displayed in different colors, indicating how the petal width varies between the
classes.



Figure 3.3.4: Petalwidth Attribute

5) Class Attribute

Figure 3.3.5: Class Attribute

Description:
This image shows the Preprocess tab of WEKA Explorer with the Iris dataset loaded. All four
input attributes (sepallength, sepalwidth, petallength, petalwidth) and the class attribute
are selected. The class attribute is nominal with three distinct classes: Iris-setosa,
Iris-versicolor, and Iris-virginica, each having 50 instances. The bar chart below shows that
the dataset is perfectly balanced, with an equal number of samples in each class,
represented by three colored bars.

1) Visualization for Class: class(Nom)

Figure 3.3.6: Visualization for Class: class(Nom)

Description:
This image shows the Visualize All Attributes window in WEKA Explorer for the Iris dataset.
It displays histograms for sepallength, sepalwidth, petallength, petalwidth, and class. Each
class is color-coded: blue, red, and cyan. Petal length and petal width show clear separation
between classes, while sepal attributes have more overlap. The class distribution is balanced
with 50 instances each.

2) Visualization for Class:sepallength(Num)


Description:
This image displays the "All Attributes" view in the WEKA Explorer for a dataset (likely the
Iris dataset). It shows numerical summaries for attributes such as sepallength, sepalwidth,
petallength, petalwidth, and a misspelled "dlass" (possibly intended to be "class").The values
listed (e.g., 24, 30, 16 for sepallength) likely represent frequency counts or bin ranges from
histograms.The petalwidth section includes decimal values (e.g., 8, 6, 7.9), suggesting
measurements rather than counts.



Figure 3.3.7: Visualization for Class: sepallength(Num)
Experiment Outcome:

• Successfully loaded and explored the “student”, “weather.nominal”, and “iris” datasets using
WEKA.
• Performed data preprocessing such as handling missing values and attribute editing.
• Applied classification algorithms (e.g., J48, Naive Bayes) and interpreted results using
evaluation metrics.
• Analyzed patterns and relationships within the datasets using statistical summaries and
filters.
• Used visualization tools in WEKA to generate scatter plots, histograms, and decision trees for
better data understanding.
• Identified class distributions and feature importance, especially in the iris dataset through
classification and clustering.
• Gained hands-on experience in data analysis workflow, from loading datasets to applying
machine learning models and interpreting results.



Experiment 4

Title: Apply Preprocessing techniques on dataset using filters: Remove, ReplaceMissingValues, ReplaceMissingWithUserConstant, ReplaceWithMissingValue, Discretize.
Also do the result analysis before and after preprocessing.

1) Remove - Filter Introduction, working & importance


The Remove filter in data preprocessing is a vital tool used to eliminate unwanted attributes
(columns) or instances (rows) from a dataset before analysis. It works by allowing the user to
specify which attributes to retain or discard, streamlining the dataset for better performance
and clarity. By reducing the dimensionality of the dataset, the Remove filter not only simplifies
the data but also enhances the efficiency and accuracy of algorithms, making it an essential
step in data cleaning and preparation.

- Dataset before filter

Fig 4.1: Remove Filter Interface in Weka Explorer

Description:

This figure shows the "Remove" filter in Weka’s Preprocess panel, used to eliminate selected
attributes from a dataset. In the example, the attribute "duration" is selected for removal from
the "labor-neg-data" dataset. The right panel displays statistical details of the selected
attribute, such as minimum, maximum, mean, and standard deviation, along with a class
distribution histogram.
- Steps for applying the filter
1) Click "Open file" and load your dataset. 2) Click "Choose" under the Filter section. 3)
Select: Unsupervised → Attribute → Remove 4) Click the filter name (Remove) to set
options (e.g., select attribute indices to remove). 5) Click "Apply" to apply the filter.
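The same removal can be scripted with the filter's Java class. A minimal sketch (the labor-neg-data file name is taken from the screenshot and is assumed to be available locally as an ARFF file):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RemoveAttributeDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("labor-neg-data.arff");

        Remove remove = new Remove();
        remove.setOptions(new String[] { "-R", "1" });  // drop the first attribute
        remove.setInputFormat(data);

        Instances reduced = Filter.useFilter(data, remove);
        System.out.println("Attributes before: " + data.numAttributes());
        System.out.println("Attributes after : " + reduced.numAttributes());
    }
}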



Dataset after applying filter

Description:
This screenshot shows the Remove filter in WEKA's Preprocess tab. The filter Remove -R 1 is selected to remove the first attribute (wage-increase-first-year) from the dataset. Users can select attributes to remove using checkboxes or index range, and then click Apply to exclude them from the dataset.

2) ReplaceMissingValues - Filter Introduction, working & importance


ReplaceMissingValues Filter is a preprocessing filter used to handle missing or null data in
datasets. It works by automatically filling in the missing values — for numerical attributes, it
uses the mean of the attribute; for nominal attributes, it uses the most frequent value (mode).
Importance: Improves data quality and consistency. Prevents errors during model training.
Enhances model accuracy by maintaining a complete dataset.

Dataset before filter

Fig. No: 4.2 — ReplaceMissingValues Filter in WEKA


Description:
This figure displays the ReplaceMissingValues filter applied in WEKA to handle missing data.
It replaces missing numeric values with the mean and nominal values with the mode,
ensuring the dataset is complete for further analysis. The selected attribute's statistics and
distribution are also shown.
- Steps for applying the filter
1) Click "Open file" and load your dataset. 2) Click "Choose" under the Filter section.
3) Select: Unsupervised → Attribute → ReplaceMissingValues 4) Click the filter name
(Remove) to set options (e.g., select attribute indices to remove). 5) Click "Apply" to apply the
filter.
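A minimal programmatic sketch of the same filter, assuming a local dataset file that contains missing values (labor-neg-data.arff is used here only as a placeholder name from the screenshots); it prints the missing-value count of the first attribute before and after filtering.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class ReplaceMissingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("labor-neg-data.arff");

        ReplaceMissingValues fill = new ReplaceMissingValues();
        fill.setInputFormat(data);
        Instances filled = Filter.useFilter(data, fill);

        // Missing counts for the first attribute, before and after filtering.
        System.out.println("Missing before: " + data.attributeStats(0).missingCount);
        System.out.println("Missing after : " + filled.attributeStats(0).missingCount);
    }
}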



- Dataset after applying filter



Description:
This figure shows the dataset after applying the ReplaceMissingValues filter. The previously missing entries of the selected attribute have been filled in (numeric attributes with the mean, nominal attributes with the mode), so the attribute no longer reports any missing values and the histogram reflects the completed data.

3) ReplaceMissingWithUserConstant - Filter Introduction, working & importance

This filter replaces all missing values in a dataset with a user-defined constant.
Working: You choose a constant value (like "0" or "unknown"), and the filter fills in all missing
numeric or nominal values with it. Importance: It ensures data completeness using a specific
value chosen by the user.

- Dataset before filter

Fig. No: 4.3 – ReplaceMissingWithUserConstant Filter in WEKA

This figure shows the ReplaceMissingWithUserConstant filter selected in WEKA. Unlike ReplaceMissingValues, it fills every missing value with a constant chosen by the user (for example 0 for numeric attributes or "unknown" for nominal ones), improving data completeness for analysis.
- Steps for applying the filter
1) Click "Open file" and load your dataset. 2) Click "Choose" under the Filter section. 3)
Select: Unsupervised → Attribute → ReplaceMissingWithUserConstant 4) Click the filter
name (Remove) to set options (e.g., select attribute indices to remove). 5) Click "Apply" to
apply the filter.



- Dataset after applying filter



Description:
This figure shows the ReplaceMissingWithUserConstant filter in WEKA, where missing values
in the "duration" attribute are replaced with a user-defined constant. This allows consistent
handling of missing data using a specific value set by the user.

4) ReplaceWithMissingValue - Filter Introduction, working & importance

This filter in WEKA replaces selected attribute values with missing (unknown) values. You
specify which attribute(s) and values to convert, and the filter marks them as missing. It is
useful for simulating missing data or correcting wrongly filled values during preprocessing.

- Dataset before filter

Fig. No 4.4 : ReplaceWithMissingValue Filter in WEKA


Description:
This figure shows the ReplaceWithMissingValue filter applied to the "wage-increase-third-
year" attribute. Specific values have been converted into missing values, useful for simulating
or correcting data during preprocessing.
- Steps for applying the filter
1) Click "Open file" and load your dataset.
2) 2) Click "Choose" under the Filter section.
3) Select: Unsupervised → Attribute → ReplaceWithMissingValue
4) 4) Click the filter name (Remove) to set options (e.g., select attribute indices
to remove).
5) Click "Apply" to apply the filter.



- Dataset after applying filter

Description:
This figure shows the ReplaceWithMissingValue filter applied to the "wage-increase-third-
year" attribute, where 88% of values have been replaced with missing values. It demonstrates
the effect of increasing the percentage of data made missing, useful for testing data imputation
methods or simulating incomplete datasets.

5) Discretize
- Filter Introduction, working & importance
The Discretize filter in WEKA is used to convert numeric attributes into nominal (categorical)
ones by dividing their range into fixed intervals or bins. It works by specifying the number of
bins or using supervised methods to group values based on class labels. This is important when
algorithms require categorical input or when simplifying continuous data helps in better
pattern recognition and interpretation.
- Dataset before filter



Description:
This figure shows the Discretize filter applied to the "duration" attribute. It converts the
numeric data into nominal bins, allowing categorical analysis and compatibility with
algorithms requiring nominal inputs.
- Steps for applying the filter
1) Click "Open file" and load your dataset. 2) Click "Choose" under the Filter section. 3)
Select: Unsupervised → Attribute → Descritize 4) Click the filter name (Remove) to set
options (e.g., select attribute indices to remove). 5) Click "Apply" to apply the filter.
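A minimal sketch of the same step through the Java API. The dataset file name, the number of bins, and the attribute range are illustrative assumptions, not fixed requirements of the experiment.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("labor-neg-data.arff");

        Discretize disc = new Discretize();
        // -B: number of bins, -R: attribute range to discretize.
        disc.setOptions(new String[] { "-B", "5", "-R", "first-last" });
        disc.setInputFormat(data);
        Instances binned = Filter.useFilter(data, disc);

        // The first attribute (the "duration" attribute in the screenshots) is now nominal,
        // with one label per bin.
        System.out.println(binned.attribute(0));
    }
}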



- Dataset after applying filter

Description:
This figure displays the result of the Discretize filter applied to the "duration" attribute. The
numeric values are converted into defined intervals (bins), making the attribute nominal for
categorical analysis and algorithm compatibility.

Experiment Outcome:
The experiment successfully demonstrated the application of various preprocessing filters in WEKA, including Remove, ReplaceMissingValues, ReplaceMissingWithUserConstant, ReplaceWithMissingValue, and Discretize. Each filter was applied to clean, modify, or transform the dataset attributes. The outcomes showed improved data quality, proper handling of missing values, and conversion of numeric data into categorical form.
These steps are essential for enhancing the accuracy and effectiveness of machine learning
models by ensuring the dataset is consistent, complete, and algorithm-ready.


Experiment 5

Title: Apply Preprocessing techniques on dataset using filters: NumericToNominal, StringToNominal, NominalToBinary, Normalize.
Also do the result analysis before and after preprocessing.
Filter 1: NumericToNominal
The NumericToNominal filter in Weka is used to convert one or more numeric attributes in a dataset
into nominal (categorical) attributes. This is useful when a numeric attribute actually represents
categories (like codes for classes or labels), and should be treated as such during data mining or machine
learning processes.

Dataset: iris.arff

Step-1: Upload dataset in Weka.

The iris.arff dataset is a well-known and widely used dataset in machine learning and pattern
recognition. It consists of 150 instances, each representing a sample of an iris flower. The dataset
includes five attributes: sepallength, sepalwidth, petallength, petalwidth, and class. The first four
attributes are numeric and represent the physical dimensions of the flower's sepals and petals in
centimeters. The fifth attribute, class, is nominal and indicates the species of the iris flower, which
can be one of three categories: Iris-setosa, Iris-versicolor, or Iris-virginica.
Step-2: dataset in table format.

The screenshot shows the tabular view of the iris.arff dataset in Weka. It contains five attributes:
sepallength, sepalwidth, petallength, petalwidth (all numeric), and class (nominal), which indicates the
iris flower species such as Iris-setosa. Each row represents one flower instance with its measured values.

Step-3: Configure the parameters of NumericToNominal filter.

This image shows the Weka Explorer interface applying the NumericToNominal filter to convert
numeric attributes (1–3) in the Iris dataset to nominal. A histogram and attribute statistics are also
visible.

Step-4: Apply NumericToNominal filter from 1 to 3 attribute.

This image shows Weka Explorer after applying the NumericToNominal filter to attributes 1–3 of
the Iris dataset. The sepallength attribute is now treated as nominal, with distinct value counts and
a class distribution histogram displayed.

Step-5: Dataset in table format after applying NumericToNominal filter .

This screenshot shows the table format of dataset after applying NumericToNominal filter.
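The same conversion can be reproduced with the filter's Java class. A minimal sketch using the iris dataset and the 1-3 attribute range shown above (weka.jar on the classpath and a local iris.arff are assumed):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NumericToNominal;

public class NumericToNominalDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff");

        NumericToNominal convert = new NumericToNominal();
        convert.setOptions(new String[] { "-R", "1-3" });  // attributes 1 to 3
        convert.setInputFormat(data);
        Instances converted = Filter.useFilter(data, convert);

        // sepallength is numeric before and nominal after the filter.
        System.out.println("Before: " + data.attribute(0).isNumeric());
        System.out.println("After : " + converted.attribute(0).isNominal());
    }
}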
Filter 2: StringToNominal

The StringToNominal filter in Weka is used to convert string attributes into nominal attributes. It is
particularly useful when the dataset contains categorical data represented as text (strings) that needs to be
transformed into discrete values (nominal). This conversion helps in applying machine learning algorithms
that require nominal data as input.

Dataset: contact-lenses.arff

Step-1: Upload dataset in Weka.

This image shows the Preprocess tab of the Weka Explorer interface, where the "contact-lenses"
dataset is currently loaded. The dataset contains 24 instances and 5 attributes: age, spectacle-
prescrip, astigmatism, tear-prod-rate, and contact-lenses. The selected attribute in this view is
"age," which is a nominal attribute with three distinct values: young, pre-presbyopic, and
presbyopic. Each of these age categories contains an equal count of 8 instances, indicating a
balanced distribution across the dataset.
On the right side, a detailed summary of the selected attribute is displayed, showing the label
names, counts, and weights. Below that, a bar chart visualizes how the values of the class attribute
"contact-lenses" are distributed across each age category. Each color in the bars represents a
different class label (such as "no lenses," "soft," or "hard" lenses). The visualization allows users to
observe how the lens recommendations vary based on age groups, providing insights into the
relationship between age and lens type. This setup is part of the preprocessing phase in Weka, often
used to explore and understand the structure of the dataset before applying machine learning
algorithms.
Step-2: dataset in table format.

This image displays the data viewer in Weka for the "contact-lenses" dataset. It shows 24 instances with 5
nominal attributes: age, spectacle prescription, astigmatism, tear production rate, and contact lens
recommendation. The table provides a clear view of how different attribute combinations influence the
contact lens type prescribed (none, soft, or hard).

Step-3: Configure the parameters of StringToNominal filter.

This image shows Weka Explorer with the StringToNominal filter selected. The filter is set to
convert string attributes in the range 1–5 to nominal. A pop-up window displays filter settings,
including attribute range and debugging options.
Step-4: Apply StringToNominal filter from 1 to 5 attribute.

This image shows the Weka Explorer after applying the StringToNominal filter to the contact-
lenses dataset. All five attributes (age, spectacle-prescrip, astigmatism, tear-prod-rate, and contact-
lenses) have been successfully converted to nominal type, enabling categorical data analysis. The
visualization panel displays class distribution for the contact-lenses attribute across the three age
groups (young, pre-presbyopic, and presbyopic), each with equal instance counts.

Step-5: Dataset in table format after applying StringToNominal filter .


This screenshot shows the table format of dataset after applying StringToNominal filter.
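Programmatically, StringToNominal is most useful on datasets that actually contain string attributes, such as the student.arff file from Experiment 3 (Name, Email, Address are declared as string there). The sketch below is an illustration under that assumption; the attribute indices name the string attributes explicitly.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToNominal;

public class StringToNominalDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("student.arff");

        StringToNominal convert = new StringToNominal();
        // Name, Email and Address are the string attributes (1-based indices 1, 5, 8).
        convert.setOptions(new String[] { "-R", "1,5,8" });
        convert.setInputFormat(data);
        Instances converted = Filter.useFilter(data, convert);

        // Name was a string; it is now nominal with one label per distinct value.
        System.out.println(converted.attribute(0));
    }
}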
Filter 3: NominalToBinary
The NominalToBinary filter in Weka is an unsupervised attribute filter that transforms nominal
(categorical) attributes into binary (numeric) form. This process, known as one-hot encoding, creates a
separate binary attribute for each possible value of a nominal attribute. For example, if an attribute "Color"
has values like Red, Green, and Blue, the filter converts it into three new binary attributes: "Color=Red",
"Color=Green", and "Color=Blue", with values of 0 or 1 indicating the presence of each category. This
conversion is particularly useful for machine learning algorithms in Weka that require numerical input
rather than categorical data.

Dataset: contact-lenses.arff

Step-1: Upload dataset in Weka.


This image shows the Weka Explorer – Preprocess tab with the contact-lenses dataset loaded. It
contains 5 nominal attributes and 24 instances. The attribute "age" is selected, showing three
distinct values: young, pre-presbyopic, and presbyopic, each with 8 instances. A bar chart displays the
distribution of the target class contact-lenses across the different age groups.
Step-2: dataset in table format.


The screenshot shows the tabular view of the contact-lenses.arff dataset in Weka. It contains five
attributes: age, spectacle-prescrip, astigmatism, tear-prod-rate, and contact-lenses. All of the attributes are of nominal type.

Step-3: Configure the parameters of NominalToBinary filter.

This image shows the Weka Explorer interface with the NominalToBinary filter selected. The
filter is configured to convert nominal attributes in indices 1 to 3 into binary numeric attributes.

The filter options window is open, allowing customization of parameters before applying the
transformation to the dataset.
Step-4: Apply NominalToBinary filter from 1 to 3 attribute.

This image shows that the values of the first three attributes are now represented as t (true) or f (false).

Step-5: Dataset in table format after applying NominalToBinary filter .

This screenshot shows the table format of dataset after applying NominalToBinary filter.
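A minimal sketch of the same conversion via the Java API, showing how the attribute count grows as each multi-valued nominal attribute is expanded into indicator attributes (a local contact-lenses.arff and the 1-3 range from the screenshots are assumed):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;

public class NominalToBinaryDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("contact-lenses.arff");

        NominalToBinary toBinary = new NominalToBinary();
        toBinary.setOptions(new String[] { "-R", "1-3" });  // first three attributes
        toBinary.setInputFormat(data);
        Instances binary = Filter.useFilter(data, toBinary);

        System.out.println("Attributes before: " + data.numAttributes());
        System.out.println("Attributes after : " + binary.numAttributes());
    }
}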
Filter 4: Normalize
The Normalize filter in Weka is a data preprocessing tool that scales numeric attribute values to a specified
range, typically [0, 1]. This is useful for improving the performance of machine learning algorithms that are
sensitive to the scale of input data, such as k-NN or neural networks. It ensures that all numeric attributes
contribute equally to the model.

Dataset: student.arff

Step-1: Upload dataset in Weka.

This image shows the Preprocess tab of the Weka Explorer with the student dataset loaded. It
contains 25 instances and 23 attributes. The selected attribute is "id", which is numeric with
distinct values ranging from 1 to 25. At the bottom, a class distribution histogram for the attribute
"final_result" (a nominal class) is displayed in red and blue, indicating the frequency of each class
value.
Step-2: dataset in table format.


This image shows the data viewer window in Weka, displaying the student dataset with 23
attributes, including both numeric and nominal types such as id, name, age, gender, gpa, and
participation. Each row represents a student instance with corresponding values.

Step-3: Configure the parameters of Normalize filter.

This image shows the Normalize filter settings in Weka, where numeric attributes are scaled
between 0.0 and 1.0 using the weka.filters.unsupervised.attribute.Normalize filter.
Step-4: Apply Normalize filter to all attributes.


This screenshot shows the Weka Explorer after applying the Normalize filter, where all numeric
attributes have been scaled to the [0, 1] range.

Step-5: Dataset in table format after applying Normalize filter .

This screenshot shows the table format of the dataset after applying the Normalize filter, where all numeric attributes have been scaled to the [0, 1] range.
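The scaling to [0, 1] can also be verified programmatically. A minimal sketch, assuming the student.arff used in this step is available locally; the Normalize filter touches only numeric attributes, and with its default scale (1.0) and translation (0.0) every numeric attribute ends up in the [0, 1] range.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

public class NormalizeDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("student.arff");

        Normalize norm = new Normalize();   // default scale 1.0, translation 0.0
        norm.setInputFormat(data);
        Instances scaled = Filter.useFilter(data, norm);

        // Print min and max of every numeric attribute after scaling (expected 0 and 1).
        for (int i = 0; i < scaled.numAttributes(); i++) {
            if (scaled.attribute(i).isNumeric()) {
                weka.core.AttributeStats stats = scaled.attributeStats(i);
                System.out.println(scaled.attribute(i).name()
                        + ": min=" + stats.numericStats.min
                        + " max=" + stats.numericStats.max);
            }
        }
    }
}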
