Data Mining Techniques for Property Data

this is the assignment file

Uploaded by

arun neupane


Assignment 3

Introduction to Data Analytics (University of Technology Sydney)


Downloaded by arun neupane (arunneupane20@[Link])

Introduction to Data Analytics

Assessment Task 3: Data mining in action


Table of Contents

Data Mining
    The Task
    Input
    Output
Preprocessing
    Column Filter
    Missing Value
    Number to String
    Normalizer
    Partitioning
Classifiers
    Decision Trees
    Random Forest
    K Nearest Neighbor (KNN)
    SVM
    Neural Networks
    Tree Ensemble
Best Classifier
    Result Summary
    Conclusion

Data Mining

The Task

Following on from the previous assignment, this assignment focuses on building classifiers and choosing the best one to predict the attribute “QUALIFIED” for a property data set. Several methods are available for this. KNIME, a tool with a graphical interface, is used so that the process can be shown visually.

Input

Three files are provided for this assignment: a training data set, an unknown data set, and a sample prediction data set. The training data set contains the attribute “QUALIFIED”, but the unknown data set does not. The sample prediction data set is filled with random values to show how a Kaggle submission works.

KNIME processes the training and unknown data sets to predict the attribute value for the unknown data set.

Output

Uploading is not mandatory, but once the predictions are created, submitting them to Kaggle produces a score that shows how effective the process is.


Preprocessing

Column Filter

Within the data set, the attribute “GIS_LAST_MOD_DTTM” (column number 37) has the same value in every row, so it carries no information for prediction. A Column Filter node therefore removes it from the data set.
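As a rough illustration outside KNIME, the same idea can be sketched in plain Python. The helper name `drop_constant_columns` and the toy rows are assumptions made for this example; they are not part of the original workflow or data.

```python
def drop_constant_columns(rows):
    """Return copies of the rows without columns whose value never varies."""
    if not rows:
        return []
    # A column is constant when the set of its values has exactly one member.
    constant = {
        col for col in rows[0]
        if len({row[col] for row in rows}) == 1
    }
    return [{k: v for k, v in row.items() if k not in constant} for row in rows]


# Toy rows with made-up values, for illustration only.
rows = [
    {"AYB": 1910, "GIS_LAST_MOD_DTTM": "same-timestamp"},
    {"AYB": 1985, "GIS_LAST_MOD_DTTM": "same-timestamp"},
]
filtered = drop_constant_columns(rows)
print(filtered)
```

Only the column that varies between rows survives the filter.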

Missing Value

Missing values, which may distort the prediction, are removed.
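The text does not say how the values are removed. A minimal sketch, assuming that every row containing a missing value is dropped and that missing values appear as `None` or an empty string, could look like this (the function name and toy rows are hypothetical):

```python
def drop_rows_with_missing(rows, missing=(None, "")):
    """Keep only the rows in which no field holds a missing marker."""
    return [
        row for row in rows
        if not any(value in missing for value in row.values())
    ]


# Toy rows with made-up values, for illustration only.
rows = [
    {"AYB": 1910, "ROOF": "2"},
    {"AYB": None, "ROOF": "6"},   # dropped: AYB is missing
    {"AYB": 1985, "ROOF": ""},    # dropped: ROOF is empty
]
complete_rows = drop_rows_with_missing(rows)
print(complete_rows)
```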

Number to String

Some attributes hold numbers that are codes rather than numeric quantities: “HEAT”, “STYLE”, “STRUCT”, “GRADE”, “CNDTN”, “EXTWALL”, “ROOF”, “INTWALL”, and “USECODE”. These are treated as strings to improve the learners’ performance.
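The conversion can be sketched in plain Python as follows. The column names come from the text above; the helper `codes_to_strings` and the sample row are assumptions for illustration.

```python
CODED_COLUMNS = [
    "HEAT", "STYLE", "STRUCT", "GRADE", "CNDTN",
    "EXTWALL", "ROOF", "INTWALL", "USECODE",
]


def codes_to_strings(row, columns=CODED_COLUMNS):
    """Cast coded columns to strings so a learner treats them as categories."""
    return {
        key: (str(value) if key in columns and value is not None else value)
        for key, value in row.items()
    }


# Toy row with made-up values, for illustration only.
row = {"HEAT": 13, "GRADE": 4, "AYB": 1910}
converted = codes_to_strings(row)
print(converted)
```

Once the codes are strings, a learner no longer assumes that code 13 is "greater than" code 4, which is the point of the conversion.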

Normalizer

The Normalizer node applies min-max normalization to the attribute “AYB”.
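Min-max normalization maps each value v to (v - min) / (max - min). The sketch below assumes a target range of 0 to 1, which the text does not state, and uses made-up “AYB” values.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up years, for illustration only.
ayb = [1900, 1950, 2000]
scaled = min_max_normalize(ayb)
print(scaled)
```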

Partitioning

The Partitioning node splits the training data 70-30: 70% is used to train each model and 30% is used to test it.
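A 70-30 split can be sketched as below. The text does not say whether the split was random, stratified, or taken from the top of the table; this example assumes a random shuffle, and the seed value is arbitrary.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder rows stand in for the training table.
train, test = partition(list(range(10)))
print(len(train), len(test))
```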


Classifiers

Decision Trees

The preprocessed data is passed to the decision tree nodes, which build the model and predict the attribute. Decision trees are well suited to categorical data. The accuracy is 83.074%, with 1544 rows classified incorrectly.
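The two reported figures are linked by accuracy = correct / total. The report does not state the size of the test partition, so the row count computed below is an inference from the reported numbers, not a figure from the source.

```python
accuracy = 0.83074       # reported accuracy (83.074%)
misclassified = 1544     # reported number of wrongly classified rows

# error rate = misclassified / total, so total = misclassified / (1 - accuracy)
implied_test_rows = round(misclassified / (1 - accuracy))

# Sanity check: recompute the accuracy from the implied row count.
recomputed = (implied_test_rows - misclassified) / implied_test_rows

print(implied_test_rows)
print(round(recomputed * 100, 3))
```

Under this reading, the test partition would hold roughly nine thousand rows.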


Random Forest

The preprocessed data is passed to the Random Forest Learner with default settings. The accuracy is 88.043%.
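A random forest combines the predictions of many decision trees. The sketch below shows only the combining step, using simple majority voting; how KNIME's node aggregates its trees internally is not described in the report, so this is a generic illustration with made-up tree outputs.

```python
from collections import Counter


def majority_vote(predictions_per_tree):
    """Combine per-tree predictions into one prediction per row by majority."""
    # zip(*...) regroups the lists so each tuple holds all votes for one row.
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_tree)
    ]


# Three hypothetical trees, each predicting a label for the same three rows.
tree_predictions = [
    ["1", "0", "1"],
    ["1", "1", "0"],
    ["0", "0", "1"],
]
combined = majority_vote(tree_predictions)
print(combined)
```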


K Nearest Neighbor (KNN)

The preprocessed data is passed to the KNN node. The setting “Number of neighbors to consider (K)” was changed from its original value of 3 to 5. The accuracy is 85.855%.
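To make the role of K concrete, here is a minimal sketch of K nearest neighbor classification with K = 5, assuming Euclidean distance and unweighted majority voting. The points and the label values "0" and "1" are made up; the report does not list the values of “QUALIFIED”.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` from the k closest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature training set, for illustration only.
points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (1.0, 1.0), (0.9, 1.0), (1.0, 0.8)]
labels = ["0", "0", "0", "1", "1", "1"]
prediction = knn_predict(points, labels, query=(0.1, 0.0), k=5)
print(prediction)
```

With K = 5 the five nearest points contain three "0" labels and two "1" labels, so the vote goes to "0". An odd K avoids ties when there are two classes.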


SVM

The SVM learner was left running for over 24 hours without completing, so it produced no results.


Neural Networks

The pre-processed data was transmitted into the PNN Learner. The settings are default.

The accuracy is 87.01%.


Tree Ensemble

The preprocessed data is passed to the Tree Ensemble Learner with default settings, except that the partitioning is 90-10. The accuracy is 88.662%.


Best Classifier

Result Summary

The results for each method are as follows:

Decision Tree: 83.074%

Random Forest: 88.043%

K Nearest Neighbor: 85.855%

SVM: no result (did not complete)

Neural Networks: 87.01%

Tree Ensemble: 88.662%

Conclusion

Based on the result summary above, Tree Ensemble has the highest accuracy of the methods tested. The Tree Ensemble method is therefore used to make the prediction for the unknown data set. The resulting prediction was uploaded to Kaggle.
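The selection described above amounts to taking the maximum over the reported accuracies, skipping the SVM run that produced no result. A small sketch, with the dictionary layout chosen for this example:

```python
# Reported accuracies in percent; None marks the SVM run that never finished.
results = {
    "Decision Tree": 83.074,
    "Random Forest": 88.043,
    "K Nearest Neighbor": 85.855,
    "SVM": None,
    "Neural Networks": 87.01,
    "Tree Ensemble": 88.662,
}

# Ignore methods without a result, then pick the highest accuracy.
completed = {name: acc for name, acc in results.items() if acc is not None}
best = max(completed, key=completed.get)

print(best, completed[best])
```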


Figure: the complete KNIME workflow.
