NST 772 Data Mining and Statistical Learning I
Project II
(Due on 12/12, Monday) Instructions: While discussions with classmates are allowed and encouraged, please try to work on the project independently and direct your questions to me. Use copy-and-paste to edit the R output, and include only the necessary R results in your final report. Do not forget to include your R code in the appendix. Interpretation of the analysis results is also required.
Data
In this project, you are asked to apply several classification tools to a real problem. Let's consider the spam data available at [Link]. This data set has been used in the textbook for illustration. The description is given at [Link], and more information on this data set is available in the UCI spambase directory: [Link]

However, you may use your own dataset. In that case, you need to describe the setting and data clearly and make sure that the data set you choose is appropriate for classification problems. For example, try to avoid dependent data such as time series or repeated measures collected from clustered or longitudinal studies. Check with me if you are unsure about its appropriateness. The following websites contain rich data from a variety of application fields:

Statlib: [Link]
UCI Machine Learning Repository: [Link]
KD Nuggets: [Link]

You may also use other datasets from the textbook web site (HTF, 2009): [Link] [Link]/~tibs/ElemStatLearn/.
Analysis
1. Divide your data set into two parts, a learning set and a test set, using the same train/test indicator as HTF (2009): [Link] You might want to use v-fold cross-validation if the data set you select is only moderately sized.
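As a rough illustration of the split (not a substitute for the HTF train/test indicator, which you should download from the link above), the step can be sketched in R as follows. The data frame name `spam`, its columns, and the 2/3 training fraction are all assumptions for the sketch; a tiny synthetic placeholder is built so the code runs on its own.

```r
# `spam` stands in for your data; this synthetic placeholder should be
# replaced by the real spambase data frame (response column `type`).
spam <- data.frame(x1 = rnorm(30), x2 = rnorm(30),
                   type = rbinom(30, 1, 0.4))

set.seed(772)                                      # reproducibility
n <- nrow(spam)
train_id <- sample(seq_len(n), round(2 * n / 3))   # roughly 2/3 for training
train <- spam[train_id, ]                          # learning set
test  <- spam[-train_id, ]                         # test set
stopifnot(nrow(train) + nrow(test) == n)           # sanity check: no overlap loss
```

If you use the HTF indicator instead, subset the rows with that 0/1 vector rather than with `sample()`.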
Follow the steps below to conduct the analysis:
2. Try out the following predictive modeling tools:

- Logistic regression using the lasso;
- A single decision tree;
- Random forests;
- Boosting.

For each method, use the training set to identify the best model and then apply that model to the test set. Compute the misclassification error rate with cutoff point 0.50 and the c statistic, and plot the ROC curve, all based on test-set performance. It would be best, though not required, to plot the ROC curves on one figure for comparison. Since each method involves numerous tuning parameters, make sure that the important details of the model fitting are clearly explained in your report.
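The workflow above can be sketched in R as follows. The package choices (glmnet, rpart, randomForest, gbm, pROC) are one reasonable option, not a requirement of the assignment, and the tuning values shown are illustrative defaults, not recommendations. The `train`/`test` data frames with a 0/1 response `type` are assumptions; a synthetic placeholder is built here so the sketch runs on its own, and you would substitute your actual split.

```r
library(glmnet); library(rpart); library(randomForest)
library(gbm); library(pROC)

# Synthetic stand-in for the real train/test split -- replace with
# your own `train` and `test` data frames (0/1 response `type`).
set.seed(772)
make_df <- function(n) {
  x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
  data.frame(x, type = rbinom(n, 1, plogis(x[, 1] + x[, 2])))
}
train <- make_df(300); test <- make_df(150)

x_tr <- as.matrix(train[, names(train) != "type"])
x_te <- as.matrix(test[,  names(test)  != "type"])

## Lasso logistic regression: lambda chosen by 10-fold CV
cv_las <- cv.glmnet(x_tr, train$type, family = "binomial")
p_las  <- predict(cv_las, x_te, s = "lambda.min", type = "response")[, 1]

## Single decision tree (prune via the cp table if desired)
tree   <- rpart(factor(type) ~ ., data = train, method = "class")
p_tree <- predict(tree, test, type = "prob")[, "1"]

## Random forest (factor response => classification mode)
rf   <- randomForest(factor(type) ~ ., data = train)
p_rf <- predict(rf, test, type = "prob")[, "1"]

## Boosting: number of trees chosen by internal 5-fold CV
bst   <- gbm(type ~ ., data = train, distribution = "bernoulli",
             n.trees = 2000, shrinkage = 0.01,
             interaction.depth = 2, cv.folds = 5)
p_bst <- predict(bst, test, type = "response",
                 n.trees = gbm.perf(bst, method = "cv", plot.it = FALSE))

## Error rate at cutoff 0.50 and c statistic (AUC) on the test set
probs <- list(lasso = p_las, tree = p_tree, rf = p_rf, boosting = p_bst)
for (nm in names(probs)) {
  p <- probs[[nm]]
  cat(sprintf("%-8s error = %.3f  c = %.3f\n", nm,
              mean((p > 0.5) != test$type),
              auc(roc(test$type, p, quiet = TRUE))))
}

## All four ROC curves on one figure
cols <- c("black", "red", "blue", "darkgreen")
plot(roc(test$type, probs[[1]], quiet = TRUE), col = cols[1])
for (i in 2:4) plot(roc(test$type, probs[[i]], quiet = TRUE),
                    col = cols[i], add = TRUE)
legend("bottomright", names(probs), col = cols, lty = 1)
```

Note that the c statistic equals the area under the ROC curve, so `auc()` reports both quantities at once; the 0.50 cutoff enters only through the misclassification rate.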