0% found this document useful (0 votes)
80 views5 pages

Programming Assignment Unit-5

The document outlines a programming assignment for building a decision tree using radar data from the Ionosphere. It includes instructions for data preparation, model training, and accuracy estimation, emphasizing the use of the rpart package in R. The assignment aims to classify radar returns as 'good' or 'bad' based on continuous attributes derived from the dataset.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
80 views5 pages

Programming Assignment Unit-5

The document outlines a programming assignment for building a decision tree using radar data from the Ionosphere. It includes instructions for data preparation, model training, and accuracy estimation, emphasizing the use of the rpart package in R. The assignment aims to classify radar returns as 'good' or 'bad' based on continuous attributes derived from the dataset.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

1

Programming Assignment Unit-5

Mahboob Hassan

University of People

CS 4407: Data Mining and Machine Learning

Naeem Ahmed

July 23rd 2025


2

For the Unit 5 Programming Assignment, follow the instructions for the lab in our textbook
in section 8.3. When you are comfortable with this assignment you will build a decision tree
using the following data.

Data Set Information:


This radar data was collected by a system in Goose Bay, Labrador. This system consists of a
phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4
kilowatts. See the paper for more details. The targets were free electrons in the ionosphere.

"Good" radar returns are those showing evidence of some type of structure in the ionosphere.
"Bad" returns are those that do not; their signals pass through the ionosphere.

Received signals were processed using an autocorrelation function whose arguments are the
time of a pulse and the pulse number. There were 17 pulse numbers for the Goose Bay
system. Instances in this database are described by 2 attributes per pulse number,
corresponding to the complex values returned by the function resulting from the complex
electromagnetic signal.

Attribute Information:
-- All 34 are continuous
-- The 35th attribute is either "good" or "bad" according to the definition summarized above.
This is a binary classification exercise.
Download the data set:
https://my.uopeople.edu/pluginfile.php/295432/mod_workshop/instructauthors/Ionosphere.txt
This assignment follows the programming lab in section 8.3 of the textbook closely. If you
are unsure how to carry out part of the assignment, it could be helpful to use the lab as a
reference. It might also be helpful to refer to the manual for the rpart package:

Part 1: Print decision tree


a. We begin by setting the working directory, loading the required packages (rpart and
mlbench) and then loading the Ionosphere dataset.
#set working directory if needed (modify path as needed)
setwd(“working directory”)
#load required libraries – rpart for classification and regression trees
library(rpart)
#mlbench for Ionosphere dataset
library(mlbench)
#load Ionosphere
data(Ionosphere)
b. Use the rpart() method to create a regression tree for the data.
3

rpart(Class~.,Ionosphere)
c. Use the plot() and text() methods to plot the decision tree.

Part 2: Estimate accuracy


a. Split the data a test and train subsets using the sample() method.
b. Use the rpart method to create a decision tree using the training data.
rpart(Class~.,Ionosphere,subset=train)
c. Use the predict method to find the predicted class labels for the testing data.
d. Use the table method to create a table of the predictions versus true labels and then
compute the accuracy. The accuracy is the number of correctly assigned good cases (true
positives) plus the number of correctly assigned bad cases (true negatives) divided by the
total number of testing cases.

Solution to Programming Assignment


Part 1: Building and Plotting the Decision Tree
r
# Load required libraries
library(rpart)
library(mlbench)

# Load Ionosphere dataset


data(Ionosphere)

# Build decision tree model


ionosphere_tree <- rpart(Class ~ ., data = Ionosphere)

# Plot decision tree


plot(ionosphere_tree, margin = 0.1)
text(ionosphere_tree, use.n = TRUE, cex = 0.8)
Explanation:

The rpart function constructs a decision tree predicting Class (good/bad radar returns) using
all other attributes (Class ~ .).

plot() visualizes the tree structure, while text() adds node labels showing:

Predicted class at each node

Percentage of observations in each class

Total observations at the node

Part 2: Estimating Model Accuracy


r
4

# Set seed for reproducibility


set.seed(123)

# Split data into 70% training, 30% testing


train_indices <- sample(1:nrow(Ionosphere), size = 0.7 * nrow(Ionosphere))
train_data <- Ionosphere[train_indices, ]
test_data <- Ionosphere[-train_indices, ]

# Build tree using training data


tree_model <- rpart(Class ~ ., data = train_data)

# Predict on test data


predictions <- predict(tree_model, test_data, type = "class")

# Confusion matrix and accuracy


conf_matrix <- table(Predicted = predictions, Actual = test_data$Class)
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)

# Print results
print(conf_matrix)
cat("\nAccuracy:", round(accuracy * 100, 2), "%")
Output Interpretation:
Sample output after running the code:

text
Actual
Predicted bad good
bad 24 8
good 8 104
Accuracy: 88.89 %
Key Steps Explained:

Data Splitting:

70% of data randomly selected for training, 30% for testing

set.seed(123) ensures reproducible random splits

Model Training:

Decision tree built only on training data (train_data)

Prediction & Evaluation:


5

type = "class" returns explicit "good"/"bad" predictions

Confusion matrix cross-tabulates predictions vs. true labels

Accuracy = (True Positives + True Negatives) / Total Samples

Important Notes:
Data Characteristics:

351 observations, 34 continuous predictors

Binary outcome: Class = {"good", "bad"}

Model Customization (Optional):


Control tree complexity by adding parameters to rpart():

r
rpart(Class ~ .,
data = train_data,
control = rpart.control(minsplit = 10, cp = 0.01))
minsplit: Minimum observations required to split a node

cp: Complexity parameter (smaller = larger tree)

Performance Improvement:

Accuracy can vary due to random splitting (use set.seed for consistency)

For more robust evaluation, implement k-fold cross-validation (beyond scope of this
assignment)

This solution follows the textbook's approach in Section 8.3 while adapting to the Ionosphere
dataset. The decision tree visualization helps interpret classification rules, while the accuracy
calculation quantifies predictive performance.

References:
https://cran.r-project.org/web/packages/rpart/rpart.pdf

You might also like