Activity Analytics Application

The activity focuses on enhancing students' skills in data cleaning and descriptive analytics using R programming. Students will create a dataset, handle missing data, detect outliers, compute descriptive statistics, and perform clustering analysis. The final submission includes the original and cleaned datasets, R code, and a written report with visualizations.

Uploaded by

sam perez

Activity Title:

Data Cleaning and Analysis: Handling Missing Data, Outliers, Measures of Central
Tendency, and Clustering

Objective:

This activity aims to enhance students’ practical understanding of data cleaning and descriptive
analytics techniques. Students will apply methods to handle missing data, detect and treat
outliers, compute descriptive statistics, and perform clustering analysis using R programming.

By the end of this activity, students should be able to:

1. Create and manage their own dataset.
2. Handle missing and inconsistent data appropriately.
3. Detect and treat outliers using statistical techniques.
4. Compute and interpret mean, median, and mode.
5. Perform basic clustering and interpret the visualization output.
6. Save and document their R code for verification.

Instructions:

Step 1: Dataset Creation

1. Create a unique dataset consisting of 20–30 rows and 5–6 variables.
2. The dataset should be original and based on a theme of your choice, such as:
   - Academic performance
   - Sales or business data
   - Health or fitness data
   - Environmental monitoring
   - Technology usage or customer feedback
3. The dataset must include:
   - At least two numeric variables
   - At least one categorical variable
   - Some intentionally missing values
   - At least one potential outlier
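
As an illustration of the requirements above, here is a minimal sketch of building such a dataset directly in R. The theme, column names, program labels, and all values are hypothetical placeholders; your own dataset should use your own theme and data.

```r
# Hypothetical example dataset (theme: academic performance); all values are made up.
set.seed(42)  # makes the random values reproducible

students <- data.frame(
  student_id  = 1:20,
  program     = factor(sample(c("BSIT", "BSCS", "BSIS"), 20, replace = TRUE)),  # categorical
  study_hours = round(runif(20, min = 2, max = 20), 1),                         # numeric
  attendance  = round(runif(20, min = 70, max = 100)),                          # numeric
  exam_score  = round(rnorm(20, mean = 78, sd = 8))                             # numeric
)

# Introduce some intentionally missing values
students$study_hours[c(4, 11)] <- NA
students$exam_score[c(7, 15)]  <- NA

# Introduce one potential outlier
students$study_hours[18] <- 60

str(students)   # check variable types
head(students)  # preview the first rows
```

This gives 20 rows and 5 variables, with four missing values and one unusually large study_hours value.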

Step 2: Handling Missing Data

1. Identify missing data using R functions (e.g., is.na() or summary()).
2. Apply at least two methods to handle missing values (e.g., mean imputation, median imputation, or deletion).
3. Briefly explain why each chosen method is appropriate for your dataset.
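
The methods named above could look like the following sketch in base R. The data frame `df` and its columns are hypothetical; which method suits which variable depends on your own data.

```r
# Hypothetical data with missing values
df <- data.frame(
  hours = c(5, 8, NA, 12, 7, NA, 9),
  score = c(70, NA, 82, 90, 75, 88, NA)
)

# Identify missing data
colSums(is.na(df))  # number of NA values per column
summary(df)         # also reports NA counts for numeric columns

# Method 1: mean imputation (replace NA with the column mean)
df_mean <- df
df_mean$hours[is.na(df_mean$hours)] <- mean(df$hours, na.rm = TRUE)

# Method 2: median imputation (less sensitive to extreme values)
df_median <- df
df_median$score[is.na(df_median$score)] <- median(df$score, na.rm = TRUE)

# Method 3: deletion (drop every row that contains any NA)
df_complete <- na.omit(df)
```

Deletion is simple but shrinks a small dataset quickly: here only three of the seven rows survive.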

Step 3: Outlier Detection and Treatment

1. Identify outliers using visualization (e.g., boxplot()) or statistical measures such as IQR.
2. Decide how to treat each outlier (retain, cap, or remove).
3. Provide a short justification for your decision.
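
One common way to apply the IQR approach is the 1.5 × IQR rule, sketched below. The vector `hours` is hypothetical, and the 1.5 multiplier is a convention rather than a requirement of this activity.

```r
# Hypothetical values; 60 is the planted outlier
hours <- c(5, 8, 6, 12, 7, 9, 10, 60)

# Fences based on the 1.5 * IQR rule
q1    <- unname(quantile(hours, 0.25))
q3    <- unname(quantile(hours, 0.75))
iqr   <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Flag values outside the fences
is_outlier <- hours < lower | hours > upper
hours[is_outlier]

# Treatment option A: cap values at the fences
hours_capped <- pmin(pmax(hours, lower), upper)

# Treatment option B: remove the flagged values
hours_removed <- hours[!is_outlier]

# Treatment option C: retain the values unchanged and justify why in the report
```

Capping keeps the row count unchanged, while removal shortens the variable; either choice should be justified in the written report.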

Step 4: Computation of Mean, Median, and Mode

1. Compute the mean, median, and mode for at least two numeric variables.
2. Interpret your results clearly in relation to your dataset.
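
Base R provides mean() and median(), but its built-in mode() returns the storage type of an object, not the most frequent value. The sketch below therefore defines a small helper, `stat_mode`, which is a hypothetical name and not part of R; the `scores` vector is also made up.

```r
# Helper: most frequent value(s) of a numeric vector, ignoring NA.
# Returns more than one value if there is a tie.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Hypothetical numeric variable with one missing value
scores <- c(70, 75, 75, 82, 88, 90, NA)

mean(scores, na.rm = TRUE)    # arithmetic average
median(scores, na.rm = TRUE)  # middle value
stat_mode(scores)             # most frequent value
```

For these values the mean is 80, the median is 78.5, and the mode is 75. A mean above the median suggests that the larger scores are pulling the average upward.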

Step 5: Clustering Analysis

1. Select two numeric variables and perform K-Means Clustering in R.
2. Visualize the resulting clusters using an appropriate graph (e.g., ggplot2 or factoextra).
3. Interpret the visual results and describe the similarities or differences between the clusters.
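
A minimal sketch of these steps using kmeans() from base R follows. The data, the choice of two clusters, and the scaling step are assumptions made for illustration. Base graphics are used here so the example runs without extra packages; ggplot2 or factoextra can be used instead.

```r
set.seed(123)  # k-means starts from random centers, so fix the seed

# Hypothetical data with two loosely separated groups
df <- data.frame(
  hours = c(rnorm(10, mean = 5,  sd = 1), rnorm(10, mean = 15, sd = 1)),
  score = c(rnorm(10, mean = 65, sd = 3), rnorm(10, mean = 88, sd = 3))
)

# Standardize so both variables contribute equally to the distance
scaled <- scale(df)

# Run K-Means with 2 clusters and 25 random starts
km <- kmeans(scaled, centers = 2, nstart = 25)
df$cluster <- factor(km$cluster)

# Scatter plot colored by cluster
plot(df$hours, df$score,
     col = as.integer(df$cluster), pch = 19,
     main = "K-Means Clusters (k = 2)",
     xlab = "Study hours", ylab = "Exam score")
legend("topleft",
       legend = paste("Cluster", levels(df$cluster)),
       col = seq_along(levels(df$cluster)), pch = 19)

# Per-cluster averages help with interpretation
aggregate(cbind(hours, score) ~ cluster, data = df, FUN = mean)
```

The per-cluster averages give a concrete basis for describing how the clusters differ, for example one group with low hours and low scores and another with high hours and high scores.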

Step 6: Visualization

1. Create at least two visualizations to support your analysis:
   - One plot showing data cleaning results (e.g., boxplot before and after outlier removal).
   - One clustering visualization (e.g., scatter plot of clusters).
2. Ensure all graphs have appropriate titles, axis labels, and legends.
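
A before-and-after boxplot could be drawn as in the sketch below, using base R graphics. The values and labels are hypothetical, and the outlier is removed with the 1.5 × IQR rule as one possible treatment.

```r
# Hypothetical values; 60 is the planted outlier
hours <- c(5, 8, 6, 12, 7, 9, 10, 60)

# Remove values outside the 1.5 * IQR fences
q     <- quantile(hours, c(0.25, 0.75))
fence <- 1.5 * (q[2] - q[1])
hours_clean <- hours[hours >= q[1] - fence & hours <= q[2] + fence]

# Side-by-side boxplots with a title and axis labels
bp <- boxplot(
  list(Before = hours, After = hours_clean),
  main = "Study Hours Before and After Outlier Removal",
  xlab = "Dataset version",
  ylab = "Study hours per week",
  col  = c("tomato", "steelblue")
)
```

The group names under each box identify the two versions of the data, so a separate legend is not needed for this plot.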

Step 7: Saving and Submitting Your Code

1. Save all the R code you used for this activity in a plain-text (.txt) file, for example using Notepad.
2. The file should include comments (#) explaining what each section of your code does.
3. Save the file using this format: Lastname_Firstname_Rcode.txt
4. This file will allow the instructor to check and verify your R script.

Step 8: Final Submission

Submit the following:

1. Original dataset (before cleaning): Lastname_Firstname_OriginalDataset.csv
2. Cleaned dataset (after cleaning and clustering): Lastname_Firstname_CleanedDataset.csv
3. R code file (saved from Notepad): Lastname_Firstname_Rcode.txt
4. Written report (Word or PDF): Lastname_Firstname_DataAnalysisReport.docx or .pdf
5. Visualization screenshots showing results and graphs, embedded in your report or submitted separately.
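
If the datasets are held as data frames in R, the two CSV files can be written with write.csv(), as in this sketch. The data frames are hypothetical stand-ins, and "Lastname_Firstname" should be replaced with your own name.

```r
# Hypothetical stand-ins for your original and cleaned data frames
original <- data.frame(hours = c(5, NA, 9), score = c(70, 82, NA))
cleaned  <- na.omit(original)

# row.names = FALSE avoids writing an extra index column
write.csv(original, "Lastname_Firstname_OriginalDataset.csv", row.names = FALSE)
write.csv(cleaned,  "Lastname_Firstname_CleanedDataset.csv",  row.names = FALSE)

# Read one file back to confirm it was written correctly
check <- read.csv("Lastname_Firstname_CleanedDataset.csv")
```

The files are written to the current working directory, which getwd() reports.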

Evaluation Criteria

Criteria                         Description                                   Points
Dataset Creation                 Originality, completeness, and organization       10
Handling Missing Data            Appropriate method and justification              15
Outlier Detection & Treatment    Correct identification and reasoning              15
Measures of Central Tendency     Accuracy and interpretation                       15
Clustering Analysis              Correct process and explanation                   20
Visualization                    Relevance, clarity, and labeling                  10
Code Submission                  Code correctness and proper documentation          5
Report Presentation              Clarity, structure, and depth of analysis         10
Total                                                                             100
