Activity Analytics Application

The activity focuses on enhancing students' skills in data cleaning and descriptive analytics using R programming. Students will create a dataset, handle missing data, detect outliers, compute descriptive statistics, and perform clustering analysis. The final submission includes the original and cleaned datasets, R code, and a written report with visualizations.

Uploaded by

sam perez


Activity Title:

Data Cleaning and Analysis: Handling Missing Data, Outliers, Measures of Central
Tendency, and Clustering

Objective:

This activity aims to enhance students’ practical understanding of data cleaning and descriptive
analytics techniques. Students will apply methods to handle missing data, detect and treat
outliers, compute descriptive statistics, and perform clustering analysis using R programming.

By the end of this activity, students should be able to:

1. Create and manage their own dataset.
2. Handle missing and inconsistent data appropriately.
3. Detect and treat outliers using statistical techniques.
4. Compute and interpret mean, median, and mode.
5. Perform basic clustering and interpret the visualization output.
6. Save and document their R code for verification.

Instructions:

Step 1: Dataset Creation

1. Create a unique dataset consisting of 20–30 rows and 5–6 variables.
2. The dataset should be original and based on a theme of your choice, such as:
   o Academic performance
   o Sales or business data
   o Health or fitness data
   o Environmental monitoring
   o Technology usage or customer feedback
3. The dataset must include:
   o At least two numeric variables
   o At least one categorical variable
   o Some intentionally missing values
   o At least one potential outlier
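
As a rough illustration of these requirements, the sketch below builds a small data frame on a hypothetical academic-performance theme. The variable names (`student_id`, `program`, `study_hours`, `quiz_score`, `attendance`) and all values are invented for the example, and the random numbers are only stand-ins: your own dataset should use values you design yourself.

```r
# Hypothetical "academic performance" dataset; every name and value is invented.
set.seed(42)  # make the random stand-in values reproducible
n <- 24       # within the required 20-30 rows

students <- data.frame(
  student_id  = sprintf("S%02d", 1:n),
  program     = sample(c("BSIT", "BSCS", "BSIS"), n, replace = TRUE),  # categorical
  study_hours = round(runif(n, min = 2, max = 20), 1),                 # numeric
  quiz_score  = round(rnorm(n, mean = 75, sd = 8)),                    # numeric
  attendance  = round(runif(n, min = 70, max = 100)),                  # numeric
  stringsAsFactors = FALSE
)

# Intentionally missing values
students$study_hours[c(3, 11)] <- NA
students$quiz_score[7] <- NA

# One potential outlier: a score far below the rest
students$quiz_score[15] <- 5

str(students)  # check the structure: 24 rows, 5 variables
```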

Step 2: Handling Missing Data

1. Identify missing data using R functions (e.g., is.na() or summary()).
2. Apply at least two methods to handle missing values (e.g., mean imputation, median imputation, or deletion).
3. Briefly explain why each chosen method is appropriate for your dataset.
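
The steps above might look like the following minimal sketch. The data frame `scores` and its values are hypothetical; which method suits which column is a judgment you still need to justify for your own data.

```r
# Small illustrative data frame (hypothetical values)
scores <- data.frame(
  program     = c("BSIT", "BSCS", "BSIT", "BSIS", "BSCS", "BSIT"),
  study_hours = c(5, NA, 12, 8, NA, 10),
  quiz_score  = c(70, 82, NA, 90, 65, 78)
)

# Identify missing values
colSums(is.na(scores))  # number of NAs in each column
summary(scores)         # numeric summaries also report NA counts

scores_imputed <- scores

# Method 1: mean imputation (reasonable when the variable has no extreme values)
hours_mean <- mean(scores$study_hours, na.rm = TRUE)
scores_imputed$study_hours[is.na(scores_imputed$study_hours)] <- hours_mean

# Method 2: median imputation (less sensitive to outliers than the mean)
score_median <- median(scores$quiz_score, na.rm = TRUE)
scores_imputed$quiz_score[is.na(scores_imputed$quiz_score)] <- score_median

# Method 3: deletion of every row that contains at least one NA
scores_complete <- na.omit(scores)
```

Note that deletion shrinks an already small dataset, which is one reason imputation is often preferred when only a few values are missing.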

Step 3: Outlier Detection and Treatment

1. Identify outliers using visualization (e.g., boxplot()) or statistical measures such as IQR.
2. Decide how to treat each outlier (retain, cap, or remove).
3. Provide a short justification for your decision.
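
One common way to apply the IQR measure is the 1.5 × IQR rule, sketched below on a hypothetical `quiz_score` vector. The 1.5 multiplier is a convention rather than something this activity prescribes, and the sketch shows capping and removal side by side only so the two treatments can be compared.

```r
# Hypothetical scores with one suspiciously low value
quiz_score <- c(70, 82, 75, 90, 65, 78, 5, 88, 73, 80)

# Quartiles and interquartile range
q1  <- quantile(quiz_score, 0.25, names = FALSE)
q3  <- quantile(quiz_score, 0.75, names = FALSE)
iqr <- q3 - q1  # same value as IQR(quiz_score)

# Fences of the 1.5 * IQR rule
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Flag values outside the fences
is_outlier <- quiz_score < lower | quiz_score > upper
quiz_score[is_outlier]

# Treatment option A: cap (winsorize) values at the fences
quiz_capped <- pmin(pmax(quiz_score, lower), upper)

# Treatment option B: remove the flagged values
quiz_removed <- quiz_score[!is_outlier]
```

Retaining the value is the third option; it needs no code, only a justification that the observation is genuine.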

Step 4: Computation of Mean, Median, and Mode

1. Compute the mean, median, and mode for at least two numeric variables.
2. Interpret your results clearly in relation to your dataset.
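
Base R provides `mean()` and `median()`, but its `mode()` function returns the storage type of an object, not the most frequent value. The sketch below therefore defines a small helper, `stat_mode()`, which is a hypothetical name chosen for this example; the two vectors are invented data.

```r
# Helper: most frequent value(s) in a vector, ignoring NAs.
# Returns every tied value if there is more than one mode.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  values <- unique(x)
  counts <- tabulate(match(x, values))
  values[counts == max(counts)]
}

# Hypothetical numeric variables
quiz_score  <- c(70, 82, 75, 90, 75, 78, 82, 75)
study_hours <- c(5, 8, 12, 8, 10, 6, 8, 10)

mean(quiz_score);  median(quiz_score);  stat_mode(quiz_score)
mean(study_hours); median(study_hours); stat_mode(study_hours)
```

If your variables still contain missing values at this point, pass `na.rm = TRUE` to `mean()` and `median()`; otherwise both return `NA`.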

Step 5: Clustering Analysis

1. Select two numeric variables and perform K-Means Clustering in R.
2. Visualize the resulting clusters using an appropriate graph (e.g., ggplot2 or factoextra).
3. Interpret the visual results and describe the similarities or differences between the clusters.
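
A minimal sketch of the clustering step using base R's `kmeans()` follows. The data, the choice of two clusters, and the decision to standardize the variables first are all assumptions made for illustration; the activity does not fix any of them.

```r
set.seed(123)  # k-means starts from random centers, so fix the seed

# Hypothetical data with two visibly different groups of students
students <- data.frame(
  study_hours = c(3, 4, 5, 4, 6, 15, 16, 18, 17, 14),
  quiz_score  = c(60, 62, 65, 58, 66, 88, 90, 95, 92, 85)
)

# Standardize so that neither variable dominates the distance calculation
scaled <- scale(students)

# K-means with k = 2; nstart repeats the algorithm from several random starts
km <- kmeans(scaled, centers = 2, nstart = 25)

# Attach the cluster label to each row
students$cluster <- factor(km$cluster)

# Cluster sizes and per-cluster averages help with interpretation
km$size
aggregate(cbind(study_hours, quiz_score) ~ cluster, data = students, FUN = mean)
```

Comparing the per-cluster means is one straightforward way to describe how the clusters differ, for example a low-hours, low-score group versus a high-hours, high-score group.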

Step 6: Visualization

1. Create at least two visualizations to support your analysis:
   o One plot showing data cleaning results (e.g., boxplot before and after outlier removal).
   o One clustering visualization (e.g., scatter plot of clusters).
2. Ensure all graphs have appropriate titles, axis labels, and legends.
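
The two required plots could be produced as in the sketch below. It uses base R graphics instead of ggplot2 or factoextra so that it runs without extra packages; the data and all titles and labels are hypothetical.

```r
# --- Plot 1: boxplot before and after outlier removal ---
quiz_score <- c(70, 82, 75, 90, 65, 78, 5, 88, 73, 80)  # hypothetical scores
q1 <- quantile(quiz_score, 0.25, names = FALSE)
q3 <- quantile(quiz_score, 0.75, names = FALSE)
keep <- quiz_score >= q1 - 1.5 * (q3 - q1) & quiz_score <= q3 + 1.5 * (q3 - q1)
quiz_cleaned <- quiz_score[keep]

boxplot(
  list(Before = quiz_score, After = quiz_cleaned),
  main = "Quiz Scores Before and After Outlier Removal",
  xlab = "Dataset version",
  ylab = "Quiz score"
)

# --- Plot 2: scatter plot of k-means clusters ---
set.seed(123)
students <- data.frame(
  study_hours = c(3, 4, 5, 4, 6, 15, 16, 18, 17, 14),
  quiz_score  = c(60, 62, 65, 58, 66, 88, 90, 95, 92, 85)
)
km <- kmeans(scale(students), centers = 2, nstart = 25)
clusters <- sort(unique(km$cluster))

plot(
  students$study_hours, students$quiz_score,
  col = km$cluster, pch = 19,
  main = "K-Means Clusters of Students (k = 2)",
  xlab = "Study hours per week",
  ylab = "Quiz score"
)
legend("topleft", legend = paste("Cluster", clusters), col = clusters, pch = 19)
```

In RStudio the plots appear in the Plots pane, from which they can be exported or captured for the report.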

Step 7: Saving and Submitting Your Code

1. Save all the R code you used for this activity in a Notepad (.txt) file.
2. The file should include comments (#) explaining what each section of your code does.
3. Save the file using this format: Lastname_Firstname_Rcode.txt
4. This file will allow the instructor to check and verify your R script.

Step 8: Final Submission

Submit the following:

1. Original dataset (before cleaning): Lastname_Firstname_OriginalDataset.csv
2. Cleaned dataset (after cleaning and clustering): Lastname_Firstname_CleanedDataset.csv
3. R code file (saved from Notepad): Lastname_Firstname_Rcode.txt
4. Written report (Word or PDF): Lastname_Firstname_DataAnalysisReport.docx or .pdf
5. Visualization screenshots showing results and graphs, embedded in your report or submitted separately.
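
The two CSV files can be written directly from R with `write.csv()`, as in the sketch below. The data frames, the placeholder cluster labels, and the use of `tempdir()` as the output folder are assumptions for illustration; in practice you would point `out_dir` at your own project folder and replace `Lastname_Firstname` with your name.

```r
# Hypothetical original data with one missing value
original <- data.frame(
  program    = c("BSIT", "BSCS", "BSIS"),
  quiz_score = c(70, NA, 85)
)

# Hypothetical cleaned version: median imputation plus a cluster column
cleaned <- original
cleaned$quiz_score[is.na(cleaned$quiz_score)] <-
  median(original$quiz_score, na.rm = TRUE)
cleaned$cluster <- c(1, 1, 2)  # placeholder labels; real ones come from kmeans()

# Output folder (assumption: a temporary folder, replace with your own path)
out_dir <- tempdir()
original_path <- file.path(out_dir, "Lastname_Firstname_OriginalDataset.csv")
cleaned_path  <- file.path(out_dir, "Lastname_Firstname_CleanedDataset.csv")

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(original, original_path, row.names = FALSE)
write.csv(cleaned,  cleaned_path,  row.names = FALSE)
```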

Evaluation Criteria

Criteria                        Description                                   Points
Dataset Creation                Originality, completeness, and organization   10
Handling Missing Data           Appropriate method and justification          15
Outlier Detection & Treatment   Correct identification and reasoning          15
Measures of Central Tendency    Accuracy and interpretation                   15
Clustering Analysis             Correct process and explanation               20
Visualization                   Relevance, clarity, and labeling              10
Code Submission                 Code correctness and proper documentation     5
Report Presentation             Clarity, structure, and depth of analysis     10
Total                                                                         100
