Exploratory Data Analysis

This document walks through an exploratory data analysis of a cancer dataset using a set of user-defined R functions for correlation analysis. Several subsets are extracted from the original cancer data and analyzed with functions for pairwise correlation, scatter plots, normality testing, and simulating raw data from given correlation coefficients. Correlations between the variables in each subset are computed with both Pearson and Spearman coefficients. The normality of each variable is assessed using QQ plots and the Shapiro-Wilk test.

Uploaded by

Cyd Duque

# Mindanao State University

# General Santos City

# Exploratory Data Analysis

# Prepared by: Prof. Carlito O. Daarol
# Math Department
# March 16, 2023

# -------------------------------------------------------------------
# Activate the file containing all functions
# You should modify the file location because it refers to my laptop
# -------------------------------------------------------------------

(drive <- "D:/")

folder_functions <- "Research/thesisfunctions"
filename <- "fn_More_Correlations.R"
source(paste0(drive, folder_functions,"/",filename))

# --------------------------------------------------------
# Using function 1: Read the dataset using function call
# --------------------------------------------------------

# Set pointer to location of my data
# (do not use the setwd command for data retrieval)

folder_data <- "C:/Users/Admin/Desktop/Class Lectures/BLecture 0 Graphics in R"

filename <- "Cancer.csv"
data <- readcsv(folder_data,filename)
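
# A minimal sketch of what a helper like readcsv() might do, in base R only.
# Assumption: it joins the folder and the file name and calls read.csv();
# the real definition lives in fn_More_Correlations.R and may differ.
# read_csv_sketch is a hypothetical name used only for illustration.
read_csv_sketch <- function(folder, file) {
  path <- file.path(folder, file)              # portable "folder/file" path
  if (!file.exists(path)) stop("File not found: ", path)
  read.csv(path, stringsAsFactors = FALSE)     # keep text columns as character
}
# Possible usage: data <- read_csv_sketch(folder_data, filename)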

# Check contents
dim(data)
colnames(data)
head(data)

# ------------------------------------------
# Using R package to display large dataset
# Output visible only if output format is html
# ------------------------------------------
library(DT)
datatable(data)

# --------------------------------------------------------
# Using function 3: Compute correlation using two columns
# from the dataset
# --------------------------------------------------------

X <- data$breastcancer
Y <- data$co2emissions
corXY <- correlation(X,Y)
corXY
# Result: NA
# This means computation of correlation is not possible because
# of the presence of Missing values

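# A possible solution is to omit the NA values.
# Omitting them from X and Y separately is not good, because
# X and Y may then end up with different lengths and the
# remaining values would no longer be paired row by row.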

# Using function 4: put X and Y into 1 dataframe

dataXY <- as.data.frame(cbind(X,Y))

dim(dataXY)

# select only rows with no missing value

dataXY <- na.omit(dataXY)
dim(dataXY)

corXY <- Correcorre(dataXY)

corXY
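
# For comparison, a base-R sketch of the same idea on made-up numbers.
# demo_x and demo_y are illustrative values, not taken from Cancer.csv.
demo_x <- c(1, 2, NA, 4, 5, 6)
demo_y <- c(2, NA, 6, 8, 11, 12)
cor(demo_x, demo_y)                              # NA: missing values propagate
demo_xy <- na.omit(data.frame(demo_x, demo_y))   # drop incomplete rows jointly
nrow(demo_xy)                                    # 4 complete pairs remain
demo_r <- cor(demo_xy$demo_x, demo_xy$demo_y)
# cor() can also perform the row-wise deletion itself:
all.equal(demo_r, cor(demo_x, demo_y, use = "complete.obs"))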

# Using function 5 and 2: Construct two sets of variables from the data

str(data) # first check the types of the variables we have

# select three columns from the data

Set1 <- data[,c("breastcancer", "alcconsumption","internetuserate")]
anyNA(Set1)

# select another three columns from the data

Set2 <- data[,c("co2emissions", "employrate","lifeexpectancy")]
anyNA(Set2)

# If we delete NA values from Set1 and Set2 separately,
# the two sets could end up with unequal numbers of rows.
# The solution is to combine them into one data frame first.

tmpdat <- cbind(Set1,Set2)

tmpdat <- na.omit(tmpdat)
dim(tmpdat)

# pull out again Set1 and Set2

Set1 <- tmpdat[,1:3]
Set2 <- tmpdat[,4:6]

# Process pairwise correlations by feeding the two sets to the 5th function
Pearsonr <- pairwiseCor(Set1,Set2,"pearson")
Spearmanr <- pairwiseCor(Set1,Set2,"spearman")

Pearsonr
Spearmanr
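
# A base-R sketch of a cross-set correlation table, assuming pairwiseCor()
# correlates every column of the first set with every column of the second
# (its actual output format may differ). It uses the built-in mtcars data;
# all demo_* names are illustrative only.
demo_set1 <- mtcars[, c("mpg", "hp", "wt")]
demo_set2 <- mtcars[, c("disp", "drat", "qsec")]
demo_pearson  <- cor(demo_set1, demo_set2, method = "pearson")
demo_spearman <- cor(demo_set1, demo_set2, method = "spearman")
round(demo_pearson, 3)    # rows: demo_set1 columns; columns: demo_set2 columns
round(demo_spearman, 3)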

# The raw table is not presentable enough for distribution.
# Call function 2 (NiceTable) to enhance its appearance.
NiceTable(Pearsonr,"Pearson Correlation Analysis")
NiceTable(Spearmanr,"Spearman Correlation Analysis")

# Using function 6: Compute correlation using only 1 set of data

Pearson1 <- singlesetCor(tmpdat,"pearson")

Spearman1 <- singlesetCor(tmpdat,"spearman")

#display unformatted table

Pearson1
Spearman1
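
# A base-R sketch of a single-set correlation matrix, assuming singlesetCor()
# correlates every column of one data frame with every other column.
# demo_single is an illustrative subset of the built-in mtcars data.
demo_single <- mtcars[, c("mpg", "hp", "wt", "qsec")]
demo_p1 <- cor(demo_single, method = "pearson")
demo_s1 <- cor(demo_single, method = "spearman")
round(demo_p1, 3)   # symmetric matrix with 1 on the diagonal
round(demo_s1, 3)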

# Display a better table

NiceTable(Pearson1, "Pearson Correlation Analysis")
NiceTable(Spearman1, "Spearman Correlation Analysis")

# Using function 7: Correlation coefficients in table format

CorrsjPlot(Set1,"pearson","Pearson Correlation Coefficients")

CorrsjPlot(Set2,"pearson","Pearson Correlation Coefficients")

# Using function 8: Scatter Plot

Set1name <- colnames(Set1)

CorrePlotXY(Set1,Set1name[1],Set1name[2],"blue", "XAxis", "YAxis","pearson")

Set2name <- colnames(Set2)

CorrePlotXY(Set2,Set2name[1],Set2name[2],"blue", "XAxis", "YAxis","pearson")

# Use double for loop to generate all plots for Set 1

for (i in 1:(ncol(Set1)-1)) {
  for (j in (i+1):ncol(Set1)) {
    print(CorrePlotXY(Set1, Set1name[i], Set1name[j], "blue",
                      Set1name[i], Set1name[j], "pearson"))
  }
}

# Use double for loop to generate all plots for Set 2

for (i in 1:(ncol(Set2)-1)) {
  for (j in (i+1):ncol(Set2)) {
    print(CorrePlotXY(Set2, Set2name[i], Set2name[j], "blue",
                      Set2name[i], Set2name[j], "pearson"))
  }
}

# Remark: plots generated inside the double for loop will not appear
# without the print command
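
# Why print is needed: R does not auto-print values inside a for loop.
# A small base-R sketch (demo_* names are illustrative):
demo_silent  <- capture.output(for (i in 1:2) i)          # nothing is shown
demo_printed <- capture.output(for (i in 1:2) print(i))   # one line per value
length(demo_silent)    # 0
length(demo_printed)   # 2

# The double loop visits every unordered pair of columns exactly once.
# combn() enumerates the same pairs:
demo_names <- c("a", "b", "c")
demo_pairs <- combn(demo_names, 2)   # one column per pair
demo_pairs
ncol(demo_pairs)                     # choose(3, 2) = 3 pairs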

# Using function 9: How to verify whether the data
# follow the normal distribution using the Shapiro-Wilk test

# Using function 10: How to verify whether the data
# follow the normal distribution using graphs

NiceTable(Set1,"Dataset in wide original format")

# convert data to long format first

data_long <- melt(Set1)
NiceTable(data_long,"Dataset in long format")
QQNormality_Plot(data_long)

# The points must fall inside the confidence band
# for the variable to be considered normally distributed.
# Otherwise, treat the distribution as non-normal (for example, skewed).
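
# A base-R sketch of the Shapiro-Wilk check named above, assuming one test
# per column. The data here are simulated for illustration (u is normal,
# v is exponential); shapiro.test(), stack(), qqnorm() and qqline() are base R
# and stand in for the user-defined functions of this script.
set.seed(1)
demo_norm  <- data.frame(u = rnorm(50), v = rexp(50))
demo_pvals <- sapply(demo_norm, function(col) shapiro.test(col)$p.value)
demo_pvals                       # a small p-value is evidence against normality
demo_long <- stack(demo_norm)    # long format: columns "values" and "ind"
dim(demo_long)                   # 100 rows, 2 columns
qqnorm(demo_norm$v); qqline(demo_norm$v)   # normal QQ plot with reference line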

# Using function 11: Plot the histogram and density curve
# of each variable in the dataset.

PlotHistDensity(Set1)

# Using function 12: For a given set of correlation coefficients, generate the
# corresponding raw data X and Y.

sampleCor <- c(0.214, 0.4, 0.617, 0.742, 0.851, 0.915)

Simulate_XY_From_Correlations(sampleCor)

sampleCor <- c(0.214, 0.3, 0.617, 0.76, 0.851, 0.915)

Simulate_XY_From_Correlations(sampleCor)

# View generated data

gendata <- read.csv("DatXY.csv")
NiceTable(gendata,"Generated Datasets")
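
# A minimal sketch of generating X and Y with a chosen sample correlation.
# This is NOT the method of Simulate_XY_From_Correlations(), which is defined
# elsewhere. Assumption: Y = r*X + sqrt(1 - r^2)*E, where E is standardized
# noise made orthogonal to X, so the sample correlation equals r.
# simulate_xy_sketch is a hypothetical name used only for illustration.
simulate_xy_sketch <- function(r, n = 100) {
  x <- as.numeric(scale(rnorm(n)))                 # standardized X
  noise <- rnorm(n)
  e <- as.numeric(scale(residuals(lm(noise ~ x)))) # noise uncorrelated with x
  y <- r * x + sqrt(1 - r^2) * e
  data.frame(X = x, Y = y)
}
set.seed(123)
demo_targets <- c(0.214, 0.4, 0.617)
demo_sim  <- lapply(demo_targets, simulate_xy_sketch)
demo_cors <- sapply(demo_sim, function(d) cor(d$X, d$Y))
round(demo_cors, 3)   # matches demo_targets up to rounding error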