Week3 Cheat Sheet Exploratory Data Analysis

This document is a cheat sheet for Exploratory Data Analysis (EDA) that provides a summary of various R functions and their syntax, including 'summarize', 'group_by', 'cor', 'cor.test', 'aov', 'count', 'ggplot', and others. Each function is accompanied by a brief description and an example of its usage. The document also includes a changelog detailing updates made by different authors.

Cheat Sheet: Exploratory Data Analysis

Command Syntax Description Example

summarize()

Syntax:
    summarize(.data, ...)

Description:
    summarize() collapses a data frame to one row of summary values, or to one row per group when the input is grouped.

Arguments:
    .data   A data frame, data frame extension (e.g. a tibble), or a lazy data frame.
    ...     Name-value pairs of summary functions. The name will be the name of the variable in the result. The value should be an expression that returns a single value, like min(x), n(), or sum(is.na(y)).

Example:
    avg_delays <- sub_airline %>%
      group_by(Reporting_Airline, DayOfWeek) %>%
      summarize(mean_delays = mean(ArrDelayMinutes), .groups = 'keep')

group_by()

Syntax:
    group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

Description:
    The group_by function takes an existing table and converts it into a grouped table where operations are performed "by group".

Arguments:
    .data   A data frame, data frame extension (e.g. a tibble), or a lazy data frame.
    .add    When FALSE, the default, group_by() will override existing groups.
    .drop   Drop groups formed by factor levels that don't appear in the data.

Example:
    sub_airline %>%
      group_by(Reporting_Airline) %>%
      summarize(mean_delays = mean(ArrDelayMinutes))

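
The grouping and summarizing steps described above can be tried on a tiny data set. This is a minimal sketch, assuming the dplyr package is installed; `toy_airline` is a hypothetical stand-in for `sub_airline`, and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for sub_airline; the values are invented.
toy_airline <- data.frame(
  Reporting_Airline = c("AA", "AA", "AS", "AS", "AS"),
  DayOfWeek         = c(1, 1, 1, 2, 2),
  ArrDelayMinutes   = c(10, 20, 0, 5, 15)
)

# One row per airline/day combination, holding the mean arrival delay.
avg_delays <- toy_airline %>%
  group_by(Reporting_Airline, DayOfWeek) %>%
  summarize(mean_delays = mean(ArrDelayMinutes), .groups = "keep")

print(avg_delays)
```

With `.groups = "keep"` the result stays grouped by both variables; `.groups = "drop"` would return an ungrouped tibble instead.
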
cor()

Syntax:
    cor(x, use = , method = )

Description:
    The cor function computes the correlation coefficient.

Arguments:
    x        Matrix or data frame.
    use      Specifies the handling of missing data (e.g. "complete.obs" or "pairwise.complete.obs").
    method   Specifies the type of correlation. Options are "pearson", "spearman" or "kendall".

Example:
    sub_airline %>%
      select(DepDelayMinutes, ArrDelayMinutes) %>%
      cor(method = "pearson")

cor.test()

Syntax:
    cor.test(x, y, alternative = c("two.sided", "less", "greater"),
             method = c("pearson", "kendall", "spearman"), exact = NULL,
             conf.level = 0.95, continuity = FALSE, ...)

Description:
    The cor.test function is a test for association/correlation between paired samples. It returns both the correlation coefficient and the significance level (p-value) of the correlation.

Arguments:
    x, y   Numeric vectors of data values. x and y must have the same length.

Example:
    sub_airline %>%
      cor.test(~DepDelayMinutes + ArrDelayMinutes, data = .)

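
To see how the correlation matrix and the significance test relate, here is a minimal sketch in base R. The data frame `delays` is hypothetical and its values are invented; only the column names are borrowed from the examples above.

```r
# Hypothetical delay data in minutes; the values are invented.
delays <- data.frame(
  DepDelayMinutes = c(0, 5, 12, 30, 45, 60),
  ArrDelayMinutes = c(2, 3, 15, 28, 50, 55)
)

# cor() on a data frame returns the full correlation matrix.
cor_matrix <- cor(delays, use = "complete.obs", method = "pearson")

# cor.test() adds a significance test for one pair of variables.
result <- cor.test(delays$DepDelayMinutes, delays$ArrDelayMinutes,
                   method = "pearson", conf.level = 0.95)

print(cor_matrix)
print(result$estimate)  # correlation coefficient
print(result$p.value)   # p-value of the test
print(result$conf.int)  # 95% confidence interval for the coefficient
```

The off-diagonal entry of `cor_matrix` and `result$estimate` are the same number; `cor.test()` simply adds the p-value and confidence interval.
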
aov()

Syntax:
    aov(formula, data = NULL, projections = FALSE, qr = TRUE, contrasts = NULL, ...)

Description:
    The aov function fits an Analysis of Variance (ANOVA) model, a statistical method used to test whether there are significant differences between the means of two or more groups.

Arguments:
    formula   A formula specifying the model.
    data      A data frame in which the variables specified in the formula will be found. If missing, the variables are searched for in the standard way.

Example:
    aa_as_subset <- sub_airline %>%
      select(ArrDelay, Reporting_Airline) %>%
      filter(Reporting_Airline == 'AA' | Reporting_Airline == 'AS')

    ad_aov <- aov(ArrDelay ~ Reporting_Airline, data = aa_as_subset)

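
Fitting the model is only half of the job; the test result is read from the ANOVA table. A minimal sketch in base R follows, assuming a hypothetical data frame `aa_as_toy` with invented values in place of the real subset.

```r
# Hypothetical arrival delays for two airlines; the values are invented.
aa_as_toy <- data.frame(
  Reporting_Airline = rep(c("AA", "AS"), each = 5),
  ArrDelay          = c(12, 15, 9, 20, 14, 3, 5, 1, 7, 4)
)

# Fit the one-way ANOVA: does mean ArrDelay differ between the airlines?
toy_aov <- aov(ArrDelay ~ Reporting_Airline, data = aa_as_toy)

# summary() returns the ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F).
aov_table <- summary(toy_aov)
print(aov_table)

# Pull out the F statistic and p-value for the airline effect.
f_value <- aov_table[[1]][["F value"]][1]
p_value <- aov_table[[1]][["Pr(>F)"]][1]
```

A small `p_value` (conventionally below 0.05) suggests that the group means differ by more than chance alone would explain.
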
count()

Syntax (dplyr, as used in the example):
    count(x, ..., wt = NULL, sort = FALSE, name = NULL)

Description:
    The count function lets you quickly count the unique values of one or more variables.

Arguments:
    x     Data frame to be processed.
    ...   Variables to count unique values of.

Example:
    sub_airline %>%
      count(Reporting_Airline)

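
A minimal sketch of counting with dplyr's `count()`, assuming the dplyr package is installed. The data frame `toy_flights` is hypothetical and its values are invented.

```r
library(dplyr)

# Hypothetical flight records; the values are invented.
toy_flights <- data.frame(
  Reporting_Airline = c("AA", "AS", "AA", "DL", "AA", "AS")
)

# One row per distinct airline; the count lands in a column named n.
# sort = TRUE orders the result from most to least frequent.
airline_counts <- toy_flights %>%
  count(Reporting_Airline, sort = TRUE)

print(airline_counts)
```
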
ggplot()

Syntax:
    ggplot(data = NULL, mapping = aes(), ..., environment = parent.frame())

Description:
    The ggplot function initializes a ggplot object. It can be used to declare the input data frame for a graphic and to specify the set of plot aesthetics intended to be common throughout all subsequent layers unless specifically overridden.

Example (the data frame is supplied by a preceding %>% pipe):
    ggplot(aes(x = Reporting_Airline, y = DayOfWeek, fill = mean_delays))

corrplot()

Syntax:
    corrplot(method = , type = , ...)

Description:
    The corrplot function provides a visual exploratory tool on a correlation matrix that supports automatic variable reordering to help detect hidden patterns among variables.

Arguments:
    method   There are seven visualization methods (parameter method) in the corrplot package, named "circle", "square", "ellipse", "number", "shade", "color" and "pie".
    type     There are three layout types (parameter type): "full", "upper" and "lower".

Example (airlines_cor is a correlation matrix; col is assumed to be a color palette function defined earlier):
    corrplot(airlines_cor, method = "color", col = col(200),
             type = "upper", order = "hclust",
             addCoef.col = "black",           # Add coefficient of correlation
             tl.col = "black", tl.srt = 45    # Text label color and rotation
    )

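
The corrplot call depends on a correlation matrix and a palette that must exist beforehand. The following is a minimal sketch, assuming the corrplot package is installed. It swaps in R's built-in `mtcars` data set for the airline data, and `palette_fn` is a hypothetical name for a palette function built with `colorRampPalette()`.

```r
library(corrplot)

# Correlation matrix from a few numeric columns of the built-in mtcars data
# (used here only as a stand-in for the airline data).
toy_cor <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Palette function: palette_fn(200) returns 200 interpolated colors.
palette_fn <- colorRampPalette(c("#BB4444", "#FFFFFF", "#4477AA"))

corrplot(toy_cor, method = "color", col = palette_fn(200),
         type = "upper", order = "hclust",
         addCoef.col = "black",          # print the coefficients in black
         tl.col = "black", tl.srt = 45)  # label color and rotation
```
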
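geom_bar()

Syntax:
    geom_bar(mapping = NULL, data = NULL, stat = "count", position = "stack", ...)

Description:
    The geom_bar function draws bar charts. By default (stat = "count") the height of each bar is the number of rows at that x value; with stat = "identity" the height is taken from the y aesthetic. For a histogram of a continuous variable, use geom_histogram().

Example (the data frame is supplied by a preceding %>% pipe):
    ggplot(aes(x = Reporting_Airline, y = Average_Delays)) +
      geom_bar(stat = "identity") +
      ggtitle("Average Arrival Delays by Airline")
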
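
A self-contained version of the bar chart can be sketched as follows, assuming the ggplot2 package is installed. The data frame `avg_by_airline` is hypothetical and its values are invented.

```r
library(ggplot2)

# Hypothetical pre-computed averages; the values are invented.
avg_by_airline <- data.frame(
  Reporting_Airline = c("AA", "AS", "DL"),
  Average_Delays    = c(14.2, 4.0, 9.5)
)

# stat = "identity" makes each bar as tall as its y value
# instead of counting rows per airline.
bar_plot <- ggplot(avg_by_airline,
                   aes(x = Reporting_Airline, y = Average_Delays)) +
  geom_bar(stat = "identity") +
  ggtitle("Average Arrival Delays by Airline")

print(bar_plot)
```
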
geom_tile()

Syntax:
    geom_tile(mapping = NULL, data = NULL, stat = "identity", position = "identity", ...)

Description:
    The geom_tile function tiles the plane with rectangles.

Example:
    ggplot(avg_delays, aes(x = Reporting_Airline,
                           y = lubridate::wday(DayOfWeek, label = TRUE),
                           fill = bins)) +
      geom_tile(colour = "white", size = 0.2)

geom_text()

Syntax:
    geom_text(mapping = NULL, data = NULL, stat = "identity", position = "identity", parse = FALSE, ...)

Description:
    The geom_text function is used for text annotation.

Example:
    ggplot(avg_delays, aes(x = Reporting_Airline,
                           y = lubridate::wday(DayOfWeek, label = TRUE),
                           fill = bins)) +
      geom_tile(colour = "white", size = 0.2) +
      geom_text(aes(label = round(mean_delays, 3)))

labs()

Syntax:
    labs(...)

Description:
    The labs function changes axis labels and legend titles.

Arguments:
    ...   A list of new names in the form aesthetic = "new name".

Example:
    ggplot(avg_delays, aes(x = Reporting_Airline,
                           y = lubridate::wday(DayOfWeek, label = TRUE),
                           fill = bins)) +
      labs(x = "Reporting Airline", y = "Day of Week", title = "Average Arrival Delays")

scale_fill_manual()

Syntax:
    scale_fill_manual(..., values)

Description:
    The scale_fill_manual function lets you specify your own set of fill colors for the levels of a discrete variable.

Arguments:
    ...      Common discrete scale parameters: name, breaks, labels, na.value, limits and guide. See discrete_scale for more details.
    values   A set of aesthetic values to map data values to.

Example:
    scale_fill_manual(values = c("#d53e4f", "#f46d43", "#fdae61",
                                 "#fee08b", "#e6f598", "#abdda4"))

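
The tile, text, label and fill-scale layers are typically stacked into one heat map. Here is a minimal sketch, assuming the ggplot2 package is installed. The data frame is hypothetical with invented values, `DayOfWeek` is stored as a ready-made factor rather than converted with `lubridate::wday()`, and the bin edges passed to `cut()` are illustrative assumptions.

```r
library(ggplot2)

# Hypothetical pre-aggregated delays; the values are invented.
avg_delays <- data.frame(
  Reporting_Airline = rep(c("AA", "AS"), each = 3),
  DayOfWeek         = factor(rep(c("Mon", "Tue", "Wed"), times = 2),
                             levels = c("Mon", "Tue", "Wed")),
  mean_delays       = c(4.2, 12.8, 25.1, 3.0, 9.4, 18.6)
)

# Discretize the delays so scale_fill_manual() can assign one color per bin.
avg_delays$bins <- cut(avg_delays$mean_delays,
                       breaks = c(0, 10, 20, 30),
                       labels = c("0-10", "10-20", "20-30"))

heatmap_plot <- ggplot(avg_delays,
                       aes(x = Reporting_Airline, y = DayOfWeek, fill = bins)) +
  geom_tile(colour = "white") +                    # one rectangle per cell
  geom_text(aes(label = round(mean_delays, 1))) +  # print the value in each cell
  labs(x = "Reporting Airline", y = "Day of Week",
       title = "Average Arrival Delays", fill = "Delay (min)") +
  scale_fill_manual(values = c("#abdda4", "#fee08b", "#d53e4f"))

print(heatmap_plot)
```

The colors in `values` are matched to the factor levels of `bins` in order, so the first color fills the lowest delay bin.
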

Author(s)
Lakshmi Holla

Changelog
Date         Version   Changed by                    Change Description
2023-05-11   1.1       Eric Hao & Vladislav Boyko    Updated Page Frames
2021-08-09   1.0       Lakshmi Holla                 Initial Version
