Basic Inferential Data Analysis

The document analyzes tooth growth data from an experiment with guinea pigs. Vitamin C was supplied at different doses both as orange juice and ascorbic acid. 13 research questions are posed to determine if tooth length differs between the supply methods and doses based on hypothesis tests of the group means.

Uploaded by

Peter Thai

JZstats
Course 6: Statistical Inference
Data Science Specialization, Coursera

OVERVIEW
Based on the 'ToothGrowth' data set that ships with R, a comparison was conducted of the effects of Vitamin C on tooth growth in guinea pigs between the groups defined by two factors, supply method and dose, as well as by their interaction.

SUPPLEMENTARY INFORMATION
Further details can be found at the GitHub repository: [Link] Statistical-Inference-Course-Project/tree/master/basic_inferential_data_analysis

1. EXPLORATORY DATA ANALYSIS
The 'ToothGrowth' data consists of 60 independent observations of tooth length in guinea pigs over two factors, the supply method and the dose of Vitamin C.

1.1 By Supply Method
Vitamin C was supplied either as Orange Juice (OJ) or as Ascorbic Acid (VC). For the groups defined by the supply method, 1 Research Question was formed:

RQ-01: Is tooth length greater when Vitamin C is supplied as orange juice instead of ascorbic acid?

[Figure 1: Tooth Length of Groups by Supply Method. Box plots of Tooth Length (x-axis) for each Supply Method (y-axis: Orange Juice, Ascorbic Acid).]

Table 1: Statistics by Supply Method

Supply Method     n    mean       sd
Ascorbic Acid     30   16.96333   8.266029
Orange Juice      30   20.66333   6.605561

1.2 By Dose
Vitamin C was supplied in doses of 0.5, 1 or 2 mg/day. For the groups defined by dose, 3 Research Questions were formed:

RQ-02: Is tooth length greater when 2 instead of 0.5 mg/day of Vitamin C is supplied?
RQ-03: Is tooth length greater when 2 instead of 1 mg/day of Vitamin C is supplied?
RQ-04: Is tooth length greater when 1 instead of 0.5 mg/day of Vitamin C is supplied?

[Figure 2: Groups by Dosage Levels. Box plots of Tooth Length (x-axis) for each Dose of Vitamin C (y-axis: 0.5, 1 and 2 mg/day).]

Table 2: Statistics by Dose

Dose of Vitamin C   n    mean     sd
0.5 mg/day          20   10.605   4.499763
1 mg/day            20   19.735   4.415436
2 mg/day            20   26.100   3.774150

1.3 By Supply Method and Dose
For the groups defined by the interaction of supply method and dose, 9 Research Questions were formed:

For constant dose:
RQ-05: Is tooth length different when 2 mg/day of Vitamin C is supplied as orange juice instead of ascorbic acid?
RQ-06: Is tooth length greater when 1 mg/day of Vitamin C is supplied as orange juice instead of ascorbic acid?
RQ-07: Is tooth length greater when 0.5 mg/day of Vitamin C is supplied as orange juice instead of ascorbic acid?
For constant supply method:
RQ-08: Is tooth length greater when 2 instead of 0.5 mg/day of Vitamin C is supplied as orange juice?
RQ-09: Is tooth length greater when 2 instead of 1 mg/day of Vitamin C is supplied as orange juice?
RQ-10: Is tooth length greater when 1 instead of 0.5 mg/day of Vitamin C is supplied as orange juice?
RQ-11: Is tooth length greater when 2 instead of 0.5 mg/day of Vitamin C is supplied as ascorbic acid?
RQ-12: Is tooth length greater when 2 instead of 1 mg/day of Vitamin C is supplied as ascorbic acid?
RQ-13: Is tooth length greater when 1 instead of 0.5 mg/day of Vitamin C is supplied as ascorbic acid?

[Figure 3: Tooth Length of Groups by Supply Method and Dose. Box plots of Tooth Length (x-axis) for each of the six interactions of Supply Method and Dose (y-axis).]

Table 3: Statistics by Supply Method and Dose

Interactions of Supply Method and Dose      n    mean    sd
0.5 mg/day of Vitamin C as Ascorbic Acid    10    7.98   2.746634
0.5 mg/day of Vitamin C as Orange Juice     10   13.23   4.459708
1 mg/day of Vitamin C as Ascorbic Acid      10   16.77   2.515309
1 mg/day of Vitamin C as Orange Juice       10   22.70   3.910953
2 mg/day of Vitamin C as Ascorbic Acid      10   26.14   4.797731
2 mg/day of Vitamin C as Orange Juice       10   26.06   2.655058

2. STATISTICAL ANALYSIS
2.1 Assumptions
A major assumption was made in order to restrict the methodology to the approaches that had been discussed in the course:

(A1): The tooth length in every group follows a Normal distribution with unknown expected value and variance.

2.2 Multiple t-tests
Each of the Research Questions was translated into a statistical hypothesis test:

Hypothesis test for (RQ-01): $H_0: \mu_{\mathrm{OJ}} \le \mu_{\mathrm{VC}}$ vs. $H_a: \mu_{\mathrm{OJ}} > \mu_{\mathrm{VC}}$
Hypothesis test for (RQ-02): $H_0: \mu_{\mathrm{2mg}} \le \mu_{\mathrm{0.5mg}}$ vs. $H_a: \mu_{\mathrm{2mg}} > \mu_{\mathrm{0.5mg}}$
Hypothesis test for (RQ-03): $H_0: \mu_{\mathrm{2mg}} \le \mu_{\mathrm{1mg}}$ vs. $H_a: \mu_{\mathrm{2mg}} > \mu_{\mathrm{1mg}}$
Hypothesis test for (RQ-04): $H_0: \mu_{\mathrm{1mg}} \le \mu_{\mathrm{0.5mg}}$ vs. $H_a: \mu_{\mathrm{1mg}} > \mu_{\mathrm{0.5mg}}$
Hypothesis test for (RQ-05): $H_0: \mu_{\mathrm{OJ:2mg}} = \mu_{\mathrm{VC:2mg}}$ vs. $H_a: \mu_{\mathrm{OJ:2mg}} \ne \mu_{\mathrm{VC:2mg}}$
Hypothesis test for (RQ-06): $H_0: \mu_{\mathrm{OJ:1mg}} \le \mu_{\mathrm{VC:1mg}}$ vs. $H_a: \mu_{\mathrm{OJ:1mg}} > \mu_{\mathrm{VC:1mg}}$
Hypothesis test for (RQ-07): $H_0: \mu_{\mathrm{OJ:0.5mg}} \le \mu_{\mathrm{VC:0.5mg}}$ vs. $H_a: \mu_{\mathrm{OJ:0.5mg}} > \mu_{\mathrm{VC:0.5mg}}$
Hypothesis test for (RQ-08): $H_0: \mu_{\mathrm{OJ:2mg}} \le \mu_{\mathrm{OJ:0.5mg}}$ vs. $H_a: \mu_{\mathrm{OJ:2mg}} > \mu_{\mathrm{OJ:0.5mg}}$
Hypothesis test for (RQ-09): $H_0: \mu_{\mathrm{OJ:2mg}} \le \mu_{\mathrm{OJ:1mg}}$ vs. $H_a: \mu_{\mathrm{OJ:2mg}} > \mu_{\mathrm{OJ:1mg}}$
Hypothesis test for (RQ-10): $H_0: \mu_{\mathrm{OJ:1mg}} \le \mu_{\mathrm{OJ:0.5mg}}$ vs. $H_a: \mu_{\mathrm{OJ:1mg}} > \mu_{\mathrm{OJ:0.5mg}}$
Hypothesis test for (RQ-11): $H_0: \mu_{\mathrm{VC:2mg}} \le \mu_{\mathrm{VC:0.5mg}}$ vs. $H_a: \mu_{\mathrm{VC:2mg}} > \mu_{\mathrm{VC:0.5mg}}$
Hypothesis test for (RQ-12): $H_0: \mu_{\mathrm{VC:2mg}} \le \mu_{\mathrm{VC:1mg}}$ vs. $H_a: \mu_{\mathrm{VC:2mg}} > \mu_{\mathrm{VC:1mg}}$
Hypothesis test for (RQ-13): $H_0: \mu_{\mathrm{VC:1mg}} \le \mu_{\mathrm{VC:0.5mg}}$ vs. $H_a: \mu_{\mathrm{VC:1mg}} > \mu_{\mathrm{VC:0.5mg}}$

Under assumption (A1), 13 Welch two-sample t-tests were conducted for the hypotheses above.

2.3 Adjust p-values
The p-values that were originally obtained were adjusted (to compensate for the multiple tests) by the Benjamini-Hochberg procedure, so that the False Discovery Rate (FDR) was bounded to be at most 0.05.

2.4 Results
For all hypothesis tests except the one for (RQ-05), there was enough evidence to reject the null hypothesis $H_0$ in favor of the alternative $H_a$.

Table 4: Results

RQ   x          y          p_adj       is_sig
01   OJ         VC         0.0328437   Yes
02   2mg        0.5mg      0.0000000   Yes
03   2mg        1mg        0.0000207   Yes
04   1mg        0.5mg      0.0000003   Yes
05   OJ:2mg     VC:2mg     0.9638516   No
06   OJ:1mg     VC:1mg     0.0007499   Yes
07   OJ:0.5mg   VC:0.5mg   0.0041331   Yes
08   OJ:2mg     OJ:0.5mg   0.0000017   Yes
09   OJ:2mg     OJ:1mg     0.0231608   Yes
10   OJ:1mg     OJ:0.5mg   0.0000744   Yes
11   VC:2mg     VC:0.5mg   0.0000002   Yes
12   VC:2mg     VC:1mg     0.0000744   Yes
13   VC:1mg     VC:0.5mg   0.0000011   Yes
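
To illustrate what a single Welch two-sample t-test involves, the following sketch computes the test for (RQ-01) by hand and compares it with R's built-in t.test(). It uses only base R and the built-in ToothGrowth data; the variable names (oj, vc, se, t_stat, df, p_manual, welch) are illustrative and are not taken from the original analysis. The p-value obtained here is the unadjusted one, so it is expected to differ from the adjusted value reported in Table 4.

```r
# Welch two-sample t-test for RQ-01 (OJ vs. VC), computed by hand.
# A sketch under assumption (A1); all names below are illustrative.
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Standard error of the difference in means, without pooling the variances.
se <- sqrt(var(oj) / length(oj) + var(vc) / length(vc))
t_stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite approximation of the degrees of freedom.
df <- se^4 / (
  (var(oj) / length(oj))^2 / (length(oj) - 1) +
    (var(vc) / length(vc))^2 / (length(vc) - 1)
)

# One-sided (unadjusted) p-value for Ha: mu_OJ > mu_VC.
p_manual <- pt(t_stat, df = df, lower.tail = FALSE)

# t.test() defaults to var.equal = FALSE, i.e. the Welch test.
welch <- t.test(x = oj, y = vc, alternative = "greater")

print(c(manual = t_stat, builtin = unname(welch$statistic)))
print(c(manual = p_manual, builtin = welch$p.value))
```

The manual and built-in values should agree up to floating-point error, which makes explicit that the Welch test differs from the pooled-variance t-test only in its standard error and degrees of freedom.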
3. CONCLUSIONS
From the results of the Statistical Analysis (displayed in Table 4), the following conclusions were drawn for the 13 Research Questions of interest:

For (RQ-01), the data provides substantial evidence ($p_{\text{RQ-01}} < 0.0329$) to reject the null hypothesis $H_0$ in favor of the alternative $H_a$, according to which the expected tooth length is greater when Vitamin C is supplied as orange juice instead of ascorbic acid.

For (RQ-02), (RQ-03) and (RQ-04), the data provides substantial evidence ($p_{\text{RQ-02}} < 0.0001$, $p_{\text{RQ-03}} < 0.0001$ and $p_{\text{RQ-04}} < 0.0001$ respectively) to reject the null hypothesis $H_0$ in favor of the alternatives $H_a$, according to which the expected tooth length is greater when the dose of Vitamin C is 2 instead of 0.5 mg/day, 2 instead of 1 mg/day and 1 instead of 0.5 mg/day respectively (independently of the supply method).

For (RQ-05), the data does NOT provide substantial evidence ($p_{\text{RQ-05}} > 0.9638$) to reject the null hypothesis $H_0$, according to which the expected tooth length is the same when a dose of 2 mg/day of Vitamin C is supplied either as orange juice or as ascorbic acid.

For (RQ-06) and (RQ-07), the data provides substantial evidence ($p_{\text{RQ-06}} < 0.0008$ and $p_{\text{RQ-07}} < 0.0042$ respectively) to reject the null hypothesis $H_0$ in favor of the alternative $H_a$, according to which the expected tooth length is greater when a dose of either 1 or 0.5 mg/day of Vitamin C is supplied as orange juice instead of ascorbic acid.

For (RQ-08), (RQ-09) and (RQ-10), the data provides substantial evidence ($p_{\text{RQ-08}} < 0.0001$, $p_{\text{RQ-09}} < 0.02317$ and $p_{\text{RQ-10}} < 0.0001$ respectively) to reject the null hypothesis $H_0$ in favor of the alternative $H_a$, according to which the expected tooth length is greater when Vitamin C is supplied as orange juice in a dose of 2 instead of 0.5 mg/day, 2 instead of 1 mg/day and 1 instead of 0.5 mg/day respectively.

For (RQ-11), (RQ-12) and (RQ-13), the data provides substantial evidence ($p_{\text{RQ-11}} < 0.0001$, $p_{\text{RQ-12}} < 0.0001$ and $p_{\text{RQ-13}} < 0.0001$ respectively) to reject the null hypothesis $H_0$ in favor of the alternative $H_a$, according to which the expected tooth length is greater when Vitamin C is supplied as ascorbic acid in a dose of 2 instead of 0.5 mg/day, 2 instead of 1 mg/day and 1 instead of 0.5 mg/day respectively.

4. REFERENCES
Caffo, B. (2016). Statistical inference for data science. Retrieved from [Link]

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289-300. [Link]

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL [Link]

RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. URL [Link] com/.

Wickham et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, [Link] 10.21105/joss.01686

Hao Zhu (2019). kableExtra: Construct Complex Table with 'kable' and Pipe Syntax. R package version 1.1.0. [Link]

JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone (2020). rmarkdown: Dynamic Documents for R. R package version 2.3. URL [Link] com.

Yihui Xie and J.J. Allaire and Garrett Grolemund (2018). R Markdown: The Definitive Guide. Chapman and Hall/CRC. ISBN 9781138359338. URL [Link] rmarkdown.

Yihui Xie (2020). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.28.

Yihui Xie (2015). Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963

Yihui Xie (2014). knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

JJ Allaire, Yihui Xie, R Foundation, Hadley Wickham, Journal of Statistical Software, Ramnath Vaidyanathan, Association for Computing Machinery, Carl Boettiger, Elsevier, Karl Broman, Kirill Mueller, Bastiaan Quast, Randall Pruim, Ben Marwick, Charlotte Wickham, Oliver Keyes, Miao Yu, Daniel Emaasit, Thierry Onkelinx, Alessandro Gasparini, Marc-Andre Desautels, Dominik Leutnant, MDPI, Taylor and Francis, Oğuzhan Öğreden, Dalton Hance, Daniel Nüst, Petter Uvesten, Elio Campitelli, John Muschelli, Zhian N. Kamvar, Noam Ross, Robrecht Cannoodt, Duncan Luguern and David M. Kaplan (2020). rticles: Article Formats for R Markdown. R package version 0.14. [Link] org/package=rticles

5. APPENDIX
All the code that was used for this assignment is included in this appendix.

5.1 Load The Required Libraries

library(tidyverse)
library(kableExtra)

5.2 Data Processing
Minor data processing was conducted to set the data table in an appropriate format for the needs of this assignment.

# Stack three copies of the data, one per grouping:
# supply method, dose, and their interaction.
tooth_growth <- bind_rows(
  ToothGrowth %>%
    transmute(
      "factor" = "supp",
      "group_abbr" = as.character(supp),
      "group" = str_replace_all(
        string = group_abbr,
        c("OJ" = "Orange Juice",
          "VC" = "Ascorbic Acid")),
      "length" = len
    ),
  ToothGrowth %>%
    transmute(
      "factor" = "dose",
      "group_abbr" = paste0(dose, "mg"),
      "group" = str_replace_all(
        string = group_abbr,
        c("0.5mg" = "0.5 mg/day",
          "1mg" = "1 mg/day",
          "2mg" = "2 mg/day")),
      "length" = len
    ),
  ToothGrowth %>%
    transmute(
      "factor" = "supp_and_dose",
      "group_abbr" = paste0(supp, ":", dose, "mg"),
      "group" = str_replace_all(
        string = group_abbr,
        c("OJ:0.5mg" = paste0(
            "0.5 mg/day of Vitamin C", "\n",
            "as Orange Juice"),
          "OJ:1mg" = paste0(
            "1 mg/day of Vitamin C", "\n",
            "as Orange Juice"),
          "OJ:2mg" = paste0(
            "2 mg/day of Vitamin C", "\n",
            "as Orange Juice"),
          "VC:0.5mg" = paste0(
            "0.5 mg/day of Vitamin C", "\n",
            "as Ascorbic Acid"),
          "VC:1mg" = paste0(
            "1 mg/day of Vitamin C", "\n",
            "as Ascorbic Acid"),
          "VC:2mg" = paste0(
            "2 mg/day of Vitamin C", "\n",
            "as Ascorbic Acid"))),
      "length" = len
    )
)

5.3 Exploratory Data Analysis
Descriptive statistics were examined through figures and tables in order to identify useful aspects of the data that helped to form the Research Questions.

5.3.1 Code for Figure 1
The following code was used to create Figure 1:

figure_1 <- ggplot(
  filter(tooth_growth, factor == "supp"),
  aes(x = length, y = group)
) +
  geom_boxplot() +
  labs(
    title = "Figure 1",
    subtitle =
      "Tooth Length of Groups by Supply Method",
    x = "Tooth Length",
    y = "Supply Method"
  ) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

5.3.2 Code for Table 1
The following code was used to create Table 1:

table_1 <- kable(
  x = tooth_growth %>%
    filter(factor == "supp") %>%
    group_by(group) %>%
    summarise("n" = n(),
              "mean" = mean(length),
              "sd" = sd(length)
    ) %>%
    rename("Supply Method" = group),
  booktabs = TRUE,
  caption =
    "Statistics by Supply Method"
) %>%
  kable_styling(
    latex_options = c("striped", "HOLD_position")
  )

5.3.3 Code for Figure 2
The following code was used to create Figure 2:

figure_2 <- ggplot(
  filter(tooth_growth, factor == "dose"),
  aes(x = length, y = group)
) +
  geom_boxplot() +
  labs(
    title = "Figure 2",
    subtitle = "Groups by Dosage Levels",
    x = "Tooth Length",
    y = "Dose of Vitamin C"
  ) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

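
The group statistics reported in Table 1 can also be cross-checked without the tidyverse. The sketch below uses only base R and the built-in ToothGrowth data; the object name stats_by_supp is illustrative and does not appear in the original code. Replacing `supp` with `dose` in the formula should give the corresponding check for Table 2.

```r
# Group size, mean and standard deviation of tooth length by supply method,
# using base R only (a cross-check sketch, not the original analysis code).
stats_by_supp <- aggregate(
  len ~ supp,
  data = ToothGrowth,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics in a single matrix column;
# flatten it into ordinary columns named len.n, len.mean and len.sd.
stats_by_supp <- do.call(data.frame, stats_by_supp)

print(stats_by_supp)
```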
5.3.4 Code for Table 2
The following code was used to create Table 2:

table_2 <- kable(
  x = tooth_growth %>%
    filter(factor == "dose") %>%
    group_by(group) %>%
    summarise("n" = n(),
              "mean" = mean(length),
              "sd" = sd(length)
    ) %>%
    rename("Dose of Vitamin C" = group),
  booktabs = TRUE,
  caption = "Statistics by Dose"
) %>%
  kable_styling(
    latex_options = c("striped", "hold_position")
  )

5.3.5 Code for Figure 3
The following code was used to create Figure 3:

figure_3 <- ggplot(
  filter(tooth_growth, factor == "supp_and_dose"),
  aes(x = length, y = group)
) +
  geom_boxplot([Link] = FALSE) +
  labs(
    title = "Figure 3",
    subtitle = paste0("Tooth Length of Groups ",
                      "by Supply Method and Dose"),
    x = "Tooth Length",
    y = "Interactions of \nSupply Method and Dose"
  ) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

5.3.6 Code for Table 3
The following code was used to create Table 3:

table_3 <- kable(
  x = tooth_growth %>%
    filter(factor == "supp_and_dose") %>%
    # Put each two-line group label on a single line for the table.
    mutate(group = str_replace(group, "\n", " ")
    ) %>%
    group_by(group) %>%
    summarise("n" = n(),
              "mean" = mean(length),
              "sd" = sd(length)
    ) %>%
    rename(
      "Interactions of Supply Method and Dose" =
        group
    ),
  booktabs = TRUE,
  caption = "Statistics by Supply Method and Dose"
) %>%
  kable_styling(
    latex_options = c("striped", "HOLD_position",
                      "scale_down")
  )

5.4 Statistical Analysis
5.4.1 Multiple t tests
A list was created with pairs of abbreviated labels for the groups that should be compared.

group_comparisons <- list(
  "RQ-01" = c("OJ","VC"),
  "RQ-02" = c("2mg","0.5mg"),
  "RQ-03" = c("2mg","1mg"),
  "RQ-04" = c("1mg","0.5mg"),
  "RQ-05" = c("OJ:2mg","VC:2mg"),
  "RQ-06" = c("OJ:1mg","VC:1mg"),
  "RQ-07" = c("OJ:0.5mg","VC:0.5mg"),
  "RQ-08" = c("OJ:2mg","OJ:0.5mg"),
  "RQ-09" = c("OJ:2mg","OJ:1mg"),
  "RQ-10" = c("OJ:1mg","OJ:0.5mg"),
  "RQ-11" = c("VC:2mg","VC:0.5mg"),
  "RQ-12" = c("VC:2mg","VC:1mg"),
  "RQ-13" = c("VC:1mg","VC:0.5mg")
)

Based on the list with the group comparisons, the 13 Welch t-tests were conducted. Notice that for the hypothesis test of (RQ-05) the alternative hypothesis is two-sided.

multiple_t_tests <- Map(
  f = function(groups, Ha) {
    t.test(
      x = tooth_growth %>%
        filter(group_abbr == groups[[1]]) %>%
        pull(length),
      y = tooth_growth %>%
        filter(group_abbr == groups[[2]]) %>%
        pull(length),
      alternative = Ha
    )
  },
  "groups" = group_comparisons,
  "Ha" = c(
    rep("greater", 4),
    # Only for the hypothesis test of (RQ-05)
    # a 'two.sided' test was conducted.
    "two.sided",
    rep("greater", 8)
  )
)

5.4.2 Adjust p-values
The following code was used to adjust the original p-values obtained for the multiple tests:

adjusted_p_values <- p.adjust(
  p = map_dbl(.x = multiple_t_tests,
              .f = ~.x[["p.value"]]),
  method = "fdr"
)

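
The Benjamini-Hochberg adjustment can also be written out by hand, which shows what the "fdr" method of R's p.adjust() does (in p.adjust(), "fdr" is an alias for "BH"). The following is a minimal sketch in base R: the helper bh_adjust and the p-values in p_raw are hypothetical and serve only as an illustration, they are not results from this analysis.

```r
# Benjamini-Hochberg adjustment by hand.
# `bh_adjust` is an illustrative helper, not part of the original code.
bh_adjust <- function(p) {
  m <- length(p)
  ord <- order(p, decreasing = TRUE)  # positions of p-values, largest first
  ranks <- m:1                        # ascending rank of each sorted p-value
  # Scale each p-value by m / rank, then enforce monotonicity from the top
  # and cap the result at 1.
  adj <- pmin(1, cummin(p[ord] * m / ranks))
  adj[order(ord)]                     # restore the original order
}

# Hypothetical raw p-values, used only for illustration.
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.60)

print(bh_adjust(p_raw))
print(p.adjust(p_raw, method = "fdr"))
```

Both calls should print the same vector. Comparing the adjusted values with 0.05 then amounts to controlling the False Discovery Rate at that level.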
5.4.3 Results
From the list with the results of the multiple Welch t-tests, a tibble was created holding the adjusted p-value for each of the 13 comparisons.

results <- tibble(
  "RQ" = str_pad(1:13, width = 2, pad = "0"),
  "x" = map_chr(group_comparisons, ~.x[[1]]),
  "y" = map_chr(group_comparisons, ~.x[[2]]),
  "p_adj" = adjusted_p_values,
  "is_sig" = ifelse(p_adj < 0.05, "Yes", "No")
)

Code for Table 4

The following code was used to create Table 4:

table_4 <- kable(
  x = results,
  caption = "Results",
  booktabs = TRUE
) %>%
  kable_styling(
    latex_options = c("striped", "HOLD_position")
  )
