2023midterm Code

The document outlines a statistical analysis involving body height data for males and females, calculating t-statistics and p-values to compare means. It also includes regression analysis to predict height based on gender and foot length, along with confidence intervals for predictions. Additionally, it presents an ANOVA analysis comparing multiple groups of height data, calculating sums of squares, mean squares, F-statistics, and p-values for significance testing.

Uploaded by

Claire Jiang

#3

#(a)
xx = read.csv("BodyHeight.txt")
height.m = c(178, 180, 173, 191, 163, 175, 185, 179, 173, 173, 170, 172, 185, 172, 190, 185)
height.f = c(155, 165, 160, 169, 168, 162, 161, 170, 158, 164, 168, 165, 165, 162, 172, 170, 165,
168, 165)

allSample = c(height.m, height.f)


dObs = mean(height.m) - mean(height.f)

n_m = length(height.m)

n_f = length(height.f)

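# pooled variance estimate (equal-variance two-sample t-test)
s2 = ((n_m - 1) * var(height.m) + (n_f - 1) * var(height.f)) / (n_m + n_f - 2)

t_statistic = dObs / sqrt((1/n_m + 1/n_f) * s2)

# one-sided p-value for H1: mean(male) > mean(female), df = n_m + n_f - 2 = 33
pvalue = 1 - pt(t_statistic, n_m + n_f - 2)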
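
# Optional cross-check (an addition, not part of the original answer): the
# hand-computed pooled t statistic and one-sided p-value should agree with
# R's built-in t.test() when var.equal = TRUE and alternative = "greater".
# The data are repeated so this snippet runs on its own; hm, hf and chk are
# names introduced only for this check.
hm = c(178, 180, 173, 191, 163, 175, 185, 179, 173, 173, 170, 172, 185, 172, 190, 185)
hf = c(155, 165, 160, 169, 168, 162, 161, 170, 158, 164, 168, 165, 165, 162, 172, 170,
       165, 168, 165)
chk = t.test(hm, hf, var.equal = TRUE, alternative = "greater")
chk$statistic   # pooled t statistic, compare with t_statistic
chk$parameter   # degrees of freedom, length(hm) + length(hf) - 2
chk$p.value     # one-sided p-value, compare with pvalue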

#(b)
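# permutation distribution of the difference in means under H0
dd = rep(0,10000)
for(i in 1:10000) {
  ind = sample(1:35, 16)   # random relabelling: 16 "male" indices out of 35
  # difference of the two group means (the groups have 16 and 19 values,
  # so the vectors must be averaged separately, not subtracted elementwise)
  dd[i] = mean(allSample[ind]) - mean(allSample[-ind])
}

# one-sided permutation p-value; exact ties with dObs get half weight
pvalue = mean(dObs < dd) + 0.5*mean(dObs == dd)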


pvalue

#4
# (a)

#(b)
X<-cbind(1,xx$gender,xx$foot.length)
Y=xx$height

beta = solve(t(X)%*%X)%*%t(X)%*%Y
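
# A small sketch (an addition, not part of the original answer) showing that the
# normal-equation estimate solve(t(X)%*%X)%*%t(X)%*%Y agrees with lm().
# The toy data are made up purely for illustration: g, fl and h are hypothetical
# stand-ins for gender (coded 1 / -1), foot.length and height.
g  = c(1, 1, 1, -1, -1, -1)
fl = c(27, 26, 28, 24, 23, 25)
h  = c(180, 175, 183, 165, 160, 168)
Xt = cbind(1, g, fl)
b_ne = solve(crossprod(Xt), crossprod(Xt, h))   # solves (X'X) b = X'Y directly
b_lm = coef(lm(h ~ g + fl))                      # built-in least squares fit
all.equal(as.vector(b_ne), unname(b_lm))         # compares the two estimates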

# (c)
pred = beta[1] - beta[2] +25*beta[3]
pred

# (d)
r = Y - X%*%beta
s_2 = sum(r^2)/(35-2-1)

#(e)
var.beta=s_2 *(solve((t(X)%*%X)))
var.beta

# (f)
x = matrix(c(1, -1, 25), nrow=1)
m = x%*%beta

#*standard error of the estimated mean response x %*% beta (not of beta itself)*
s = sqrt(sum(r^2)/(35-2-1))
std_beta = s * sqrt(x%*%solve((t(X)%*%X), t(x)))

# (g)
#*95% confidence interval for the mean response at x (not a prediction interval for a new y)*
lower_value = m + qt(0.025, 32) * std_beta
upper_value = m + qt(0.975, 32) * std_beta
lower_value
upper_value
#5
#(a)

y1 = c(155, 165, 169, 168, 162, 161, 158, 164, 165, 165)
y2 = c(170, 168, 172, 170, 168)
y3 = c(162, 165, 168)
y4 = c(191, 175, 172, 172)
y5 = c(178, 180, 185, 179, 173, 170, 185, 190, 185)

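# the groups have unequal sizes, so pool them into one vector;
# rbind() would recycle the shorter groups and distort the means
yy = c(y1, y2, y3, y4, y5)

# group means
yybar = c(mean(y1), mean(y2), mean(y3), mean(y4), mean(y5))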
yybar

#b
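# treatment sum of squares: n_i * (group mean - grand mean)^2, summed over groups
ybar = mean(c(y1, y2, y3, y4, y5))   # grand mean of all 31 observations
SStrt = 10*(mean(y1)-ybar)^2 + 5*(mean(y2)-ybar)^2 + 3*(mean(y3)-ybar)^2 +
  4*(mean(y4)-ybar)^2 + 9*(mean(y5)-ybar)^2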
SStrt

#c
SSerr=(10-1)*var(y1)+(5-1)*var(y2)+(3-1)*var(y3)+(4-1)*var(y4)+(9-1)*var(y5)
SSerr

N = 10+5+3+4+9
#d

SStot = SStrt + SSerr


SStot

MStrt = SStrt / (5 - 1)   # treatment df = number of groups - 1 = 4
MSerr = SSerr / (N - 5)   # error df = N - number of groups = 26

F = MStrt / MSerr
F

#e
# use the computed F statistic rather than a hard-coded rounded value
p_value = 1 - pf(F, 4, 26)
p_value

#f
# Tukey-Kramer 90% simultaneous confidence intervals (5 groups, 26 error df);
# each line prints the lower and upper limit
q = qtukey(0.90,5,26)/sqrt(2)
#*1vs2*
(mean(y1)-mean(y2)) + c(-1,1)*q*sqrt(MSerr*(1/10+1/5))
#*1vs3*
(mean(y1)-mean(y3)) + c(-1,1)*q*sqrt(MSerr*(1/10+1/3))
#*2vs3*
(mean(y2)-mean(y3)) + c(-1,1)*q*sqrt(MSerr*(1/5+1/3))
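
# Optional cross-check (an addition, not part of the original answer): the same
# one-way ANOVA and Tukey-Kramer intervals via R's built-in aov(), anova() and
# TukeyHSD(). The data are repeated so this snippet runs on its own; h, grp, fit,
# tab and tk are names introduced only for this check. TukeyHSD() reports each
# difference as "later group - earlier group", so the signs are flipped relative
# to the hand-computed intervals for 1vs2, 1vs3 and 2vs3.
h   = c(155, 165, 169, 168, 162, 161, 158, 164, 165, 165,   # group 1
        170, 168, 172, 170, 168,                             # group 2
        162, 165, 168,                                       # group 3
        191, 175, 172, 172,                                  # group 4
        178, 180, 185, 179, 173, 170, 185, 190, 185)         # group 5
grp = factor(rep(1:5, times = c(10, 5, 3, 4, 9)))
fit = aov(h ~ grp)
tab = anova(fit)   # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
tab                # the F value can be compared with the one used in part e
tk  = TukeyHSD(fit, "grp", conf.level = 0.90)
tk$grp[c("2-1", "3-1", "3-2"), ]   # pairs 1vs2, 1vs3, 2vs3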
