Statistical Inference
AKSU CODE -STA 22
Population and Sample - Random sampling
Estimation (Point and Interval estimation)
Test of goodness of fit and independence
One-way and Two-way analysis of variance

Population and Sample
Population: the whole, or the entire membership, of the class or group. For example, ... 221 are to be used to estimate the mean age of the class. When the entire ...
Sample: a sample involves a proportion of the entire population. For example, ... estimate the ... of all the students offering STA 221. When a sample is carried out, we ...
Simple random sampling (srs) is one of the sample schemes or sampling methods used in selecting ... a population of size N. In this case, all the samples have equal probability of being selected ... possible samples have equal chance of being selected with ... from a population ... random sampling where each of the po...
We have simple random sampling with replacement (srswr) and simple random sampling without replacement (srswor).
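The two schemes named above can be sketched in a few lines of Python. This is only an illustration: the population of 8 labelled units, the sample size and the helper names `srswr` and `srswor` are assumptions made for the example, not part of the notes.

```python
import random

def srswr(population, n, seed=None):
    """Simple random sampling WITH replacement: a unit may be drawn more than once."""
    rng = random.Random(seed)
    return [rng.choice(population) for _ in range(n)]

def srswor(population, n, seed=None):
    """Simple random sampling WITHOUT replacement: each unit appears at most once."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of N = 8 labelled units (illustration only).
population = list(range(1, 9))
with_replacement = srswr(population, 4, seed=1)
without_replacement = srswor(population, 4, seed=1)
print(with_replacement, without_replacement)
```

Under srswor the four selected units are always distinct, while under srswr the same unit can turn up more than once.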
. 1 Estimation of Parameters.
(i) $\bar{y}$ = sample mean $= \frac{1}{n}\sum y_i$; n = number of sample units or sample size,
(ii) $\bar{Y}$ = population mean $= \frac{1}{N}\sum Y_i$
2. Variance
$\mathrm{Var}(\bar{y})$ = variance of the sample mean, and is approximated by
\[ \frac{s_y^2}{n} \]
$\sigma^2 = \frac{1}{N}\sum (Y_i - \mu)^2$ when $\mu$, the population mean, is known.
Variance $= E(Y_i - \bar{Y})^2$.
the sample mean ($\bar{y}$) is ...
Consider the weights ... a certain maternity clinic in Uyo, Akwa Ibom State.
Unit (Baby)
$Y_i$ (Weight)
... with (i) replacement
The population mean and variance
The sample mean and variance
29 2 2 3.65,
The population mean $= \frac{1}{N}\sum Y_i$
Using Excel environment: enter the 4 members (weights) in A(1-4); Click: Formula; Click: More Functions; Click: Statistical; ... we have
... and using Excel environment for the computation, we have
Sampling Distribution of $\bar{y}$
(sum of ... square errors using Excel ...
The sample mean and variance,
Sample mean    $\sum(\bar{y}_i - \bar{Y})^2$
Again, $\mu_{\bar{y}} = E(\bar{y}) = 3.65$
The MSE of $\bar{y}$ is $S^2_{\bar{y}} = 0.01083$
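The statement that the mean of the sampling distribution equals the population mean can be checked by listing every possible sample. The sketch below does this for a hypothetical population of four values (the weights used in the notes are not legible here, so these numbers are assumptions for illustration) and samples of size 2, both with and without replacement.

```python
from itertools import combinations, product
from statistics import mean, pvariance

# Hypothetical population of N = 4 values (NOT the weights from the notes).
population = [2.0, 4.0, 6.0, 8.0]
n = 2

mu = mean(population)           # population mean
sigma2 = pvariance(population)  # population variance (divisor N)

# With replacement: all N**n ordered samples are equally likely.
means_wr = [mean(s) for s in product(population, repeat=n)]
# Without replacement: all C(N, n) unordered samples are equally likely.
means_wor = [mean(s) for s in combinations(population, n)]

expected_wr = mean(means_wr)    # E(ybar) under srswr
expected_wor = mean(means_wor)  # E(ybar) under srswor
var_wr = pvariance(means_wr)    # variance of ybar under srswr
var_wor = pvariance(means_wor)  # variance of ybar under srswor

print(mu, expected_wr, expected_wor)
print(sigma2 / n, var_wr, var_wor)
```

In both schemes the average of the sample means reproduces the population mean. The variance of the sample mean equals $\sigma^2/n$ with replacement and is smaller without replacement.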
(ii) Also ...
Home Work: Assume ... N2.00, N6.00, N8.00, N10.00, N... earnings of the population ... without replacement. Esti... of size 2.
1.3 SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION
Consider a box containing 3 red and 5 blue balls. The 8 balls constitute a population, and the proportion of red balls (i.e. the population proportion) is $\frac{3}{8} = 0.375$. Suppose we now select a sample of size n = 4. The number of possible samples is
\[ {}^{8}C_{4} = 70 \]
In drawing a sample of size n from the parent population, red balls can be selected into the sample as
i. (3 red, 1 blue)
ii. (2 red, 2 blue)
iii. (1 red, 3 blue)
iv. (0 red, 4 blue)
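Let us determine the number of different ways in which these various outcomes can occur.
i. (3 red, 1 blue) can occur in ${}^{3}C_{3} \times {}^{5}C_{1} = 5$ ways
ii. (2 red, 2 blue) can occur in ${}^{3}C_{2} \times {}^{5}C_{2} = 30$ ways
iii. (1 red, 3 blue) can occur in ${}^{3}C_{1} \times {}^{5}C_{3} = 30$ ways
iv. (0 red, 4 blue) can occur in ${}^{3}C_{0} \times {}^{5}C_{4} = 5$ ways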
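Their probabilities of selection will be
i. (3 red, 1 blue) can occur with probability $\frac{5}{70}$
ii. (2 red, 2 blue) can occur with probability $\frac{30}{70}$
iii. (1 red, 3 blue) can occur with probability $\frac{30}{70}$
iv. (0 red, 4 blue) can occur with probability $\frac{5}{70}$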
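Let P be the proportion of red balls in the sample.
i. For 3 red and 1 blue, $P = \frac{3}{4} = 0.75$
ii. For 2 red and 2 blue, $P = \frac{2}{4} = 0.5$
iii. For 1 red and 3 blue, $P = \frac{1}{4} = 0.25$
iv. For 0 red and 4 blue, $P = \frac{0}{4} = 0$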
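The counting argument for the box of 3 red and 5 blue balls can be reproduced with `math.comb`. The sketch assumes the sample of 4 is drawn without replacement; the variable names are chosen only for this illustration.

```python
from math import comb

RED, BLUE, n = 3, 5, 4           # box contents and sample size from the example
total = comb(RED + BLUE, n)      # number of possible samples of size 4

# Map each possible sample proportion of red balls to its probability.
distribution = {}
for r in range(0, min(RED, n) + 1):
    ways = comb(RED, r) * comb(BLUE, n - r)   # ways to get r red and n - r blue
    distribution[r / n] = ways / total

# Mean of the sampling distribution of the proportion.
expected_p = sum(p * prob for p, prob in distribution.items())

print(total)
print(distribution)
print(expected_p)
```

The mean of the sampling distribution of P works out to the population proportion of red balls, 3/8.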
We will now transform the P values using Eq 2.3 for the two values of P: $P_1 = 0.35$ and $P_2 = 0.42$.
This will give us
\[ z_1 = \frac{0.35 - 0.40}{0.024} = -2.08, \qquad z_2 = \frac{0.42 - 0.40}{0.024} = 0.83 \]
The probability we are looking for = $P(0.35 \le P \le 0.42)$
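The transformation of the two P values can be sketched as follows. Eq 2.3 is not legible in this part of the notes, so the code assumes it is the usual standardisation z = (P - population proportion) / standard error, with 0.40 and 0.024 read from the expression above. The standard normal CDF is built from `math.erf`, and the names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

pi_0 = 0.40          # population proportion used in the example
se = 0.024           # standard error of the sample proportion used in the example

z1 = (0.35 - pi_0) / se
z2 = (0.42 - pi_0) / se

# Probability that the sample proportion lies between 0.35 and 0.42.
prob = normal_cdf(z2) - normal_cdf(z1)
print(round(z1, 2), round(z2, 2), prob)
```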
$S_1^2 = 81$. Similarly,
$S_2 = 6 \Rightarrow S_2^2 = 36$
... contingency test in which we apply the Chi-square distribution ... the significance of the difference ... two or more independent ... classified and presented in an r x c (row and column) contingency ...
Opinion is independent of Uni...
Opinion is not independent of Un...
expected frequencies for the ijth c...
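The sentence on expected frequencies is cut off, so the sketch below assumes the usual rule for a contingency table: the expected frequency of the ijth cell is (row i total x column j total) / grand total. The 2 x 2 table of counts is hypothetical and the function names are illustrative.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Chi-square statistic: sum over cells of (o - e)^2 / e."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2 x 2 table of observed counts (illustration only).
observed = [[20, 30],
            [30, 20]]
expected = expected_frequencies(observed)
statistic = chi_square(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)
print(expected, statistic, dof)
```

The statistic is then compared with the tabulated Chi-square value on (r - 1)(c - 1) degrees of freedom.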
5.3 Test of goodness of fit
The goodness of fit test is an example of the application of the Chi-square ($\chi^2$) distribution. This test seeks to determine if a particular population has a specified theoretical distribution such as the binomial, Poisson or normal distribution. It is based on how good a fit we have between the actually observed frequencies of the sample data and the theoretical frequencies obtained from a hypothesized distribution. For example, in 180 tosses of 4 coins, we obtained the following distribution of the occurrence of heads.
Number of heads | Observed frequencies
0               | 20
1               | ...
2               | 60
3               | 46
4               | ...
We use the binomial distribution to calculate the probability of obtaining 0, 1, 2, 3, 4 heads for n = 4 and P = 0.5.
$P(X = 0) = {}^{4}C_{0}(0.5)^{0}(0.5)^{4} = 0.0625$
$P(X = 1) = {}^{4}C_{1}(0.5)^{1}(0.5)^{3} = 0.25$
$P(X = 2) = {}^{4}C_{2}(0.5)^{2}(0.5)^{2} = 0.375$
$P(X = 3) = {}^{4}C_{3}(0.5)^{3}(0.5)^{1} = 0.25$
$P(X = 4) = {}^{4}C_{4}(0.5)^{4}(0.5)^{0} = 0.0625$
To obtain the expected frequencies, we multiply each of the probabilities by the number of observations, that is, 180; thus we have

Number of heads | Observed freq (o) | Expected freq (e) | (o-e)^2 | (o-e)^2/e
0               | 20                | 11.25             | ...     | ...
1               | ...               | 45                | ...     | 0.00
2               | 60                | 67.5              | 56.25   | 0.83
3               | 46                | 45                | 1.00    | 0.02
4               | ...               | 11.25             | ...     | 0.36
Total           |                   |                   |         | 8.68
To test the hypothesis that the binomial provides a good fit against the alternative that it does not, ...
The null hypothesis is thus accepted ... that the coins are balanced.
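The binomial probabilities and expected frequencies for the 180 tosses of 4 coins can be recomputed as a check. This is a minimal sketch: the helper `chi_square_statistic` is an illustrative name, and it is left to be applied to the observed column of the table.

```python
from math import comb

n_coins, p, tosses = 4, 0.5, 180

# Binomial probabilities of 0, 1, 2, 3, 4 heads with n = 4 and P = 0.5.
probabilities = [comb(n_coins, x) * p**x * (1 - p)**(n_coins - x)
                 for x in range(n_coins + 1)]

# Expected frequency = probability * number of observations.
expected = [tosses * q for q in probabilities]

def chi_square_statistic(observed, expected):
    """Sum of (o - e)^2 / e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(probabilities)
print(expected)
```

With five classes and no parameter estimated from the data, the statistic has 5 - 1 = 4 degrees of freedom, for which the tabulated 5% critical value is 9.488.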
Exercises.
1. In tossing a die 180 times, 2
i
Ee
Out
0 of occurrence
Sometimes
Usually
Always
At 5% level of significance, would ... independent?
Lesson 6
One-way and Two-way Analysis of Variance
One-way analysis of variance or Completely Randomized Design (CRD): If the experimental treatments are assigned to the experimental units such that each experimental unit has an equal chance of receiving a given treatment, such an experiment is said to be Completely Randomized. We note that in this design, one factor with t different levels gives rise to one factor with t treatments. This is the reason why we call the design one-way classification. Note also that CRD is an extension of the test involving two means from two independent samples.
6.2 Features of CRD:
> It gives a good result if the experimental units are homogeneous. By homogeneity we mean that the experimental response from a treatment does not depend on the unit to which it falls.
> Each experimental unit has an equal chance of receiving a given treatment and this probability is equal to 1/N (N, the number of experimental units); thus the design eliminates bias.
> The design is recommended when prior information is not available or is minimal.
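One way to obtain a completely randomized layout is to list every treatment once per replication and shuffle the list over the units. The sketch below assumes equal replication; the treatment labels, the number of replications and the function name `crd_layout` are illustrative only.

```python
import random
from collections import Counter

def crd_layout(treatments, replications, seed=None):
    """Randomly assign each treatment to `replications` experimental units."""
    rng = random.Random(seed)
    plan = [t for t in treatments for _ in range(replications)]
    rng.shuffle(plan)                      # every arrangement is equally likely
    # Units are numbered 1..N in the order they appear in the shuffled plan.
    return {unit: treatment for unit, treatment in enumerate(plan, start=1)}

# Hypothetical design: 4 treatments, 5 replications, N = 20 units.
layout = crd_layout(["A", "B", "C", "D"], replications=5, seed=42)
counts = Counter(layout.values())
print(layout)
print(counts)
```

Because the shuffle treats all units alike, no unit is favoured for any treatment, which is the sense in which the design eliminates bias.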
6.3 Models for Completely Randomized Design
In the context of analysis of variance (ANOVA) and regression models, we consider two statistical models, namely the "Fixed" and "Random" effects models.
i. Fixed Effects:
A fixed effects model is the model in which all the treatment levels are predetermined. For example, consider four different forms of nitrogen fertilizer (e.g. urea, ammonia, ammonium nitrate, and 30% urea: ammonium nitrate solution) to be compared in a completely randomized design with five replications. The experimenter selected these four specific treatment levels for investigation, and has no interest in any other N-fertilizers in the analysis. If the experiment were to be repeated, exactly these same four forms of nitrogen fertilizer would be used again. In other words, the experimenter's attention is fixed upon these four fertilizer treatments and no other.
ii. Random Effects:
This is a model in which one of the treatments is a random variable. For example, suppose an experiment on the outbreak of cholera is carried out in Uyo, which consists of 10 wards. If we randomly select 5 wards from the 10 wards for this experiment, then the effects of these 5 wards in the experiment are random, whereas if we consider the whole 10 wards, then the effects would have been fixed.
iii. Mixed Effects:
This is the model in which one of the treatments is fixed and at least one of the treatments is random.
Assumptions:
To carry out any test, we make the following assumptions:
i. The random variables are normally distributed.
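ii. The random variables are independent.
\[ (Y_{ij} - \bar{Y}_{..}) = (\bar{Y}_{i.} - \bar{Y}_{..}) + (Y_{ij} - \bar{Y}_{i.}) \]
Squaring both sides and summing up, we have
\[ \sum_i \sum_j (Y_{ij} - \bar{Y}_{..})^2 = \sum_i \sum_j (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.})^2 \]
Total Sum of Squares ($SS_T$) = Between Treatment Sum of Squares ($SS_t$) + Within Treatment (Error) Sum of Squares ($SS_E$)
$\frac{T^2}{N}$ = Correction Factor (CF), where T is the grand total of the N observations
\[ SS_T = \sum_i \sum_j (Y_{ij} - \bar{Y}_{..})^2 \]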
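The partition of the total sum of squares can be checked numerically. The sketch uses hypothetical responses for three treatments and the usual computing forms with the correction factor CF = T^2/N; the data and variable names are assumptions for illustration.

```python
# Hypothetical responses for t = 3 treatments with 3 replications each.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}

all_values = [y for ys in groups.values() for y in ys]
N = len(all_values)
T = sum(all_values)                  # grand total
CF = T ** 2 / N                      # correction factor

ss_total = sum(y ** 2 for y in all_values) - CF
ss_treatment = sum(sum(ys) ** 2 / len(ys) for ys in groups.values()) - CF
ss_error = ss_total - ss_treatment   # obtained by subtraction

# The same error sum of squares from its definition (within-treatment deviations).
ss_within = sum((y - sum(ys) / len(ys)) ** 2
                for ys in groups.values() for y in ys)

print(ss_total, ss_treatment, ss_error, ss_within)
```

The error sum of squares obtained by subtraction agrees with the within-treatment sum of squared deviations, which is what the identity asserts.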
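The model for an RCBD (Randomized Complete Block Design) is given as
\[ Y_{ij} = \mu + \tau_i + \beta_j + e_{ij} \]
where
$Y_{ij}$ is the observation or response of the ith treatment in the jth block,
$\mu$ is the overall mean,
$\tau_i$ is the effect of treatment i,
$\beta_j$ is the effect of block j,
$e_{ij}$ is the error associated with the observation $Y_{ij}$.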
$\frac{T^2}{N}$ = Correction Factor (CF), where T is the grand total of the N observations
\[ SS_E = SS_T - SS_t - SS_B \]
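The sums of squares for a randomized block layout can be sketched as follows, assuming one observation per treatment-block cell and the usual computing forms with CF = T^2/N. The data, the symbol b for the number of blocks and the function name are assumptions for illustration.

```python
def rcbd_sums_of_squares(data):
    """data[i][j] = response of treatment i in block j (one value per cell)."""
    t = len(data)                    # number of treatments
    b = len(data[0])                 # number of blocks
    grand_total = sum(sum(row) for row in data)
    cf = grand_total ** 2 / (t * b)  # correction factor

    ss_total = sum(y ** 2 for row in data for y in row) - cf
    ss_treatment = sum(sum(row) ** 2 for row in data) / b - cf
    ss_block = sum(sum(col) ** 2 for col in zip(*data)) / t - cf
    ss_error = ss_total - ss_treatment - ss_block

    ms_treatment = ss_treatment / (t - 1)
    ms_block = ss_block / (b - 1)
    ms_error = ss_error / ((t - 1) * (b - 1))
    return {
        "SS_T": ss_total, "SS_t": ss_treatment, "SS_B": ss_block, "SS_E": ss_error,
        "F_treatment": ms_treatment / ms_error, "F_block": ms_block / ms_error,
    }

# Hypothetical data: rows are treatments, columns are blocks.
data = [[10.0, 12.0, 14.0],
        [11.0, 14.0, 17.0],
        [15.0, 16.0, 20.0]]
result = rcbd_sums_of_squares(data)
print(result)
```

Each F ratio is compared with the tabulated F value on its own degrees of freedom and the error degrees of freedom, (t - 1)(b - 1).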
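ANOVA TABLE:
Source     | df             | Sum of Squares | Mean Squares
Treatment  | t - 1          | SS_t           | SS_t / (t - 1)
Block      | b - 1          | SS_B           | SS_B / (b - 1)
Error      | (t - 1)(b - 1) | SS_E           | SS_E / [(t - 1)(b - 1)]
Total      | tb - 1         | SS_T           |
(t = number of treatments, b = number of blocks)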
Hypothesis. If the treatment or block effect ...
determine which treatment or ...
the use of any of the known m...
... we reject the hy...
of the multiple comparison met...
Example:
The following data indicate the amount of time (in hours) ... resumption for classes in a university.
Week | Monday Tuesday ] Wednesday |
- 2 26 [25
3 3 a meee |s | Pae
3 26 29 33 3 ae)
26 28 er \30 =
significance whether they are equal.
$SS_E = (1 - r^2) S_{yy}$
And SSR = SST - SSE = 38.5 - 5.753 = 32.747
> Regression > Regression
Analysis of Variance
Source          | DF | SS
Regression      | 1  | 32.747
Residual Error  | 8  | 5.753
Total           | 9  | 38.500
S = 0.848032   R-Sq = 85.1%   R-Sq(adj) ...
SSR =32.747, SSE = 5.753 and
2.4 Coefficient of Determination
This is given as:
SSR = SST - SSE
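The figures in the regression output can be cross-checked from the three sums of squares. The sketch assumes the usual definitions, R-Sq = SSR/SST and S = sqrt(SSE / error degrees of freedom); the variable names are illustrative.

```python
from math import sqrt

# Values taken from the analysis of variance table in the notes.
ssr, sse, sst = 32.747, 5.753, 38.500
df_error = 8

r_squared = ssr / sst          # coefficient of determination
s = sqrt(sse / df_error)       # residual standard error

print(round(100 * r_squared, 1), round(s, 3))
```

Both values agree with the R-Sq = 85.1% and S = 0.848 shown in the output, up to rounding of the sums of squares.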