CH 2
Properties of a good estimator
Unbiased Estimator: An estimator whose expected value is the value of the parameter being
estimated.
Consistent Estimator: An estimator which gets closer to the value of the parameter as the
sample size increases.
Relatively Efficient Estimator: The estimator for a parameter with the smallest variance.
An estimator is the mathematical rule we use to compute the point estimate. For instance, the sum of the $x_i$ divided by $n$ is the point estimator used to compute the estimate of the population mean: $\bar{X} = \frac{\sum x_i}{n}$. That is, $\bar{X}$ is a point estimator of the population mean.
Confidence interval estimation of the population mean: µ
Although $\bar{X}$ possesses nearly all the qualities of a good estimator, because of sampling error we know that our sample statistic is unlikely to equal the population parameter exactly; instead it will fall within an interval of values around it. We will have to be satisfied knowing that the statistic is "close to" the parameter.
Case 1:
Consider samples of size $n$ drawn, with replacement and with order important, from a population whose mean is $\mu$ and whose standard deviation is $\sigma$. The population can have any frequency distribution. The sampling distribution of $\bar{X}$ will have mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and it approaches a normal distribution as $n$ gets large. This allows us to use the normal distribution curve for computing confidence intervals.
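$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a normal distribution with mean 0 and variance 1
$\Rightarrow \mu = \bar{X} \pm Z\,\sigma/\sqrt{n}$
$= \bar{X} \pm \varepsilon$, where $\varepsilon$ is a measure of error (margin of error),
$\varepsilon = Z\,\sigma/\sqrt{n}$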
- For the interval estimator to be good, the error ($\varepsilon = Z\,\sigma/\sqrt{n}$) should be small. How can it be made small?
o By making $n$ large
o By having small variability ($\sigma$)
o By taking $Z$ small
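- To obtain the value of $Z$, we have to attach this to a theory of chance. That is, there is an area of size $1 - \alpha$ such that
$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$
where $\alpha$ is the probability that the parameter lies outside the interval, and $Z_{\alpha/2}$ stands for the standard normal value to the right of which $\alpha/2$ probability lies, i.e. $P(Z > Z_{\alpha/2}) = \alpha/2$.
$\Rightarrow P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$
$\Rightarrow P\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$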
But usually $\sigma^2$ is not known; in that case we estimate it by its point estimator $S^2$.
Here are the z values corresponding to the most commonly used confidence levels.
$100(1-\alpha)\%$      $\alpha$      $\alpha/2$      $Z_{\alpha/2}$
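The critical values for such a table can be reproduced numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is our own and does not come from these notes.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return Z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence
    # Z_{alpha/2} leaves probability alpha/2 in the right tail of N(0, 1).
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Print one row per commonly used confidence level.
for level in (0.90, 0.95, 0.99):
    alpha = 1.0 - level
    print(f"{level:.0%}  alpha={alpha:.2f}  alpha/2={alpha / 2:.3f}  Z={z_critical(level):.3f}")
```

Printed tables usually round these values, for example to 1.645, 1.96 and 2.58, so hand calculations may differ slightly in the last digit.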
Case 2:
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t distribution with $n - 1$ degrees of freedom.
The unit of measurement of the confidence interval is the standard error. This is just the standard
deviation of the sampling distribution of the statistic.
Examples:
1. For a random sample of 25 trainers drawn from a normal population, the average heart rate was found to be 32, and the population standard deviation is known to be 4.2. Find
a) A 95% confidence interval for the population mean.
b) A 99% confidence interval for the population mean.
Solution:
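a) $\bar{X} \pm Z_{0.025}\,\sigma/\sqrt{n} = 32 \pm 1.96\,(4.2/\sqrt{25}) = 32 \pm 1.65 = (30.35, 33.65)$
Therefore, we are 95% confident that the average heart rate of the population falls between 30.35 and 33.65.
b) $\bar{X} \pm Z_{0.005}\,\sigma/\sqrt{n} = 32 \pm 2.58\,(4.2/\sqrt{25}) = 32 \pm 2.17 = (29.83, 34.17)$
We are 99% confident that the average heart rate of the population falls between 29.83 and 34.17.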
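The 95% interval in Example 1 can be checked numerically. This is a small sketch under the Case 1 assumption (population standard deviation known); the function name `mean_ci_known_sigma` is hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)  # margin of error: Z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Example 1(a): n = 25, sample mean 32, population sigma 4.2, 95% confidence.
low, high = mean_ci_known_sigma(32, 4.2, 25, 0.95)
print(round(low, 2), round(high, 2))
```

For the 99% case the exact quantile (about 2.576) gives limits that differ in the second decimal place from those obtained with the rounded table value 2.58.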
2. A drug company is testing a new drug which is supposed to reduce blood pressure. From the six
people who are used as subjects, it is found that the average drop in blood pressure is 2.28 points,
with a standard deviation of .95 points. What is the 95% confidence interval for the mean change in
pressure?
Solution: (exercise)
The procedure to find the confidence interval, the sample size, the error bound, and the confidence level
for a proportion is similar to that for the population mean, but the formulas are different. How do you
know you are dealing with a proportion problem? There is no mention of a mean or average!
To form a proportion, take $X$, the random variable for the number of successes, and divide it by $n$, the number of trials (or the sample size). The random variable $\hat{p}$ (read "P hat") is that proportion, $\hat{p} = X/n$.
When $n$ is large and $p$ is not close to zero or one (in practice, when $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$), we can use the normal distribution to approximate the distribution of the number of successes.
$X \sim N(np, \sqrt{npq})$. If we divide the random variable, the mean, and the standard deviation by $n$, we get a normal distribution of proportions with $\hat{p}$, called the estimated proportion, as the random variable. (Recall that a proportion is the number of successes divided by $n$.)
$\frac{X}{n} = \hat{p} \sim N\left(\frac{np}{n}, \frac{\sqrt{npq}}{n}\right)$
Using algebra to simplify: $\frac{\sqrt{npq}}{n} = \sqrt{\frac{pq}{n}}$
The margin of error for a proportion is $E = Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$.
This formula is similar to the margin of error formula for a mean, except that the "appropriate standard
deviation" is different. For a mean, when the population standard deviation is known, the appropriate
standard deviation that we use is $\sigma/\sqrt{n}$. For a proportion, the appropriate standard deviation is $\sqrt{pq/n}$.
However, in the margin of error formula, we use $\sqrt{\hat{p}\hat{q}/n}$ as the standard deviation, instead of $\sqrt{pq/n}$.
In the margin of error formula, the sample proportions $\hat{p}$ and $\hat{q}$ are estimates of the unknown population proportions $p$ and $q$. The estimated proportions $\hat{p}$ and $\hat{q}$ are used because $p$ and $q$ are not known. The sample proportions $\hat{p}$ and $\hat{q}$ are calculated from the data: $\hat{p}$ is the estimated proportion of successes, and $\hat{q}$ is the estimated proportion of failures.
The confidence interval for $p$ is therefore $\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$.
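As an illustration of the interval for a proportion, the sketch below computes the estimated proportion, its margin of error and the resulting limits. The counts used (40 successes in 100 trials) are hypothetical and are not taken from these notes; the function name `proportion_ci` is likewise our own.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n          # estimated proportion of successes
    q_hat = 1.0 - p_hat            # estimated proportion of failures
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sqrt(p_hat * q_hat / n)  # E = Z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data (not from the notes): 40 successes in 100 trials, 95% level.
low, high = proportion_ci(40, 100, 0.95)
print(f"{low:.4f} to {high:.4f}")
```

The same function can be applied to the exercises that follow by substituting their counts and confidence levels, provided the normal approximation is reasonable.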
Example 1: A random sample of 122 statistics students was asked: “Have you smoked a cigarette in the
past week?” 64 students reported smoking within the past week. Find a 90% confidence interval for the
true proportion of statistics students who smoke. (Round the answers to 4 decimal places)
Solution: (exercise)
Example 2: Out of a random sample of 69 freshmen at State University, 33 students have declared a
major. Find a 98% confidence interval for the true proportion of freshmen at State University who have
declared a major. Solution: (exercise)
Example 3: Suppose a mobile phone company wants to determine the current percentage of customers
aged 50+ who use text messaging on their cell phones. How many customers aged 50+ should the
company survey in order to be 90% confident that the estimated (sample) proportion is within three
percentage points of the true population proportion of customers aged 50+ who use text messaging on
their cell phones? Solution: (exercise)
Example 4: Suppose an internet marketing company wants to determine the current percentage of
customers who click on ads on their smartphones. How many customers should the company survey in
order to be 99% confident that the estimated proportion is within five percentage points of the true
population proportion of customers who click on ads on their smartphones? (Use Population
proportion, P=0.5)
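Sample-size questions of this kind follow from solving the margin of error formula $E = Z_{\alpha/2}\sqrt{pq/n}$ for $n$, which gives $n = Z_{\alpha/2}^2\,pq/E^2$, rounded up to the next whole number. The sketch below is one way to carry this out; the function name `sample_size_for_proportion` is hypothetical, and it uses the exact normal quantile rather than a rounded table value.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p=0.5):
    """Smallest n for which Z * sqrt(p * (1 - p) / n) does not exceed the margin."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil(z ** 2 * p * (1.0 - p) / margin ** 2)

# Example 4: 99% confidence, within five percentage points, p = 0.5.
n_required = sample_size_for_proportion(0.05, 0.99, p=0.5)
print(n_required)
```

Using a rounded table value such as Z = 2.58 instead of the exact quantile gives a slightly larger answer, so small differences from a hand calculation are expected.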
The standard deviation of the sampling distribution of differences of means, also called the standard error of differences of means, is denoted by $\sigma_{(\bar{x}-\bar{y})}$.
$\sigma_{(\bar{x}-\bar{y})} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}$, where $\sigma_{\bar{x}}$ is the standard error of the mean of the first population and $\sigma_{\bar{y}}$ is the standard error of the mean of the second population ($\sigma_{\bar{x}}^2 = \sigma_x^2/n_x$; $\sigma_{\bar{y}}^2 = \sigma_y^2/n_y$).
The sampling distribution is normal if both populations are normal, and is approximately normal if
the samples are large enough (even if the populations aren’t normal). In practice, it is assumed that
the sampling distribution of differences of means is normal if both nx and ny are ≥30.
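Then the $(1-\alpha)100\%$ C.I. for the difference between the two population means $\mu_x - \mu_y$ is:
$(\bar{x} - \bar{y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$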
E.g. If a random sample of 50 non-smokers has a mean life of 76 years with a standard deviation of 8 years, and a random sample of 65 smokers has a mean life of 68 years with a standard deviation of 9 years,
A) What is the point estimate for the difference of the population means?
B) Find a 95% C.I. for the difference of mean lifetime of non-smokers and smokers.
Solutions:
Population x (non-smokers): $n_x = 50$, $\bar{x} = 76$, $S_x = 8$, $\sigma_{\bar{x}}^2 = S_x^2/n_x = 64/50 = 1.28$
Population y (smokers): $n_y = 65$, $\bar{y} = 68$, $S_y = 9$, $\sigma_{\bar{y}}^2 = S_y^2/n_y = 81/65 = 1.25$
A) A point estimate for the difference of population means $(\mu_x - \mu_y)$ is $\bar{x} - \bar{y} = 76 - 68 = 8$ years.
B) At a 95% confidence level, $Z = \pm 1.96$ and
$\sigma_{(\bar{x}-\bar{y})} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{1.28 + 1.25} = \sqrt{2.53} = 1.59$
Hence, the 95% C.I. for $\mu_x - \mu_y$ is $(\bar{x} - \bar{y}) \pm 1.96\,\sigma_{(\bar{x}-\bar{y})} = 8 \pm 1.96(1.59) = 8 \pm 3.12$ = (4.88 to 11.12 years).
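The arithmetic of this example can be verified with a few lines of code. The sketch below assumes large samples, so that the sample standard deviations stand in for the population values; the function name `diff_means_ci` is our own.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(xbar, sx, nx, ybar, sy, ny, confidence):
    """Large-sample confidence interval for the difference of two means."""
    # Standard error of the difference: sqrt(Sx^2 / nx + Sy^2 / ny).
    se = sqrt(sx ** 2 / nx + sy ** 2 / ny)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    diff = xbar - ybar
    return diff - z * se, diff + z * se

# Non-smokers: n = 50, mean 76, S = 8.  Smokers: n = 65, mean 68, S = 9.
low, high = diff_means_ci(76, 8, 50, 68, 9, 65, 0.95)
print(round(low, 2), round(high, 2))
```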
Exercise: An anthropologist who wanted to study the heights of adult men and women took a random
sample of 128 adult men and 100 adult women and found the following summary results.
By the same analogy, the C.I. for the difference of proportions (Px - Py) is given by the following
formula.
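C.I. for $P_x - P_y$ = $(p_x - p_y) \pm Z\,\sigma_{(P_x - P_y)}$, where $Z$ is determined by the confidence coefficient and
$\sigma_{(P_x - P_y)} = \sqrt{\sigma_{P_x}^2 + \sigma_{P_y}^2} = \sqrt{\frac{p_x q_x}{n_x} + \frac{p_y q_y}{n_y}}$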
Example: Each of two groups consists of 100 patients who have leukaemia. A new drug is given to the
first group but not to the second (the control group). It is found that in the first group 75 people have
remission for 2 years; but only 60 in the second group. Find 95% confidence limits for the difference in
the proportion of all patients with leukaemia who have remission for 2 years.
$p_x = 0.75$, $q_x = 0.25$, $n_x = 100$, $\sigma_{P_x}^2 = p_x q_x/n_x = 0.75 \times 0.25/100 = 0.001875$
$p_y = 0.60$, $q_y = 0.40$, $n_y = 100$, $\sigma_{P_y}^2 = p_y q_y/n_y = 0.60 \times 0.40/100 = 0.0024$
Hence, $\sigma_{(P_x - P_y)} = \sqrt{\sigma_{P_x}^2 + \sigma_{P_y}^2} = \sqrt{0.001875 + 0.0024} = \sqrt{0.004275} = 0.065$
At a 95% confidence level, $Z = \pm 1.96$ and the difference of the two independent sample proportions is (0.75 - 0.60) = 0.15. Therefore, a 95% C.I. for the difference in the proportion with 2-year remission is
(0.15 ± 1.96 (0.065)) = (0.15 ± 0.13) = (0.02 to 0.28).
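The leukaemia example can be reproduced as follows. This is a sketch only; the function name `diff_proportions_ci` is hypothetical, and the interval relies on the normal approximation for both sample proportions.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence):
    """Normal-approximation confidence interval for a difference of proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error: sqrt(p1*q1/n1 + p2*q2/n2).
    se = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Treated group: 75 of 100 in remission.  Control group: 60 of 100.
low, high = diff_proportions_ci(75, 100, 60, 100, 0.95)
print(round(low, 2), round(high, 2))
```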
Chapter Three
- Type I error: Rejecting the null hypothesis when it is true.
- Type II error: Failing to reject the null hypothesis when it is false.
NOTE:
1. These are the errors that can occur in any two-choice decision-making problem.
2. There is always a possibility of committing one or the other error.
3. For a fixed sample size, the probabilities of a type I error ($\alpha$) and a type II error ($\beta$) have an inverse relationship and therefore cannot be minimized at the same time.
In practice we set $\alpha$ at some value and design a test that minimizes $\beta$. This is because a type I error is often considered to be more serious, and therefore more important to avoid, than a type II error.
General steps in hypothesis testing:
1. Specify the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).
2. Select a significance level, $\alpha$.
3. Identify the sampling distribution of the estimator ($t$, $Z$, $F$, $\chi^2$).
4. Calculate a statistic analogous to the parameter specified by the null hypothesis.
5. Identify the critical region from the table.
6. Make the decision.
7. Summarize the result (interpretation).
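Suppose the assumed or hypothesized value of $\mu$ is denoted by $\mu_0$; then one can formulate two-sided (1) and one-sided (2 and 3) hypotheses as follows:
1. $H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$
2. $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ vs $H_1: \mu < \mu_0$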
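CASES:
Case 1: When sampling is from a normal distribution with $\sigma^2$ known
- The relevant test statistic is
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$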
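- After specifying $\alpha$ we have the following regions (critical and acceptance) on the standard normal distribution corresponding to the above three hypotheses.
$H_1$                Reject $H_0$ if                 Accept $H_0$ if
$\mu \ne \mu_0$      $|Z_{cal}| > Z_{\alpha/2}$      $|Z_{cal}| < Z_{\alpha/2}$
$\mu < \mu_0$        $Z_{cal} < -Z_{\alpha}$         $Z_{cal} > -Z_{\alpha}$
$\mu > \mu_0$        $Z_{cal} > Z_{\alpha}$          $Z_{cal} < Z_{\alpha}$
Where: $Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$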
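Case 2: When sampling is from a normal distribution with $\sigma^2$ unknown and small sample size
- After specifying $\alpha$ we have the following regions on the Student t-distribution (with $n - 1$ degrees of freedom) corresponding to the above three hypotheses.
$H_1$                Reject $H_0$ if                 Accept $H_0$ if
$\mu \ne \mu_0$      $|t_{cal}| > t_{\alpha/2}$      $|t_{cal}| < t_{\alpha/2}$
$\mu < \mu_0$        $t_{cal} < -t_{\alpha}$         $t_{cal} > -t_{\alpha}$
$\mu > \mu_0$        $t_{cal} > t_{\alpha}$          $t_{cal} < t_{\alpha}$
Where: $t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$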
Case 3: When sampling is from a non-normally distributed population or a population whose functional form is unknown.
- If the sample size is large, one can perform a test of hypothesis about the mean by using:
$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, if $\sigma^2$ is known,
$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, if $\sigma^2$ is unknown.
The critical region is $|t_{cal}| > t_{0.005}(9) = 3.2498$
$\Rightarrow (-3.2498, 3.2498)$ is the acceptance region.
Step 5: Computations:
$\bar{X} = 10.06$, $S = 0.25$
$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{10.06 - 10}{0.25/\sqrt{10}} = 0.76$
Step 6: Decision
Accept $H_0$, since $t_{cal}$ is in the acceptance region.
Step 7: Conclusion
At the 1% level of significance, we have no evidence to say that the average content of containers of the given lubricant is different from 10 litres, based on the given sample data.
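Steps 5 and 6 of this example can be checked with a short script. The sketch uses only the figures quoted in the example (sample mean 10.06, S = 0.25, n = 10, hypothesized mean 10, tabulated value 3.2498); the function name `t_statistic` is our own.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """Calculated t value for a one-sample test of the mean."""
    return (xbar - mu0) / (s / sqrt(n))

# Values from the worked example: n = 10, mean 10.06, S = 0.25, mu0 = 10.
t_cal = t_statistic(10.06, 10, 0.25, 10)
t_crit = 3.2498  # tabulated t_{0.005}(9), as quoted in the example

# Two-sided test: reject H0 only if |t_cal| exceeds the tabulated value.
reject = abs(t_cal) > t_crit
print(round(t_cal, 2), "reject H0" if reject else "do not reject H0")
```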
Example: The mean life time of a sample of 16 fluorescent light bulbs produced by a company is
computed to be 1570 hours. The population standard deviation is 120 hours. Suppose the hypothesized
value for the population mean is 1600 hours. Can we conclude that the life time of light bulbs is
decreasing?
(Use $\alpha = 0.05$ and assume the normality of the population) (exercise!)
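One possible way to set up this exercise in code is sketched below. It assumes the question is read as a left-tailed test, $H_0: \mu = 1600$ against $H_1: \mu < 1600$, with the population standard deviation known; that reading and the function name `z_statistic` are our assumptions, not statements from the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, sigma, n):
    """Calculated Z value for a one-sample test of the mean, sigma known."""
    return (xbar - mu0) / (sigma / sqrt(n))

alpha = 0.05
z_cal = z_statistic(1570, 1600, 120, 16)

# Left-tailed test (assumed): the critical value is -Z_alpha, the alpha quantile of N(0, 1).
z_crit = NormalDist().inv_cdf(alpha)
reject = z_cal < z_crit
print(round(z_cal, 2), round(z_crit, 3), "reject H0" if reject else "do not reject H0")
```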
1. Write down the null and alternative hypotheses in terms of the population proportion p. Include
appropriate units with the values of the proportion.
2. Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-
tailed.
3. Example: Suppose the hypotheses for a hypothesis test are:
H0: p=20%
Ha: p>20%
Because the alternative hypothesis is a >, this is a right-tail test. The p-value is the area in the right-tail
of the distribution.
4. Collect the sample information for the test and identify the significance level.
5. Find the p-value (the area in the corresponding tail) for the test using the appropriate distribution:
If $n \times p \ge 5$ and $n \times (1 - p) \ge 5$, use the normal distribution with $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
6. Compare the p-value to the significance level and state the outcome of the test:
If p-value $\le \alpha$, reject $H_0$ in favor of $H_a$.
o The results of the sample data are significant. There is sufficient evidence to conclude that the null hypothesis $H_0$ is an incorrect belief and that the alternative hypothesis $H_a$ is most likely correct.
If p-value $> \alpha$, do not reject $H_0$.
Example 1: Joon believes that 50% of first-time brides in the United States are younger than their
grooms. She performs a hypothesis test to determine if the percentage is the same or different from 50%.
Joon samples 100 first-time brides and 53 reply that they are younger than their grooms. For the
hypothesis test, she uses a 1% level of significance. (Exercise)
Example 2: A teacher believes that 85% of students in the class will want to go on a field trip to the
local zoo. She performs a hypothesis test to determine if the percentage is the same or different from
85%. The teacher samples 50 students and 39 reply that they would want to go to the zoo. For the
hypothesis test, use a 1% level of significance. (Exercise)
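The p-value procedure for a proportion can be sketched as follows, using the data of Example 1 (53 of 100 brides, hypothesized proportion 50%, two-tailed test at the 1% level). The function name `proportion_z_test` is hypothetical, and the normal approximation is assumed to hold.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0; returns (z, p-value)."""
    p_hat = successes / n
    # Under H0 the standard deviation of p_hat is sqrt(p0 * (1 - p0) / n).
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)
    # Two-tailed p-value: area in both tails beyond |z|.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.01
z, p_value = proportion_z_test(53, 100, 0.50)
decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(round(z, 2), round(p_value, 4), decision)
```

Example 2 can be handled the same way by changing the counts and the hypothesized proportion, after checking that $n \times p$ and $n \times (1 - p)$ are both at least 5.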
Chapter Four
Chi-Square Tests
                         B
A        B1     B2     ...    Bj     ...    Bc     Total
A1       O11    O12    ...    O1j    ...    O1c    R1
A2       O21    O22    ...    O2j    ...    O2c    R2
...
Ai       Oi1    Oi2    ...    Oij    ...    Oic    Ri
...
Ar       Or1    Or2    ...    Orj    ...    Orc    Rr
Total    C1     C2     ...    Cj     ...    Cc     n
- The chi-square test procedure is used to test the hypothesis of independence of two attributes. For instance, we may be interested in:
o Whether the presence or absence of hypertension is independent of smoking habit or not.
o Whether the size of the family is independent of the level of education attained by the mothers.
o Whether there is association between father and son regarding boldness.
o Whether there is association between stability of marriage and period of acquaintanceship prior to marriage.
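- The $\chi^2$ statistic is given by:
$\chi^2_{cal} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - e_{ij})^2}{e_{ij}}$
where $O_{ij}$ = the number of units that belong to category $i$ of A and $j$ of B,
$e_{ij}$ = the expected frequency of units that belong to category $i$ of A and $j$ of B, with $e_{ij} = \frac{R_i \times C_j}{n}$.
- Reject $H_0$ (independence) at level of significance $\alpha$ if the calculated value of $\chi^2$ exceeds the tabulated value with degrees of freedom equal to $(r-1)(c-1)$. That is,
Reject $H_0$ if $\chi^2_{cal} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - e_{ij})^2}{e_{ij}} > \chi^2_{(r-1)(c-1)}$ at $\alpha$.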
Examples:
1. A geneticist took a random sample of 300 men to study whether there is association between father
and son regarding boldness. He obtained the following results.
              Son
Father    Bold    Not
Bold      85      59
Not       65      91
Using the 5% level of significance, test whether there is association between father and son regarding boldness.
Solution:
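$H_0$: There is no association between father and son regarding boldness.
$H_1$: not $H_0$
$e_{11} = \frac{R_1 \times C_1}{n} = \frac{144 \times 150}{300} = 72$
$e_{12} = \frac{R_1 \times C_2}{n} = \frac{144 \times 150}{300} = 72$
$e_{21} = \frac{R_2 \times C_1}{n} = \frac{156 \times 150}{300} = 78$
$e_{22} = \frac{R_2 \times C_2}{n} = \frac{156 \times 150}{300} = 78$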
- Obtain the calculated value of the chi-square.
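$\chi^2_{cal} = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(O_{ij} - e_{ij})^2}{e_{ij}} = \frac{(85-72)^2}{72} + \frac{(59-72)^2}{72} + \frac{(65-78)^2}{78} + \frac{(91-78)^2}{78} = 9.028$
- The tabulated value with $(2-1)(2-1) = 1$ degree of freedom at $\alpha = 0.05$ is $\chi^2_{0.05}(1) = 3.841$. Since $9.028 > 3.841$, we reject $H_0$: there is evidence of association between father and son regarding boldness.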
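The chi-square computation generalizes to any r × c table. The sketch below computes the expected frequencies from the row and column totals and accumulates the statistic; the function name `chi_square_independence` is our own, and the critical value 3.841 (1 degree of freedom, 5% level) is taken from standard chi-square tables rather than from these notes.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence: R_i * C_j / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Father (rows) by son (columns) table from the example.
stat, df = chi_square_independence([[85, 59], [65, 91]])
critical = 3.841  # assumed: standard table value, 1 d.f., alpha = 0.05
print(round(stat, 3), df, "reject H0" if stat > critical else "do not reject H0")
```

For tables with more rows or columns the critical value must be looked up for the corresponding degrees of freedom.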
Test the hypothesis that the size of the family is independent of the level of education attained by
fathers. (Use 5% level of significance) (exercise)