T-Distribution and F-Distribution Relations
University of Delhi
Department of Distance and Continuing Education
University of Delhi
B.A. (Hons.) Economics
Semester-II
Discipline Specific Course (DSC-6)
Course Credits: 4
INTERMEDIATE STATISTICS FOR ECONOMICS
(Department of Economics)
As per the UGCF-2022 and National Education Policy 2020
Intermediate Statistics for Economics
Editorial Board
Prof. J. Khuntia, V.A.Rama Raju,
Vajala Ravi, Devender
Content Writers
Dr. Pooja Sharma, Taramati, Ashish Kumar Garg
Academic Coordinator
Deekshant Awasthi
Published by:
Department of Distance and Continuing Education under
the aegis of Campus of Open Learning/School of Open Learning,
University of Delhi, Delhi-110 007
Printed by:
School of Open Learning, University of Delhi
TABLE OF CONTENTS
About Contributors
Contributor's Name Designation
Dr. Pooja Sharma Associate Professor, Daulat Ram College, University of Delhi
Taramati Guest Faculty, Kirori Mal College, University of Delhi
Ashish Kumar Garg Assistant Professor, Ramjas College, University of Delhi
LESSON 1
STRUCTURE
5. To be able to differentiate between the sample space, events, sample points, and
random experiments.
6. To get familiarized with the technique of the Venn diagram and its usage in defining
events and types of events.
7. To understand the properties of probability and various operations to comprehend the
working of probabilities.
1.2 INTRODUCTION
This unit introduces the concept of ‘probability’ to the students. The phenomenon of
probability indicates the presence of randomness and the existence of some element of
uncertainty. Whenever we face a situation in which there is more than one possible outcome
that can occur, the concept of probability renders a technique for quantifying the chances or
likelihood associated with every possible outcome. There are several instances that involve
chances and thus the notion of probability is applicable. For example, in political elections,
based on exit polls it is plausible to predict that a certain political party could come into power.
Using data from previous days on various parameters such as temperature, humidity,
pressure, etc., meteorologists apply specific tools or techniques to produce weather
forecasts and determine, for instance, that there are 60 chances out of 100 that it will rain
today.
Another example from day-to-day life: 'since it is supposed to rain tomorrow, it is very
likely I will use my raincoat when I go to work.' Similarly, flipping a coin involves a
probability of 0.5 of getting either a head or a tail, and throwing a die gives one chance in
six that the required number will come up. Thus, the concept of probability can be applied
to several interesting events.
Probability is a mathematical term, and the study of probability as a branch of mathematics is
over 300 years old. This chapter enables the students to understand and estimate the likelihood of
various possibilities of events and outcomes. Various elementary concepts used in
comprehending the concept of probability will be discussed and explained, such as Sample,
population, random experiments, Venn diagram, sample points, events, types of events etc.
The discipline of Statistics deals with organizing and summarizing data for drawing
conclusions based on the information collected in the form of data. An investigation or
experiment typically focuses on a well-defined collection of objects, which constitutes what is
known as the 'population'.
There can be several types of population. One study on a particular type of medicine will lead
to a collection of particular capsules during a specified period. Another investigation might
[Figure: relationship between population and sample — probability (deductive reasoning) argues from the population to the sample, while inferential statistics (inductive reasoning) argues from the sample back to the population.]
getting a Head or Tail is an event. Planting a sapling is a Trial and whether it survives, or dies
is an Event. Sitting for an examination is a Trial and getting grades such as A, B, C, D, and E
are events.
Exhaustive Events: All possible outcomes of an experiment constitute collectively exhaustive
events. For example, tossing a coin result in two exhaustive cases which are Head and Tail.
Planting a sapling leads to two exhaustive cases which are Survival and Death. Sitting for an
examination where a student is awarded only 5 grades results in those many exhaustive
numbers of cases.
Favourable Events: All those outcomes of an experiment that lend themselves to the
objective, or favour, of the experiment are favourable events. For example, a gambler betting
on an Ace in a game of cards, where every draw of a card decides the winner or loser, has 4
favourable events, and betting on a black card has 13 + 13 = 26 favourable events.
Mutually Exclusive Events: Events are said to be mutually exclusive if happening of one
event prevents the occurrence of other events at the same time. Such events are also referred
to as disjoint events since they have no element in common. For example, in an athletics meet
involving 10 challengers, if any one of them wins then the remaining 9 cannot win; hence these
events are mutually exclusive. Similarly, in a toss of a coin, the occurrences of Head and Tail are
mutually exclusive.
Equally Likely Events: Two events are said to be equally likely if one of them is as likely to
happen as the other. For example, in tossing a fair coin once, the outcomes Head and Tail are
equally likely. In a throw of 6-faced dice, all the six numbers 1,2,3,4,5,6 are equally likely. If
a person suffers a minor heart attack, the death or survival outcomes are not equally likely.
Independent Events: If the happening of one event is not affected by the happening (or not
happening) of another event, such events are said to be independent. For example, successive
throws of a dart at a dartboard, with a perfect score obtained in each throw, are independent
events. However, if a person throws the dart once, practises, and then throws it a second time,
the events of getting a perfect score in the two throws are not independent.
Example: 1 Trial : Tossing of one fair coin
Events : Occurrence of Head, the occurrence of Tail.
Exhaustive events : Occurrence of Head, occurrence of Tail (together these exhaust all outcomes)
Mutually exclusive events: Head and Tail
Equally Likely Events: Head and Tail
A CASE STUDY
Consider another example of rolling a die. The sample space for the random experiment of
rolling a die is given by S = {1, 2, 3, 4, 5, 6}.
The number of elements in the sample space is 6, denoted by n (S) = 6,
Let E be the event that an even number appears on the die, represented by
E= {2, 4, 6},
The number of elements in event E is 3, represented by n(E)=3
There are several varieties of events as described in the next section.
IN-TEXT QUESTIONS
1. Events are said to be _____________. if the occurrence of one event prevents the
occurrence of another event at the same time.
2. If event A represents an event that at least a head appears, and event B represents an
event that only the tail appears. Events A and B are equally likely True / False
3. In the occurrence of the event: {Head} in a single throw of the coin, the occurrence of
event {Tail} is disjoint. The two events are called
a) Mutually exclusive b) Equally likely c) Both
4. In an experiment consisting of tossing two coins, if event A represents an Event that at
least a Head occurs and event B represents that at least a Tail occurs, then
a) Events A and B are equally likely (True/False)
b) Events A and B are mutually exclusive (True/False)
c) Events A and B together form an exhaustive set (True/False)
b. The complement of the intersection of event A and B is equal to the union of the
complement of A and the complement of B.
(A ∩ B) ' = A' ∪ B'
The events can be represented by using the Venn diagram as shown in the diagrams below.
[Venn diagrams illustrating events A and B, their union A ∪ B, intersection A ∩ B, and their complements]
IN-TEXT QUESTIONS
1. Consider an experiment in which each of the three vehicles taking a particular freeway
exit turns left (L) or right (R) at the end of the exit ramp. Outline the sample space and
events.
2. The two events E1 and E2 are mutually exclusive, where E1 is the event consisting of
numbers less than 3 and E2 is the event that consists of numbers greater than 4. (True/
False)
3. If the two events have some common elements, the two events are not ____________.
1.4 PROBABILITY
In the realm of random experiments, the key objective of the probability of any event A is to
assign a number P(A) to event A. This value P(A) is called the probability of event A which
gives a unique measure of the chances that the event will occur.
In other words, the probability is the chance of happening or occurrence of an event such as it
might rain today, team X will probably win today, or I may win the lottery. Largely, probability
is a measure of uncertainty.
1.4.1 Classical Definition of Probability
It is also called a priori or mathematical definition of probability. The probabilities are derived
from purely deductive reasoning. This implies that one does not throw a coin to state that the
probability of obtaining a head or a tail is ½. However, there are cases where the possibilities that
arise cannot be regarded as equally likely: for example, the probability of a recession next
year, or the probability of a particular GDP value next year. Similarly, the possibilities of
whether it will rain, or of the different outcomes of an election, are not equally likely.
Suppose an experiment results in mutually exclusive and equally likely outcomes. If m outcomes are
favourable to event A and n is the total number of outcomes in the sample space, then

P(A) = m/n = (number of outcomes favourable to A)/(total number of outcomes)

In a single throw of a die, the total number of outcomes in the sample space is n = 6; all are
mutually exclusive and equally likely.
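As a quick illustration of the classical (equally likely) rule, the sketch below enumerates the sample space of one die throw and counts favourable outcomes; the function names are ours, not the text's.

```python
from fractions import Fraction

# Sample space of a single throw of a fair six-faced die
sample_space = [1, 2, 3, 4, 5, 6]

def classical_probability(event):
    """P(A) = number of outcomes favourable to A / total number of outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

print(classical_probability(lambda x: x % 2 == 0))  # even number -> 1/2
print(classical_probability(lambda x: x % 3 == 0))  # multiple of 3 -> 1/3
```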
1.4.2 Relative Definition of Probability (by Von Mises)
If a trial is repeated a large number of times under essentially homogeneous and identical
conditions, then the limiting value of the relative frequency which is the ratio of absolute
frequencies to the total number of occurrences is called the probability of happening of events.
P(A) = lim_{n→∞} m/n
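A minimal simulation of von Mises' idea (seed and repetition counts are arbitrary choices): as the number of tosses n grows, the relative frequency m/n of heads settles near 0.5.

```python
import random

random.seed(1)  # arbitrary seed, only to make the run reproducible
for n in (100, 10_000, 1_000_000):
    m = sum(random.random() < 0.5 for _ in range(n))  # number of heads in n tosses
    print(n, m / n)  # relative frequency m/n approaches 0.5 as n grows
```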
IN-TEXT QUESTIONS
6. In a toss of two coins simultaneously, find the probability of getting exactly 2 heads,
using P(E) = number of favourable outcomes / total number of outcomes.
7. In the toss of 3 coins simultaneously, the probability of getting exactly two heads.
8. What is the probability of getting at least 1 head when two coins are tossed
simultaneously?
9. Probability of getting at most 2 tails when three coins are tossed simultaneously.
10. Probability of getting at least 2 heads when three coins are tossed simultaneously.
11. Probability of getting a greater number of tails than heads when three coins are tossed
simultaneously.
1.4.3 Axiomatic Definition of Probability
The axiomatic approach to probability was provided by Russian Mathematician A.N.
Kolmogorov and includes both the above definitions. In order to ensure that the probability
assignments of values P(A) for a particular event in the sample space S, is consistent with the
intuitive notion of probability, all assignments of values of probability P(A) must satisfy the
following properties or Axioms.
1. For any event A, the probability of event A, given by P(A), is non-negative: P(A) ≥ 0. In
other words, the probability that event A will occur can either be zero or some positive
number; the probability of event A can never be negative.
The Axiom 1 reflects the intuitive notion that the chance of A occurring should be non-
negative and is known as the Axiom of non-negativity.
2. The probability of the entire sample space is 1, that is P(S) = 1. In other words, the
probability that the entire sample space will occur is 100 percent, which means it will
surely occur. This is known as the Axiom of Certainty.
The sample space by definition is the event that must occur when the experiment is
performed. The sample space S contains all possible outcomes, therefore the maximum
possible probability is assigned to sample space S.
3. If A₁, A₂, A₃, … is an infinite collection of disjoint events, then
P(A₁ ∪ A₂ ∪ A₃ ∪ …) = Σ_{i=1}^{∞} P(Aᵢ)
This indicates that the probability of the union of all disjoint events belonging to the sample
space is the sum of the chances of all the individual events.
The third Axiom formalizes the idea that if we want the probability that at least one of a number
of events will occur, and no two of the events can occur simultaneously, then the chance of at
least one occurring is the sum of the chances of the individual events. Stated for an infinite
collection of disjoint events, this is known as the axiom of countable additivity.
4. The probability of an event always lies between 0 and 1.
0 ≤ P(A) ≤ 1.
P(A) = 0 means event A will not occur.
P(A) = 1 means event A will occur certainly.
11 | P a g e
5. Let ∅ be the null event, the event containing no outcomes whatsoever. This property
follows from Axiom 3 applied to a collection of disjoint events.
Therefore P(∅) = 0: the probability of the null event is zero.
6. If A, B, and C are mutually exclusive events, the probability that any one of them will
occur is equal to the sum of probabilities of either individual occurrence.
P(A + B + C + …) = P(A ∪ B ∪ C ∪ …) = P(A) + P(B) + P(C) + …
7. If A, B, C …… are a mutually exclusive and collectively exhaustive set of events the
sum of the probability of their individual occurrences is 1. However, if A, B, C ……
are any events, they are said to be statistically independent if the probability of their
occurring together is equal to the product of their individual probabilities. P(A∩B∩C)
= Probability of events A, B, and C occurring together or jointly or simultaneously,
also referred to as Joint probability.
P(A), P(B), and P(C) are called unconditional marginal or individual probabilities.
8. If events A, B, C, … are not mutually exclusive, then
P(A + B) or P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
where P(AB) is the joint probability that the two events occur simultaneously, that is,
P(A ∩ B). However, if A and B are mutually exclusive, then
P(A ∩ B) = P(∅) = 0
For every event A there is an event A′, called the complement of A, which consists of all outcomes not in A; P(A′) = 1 − P(A).
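A small enumeration over two dice (an illustrative choice of events, not taken from the text) that checks the addition rule and the complement rule numerically.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))      # two dice: 36 equally likely outcomes
A = {s for s in space if s[0] % 2 == 0}           # even number on the first die
B = {s for s in space if sum(s) % 3 == 0}         # sum is a multiple of 3

def P(event):
    return Fraction(len(event), len(space))

assert P(A | B) == P(A) + P(B) - P(A & B)         # P(A U B) = P(A) + P(B) - P(A n B)
assert P(set(space) - A) == 1 - P(A)              # P(A') = 1 - P(A)
print(P(A), P(B), P(A | B))                       # 1/2 1/3 2/3
```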
IN-TEXT QUESTIONS
b) A multiple of 3
c) A factor of 5
14. Two dice are thrown together, find the probability of getting
a) An even number on both
b) Sum as a perfect square
c) Different numbers on both
d) A total of at least 10
e) Sum as a multiple of 3
f) A multiple of 2 on one and a multiple of 3 on other
g) Sum as an even number
1.5 SUMMARY
This lesson familiarized the students with the basic concepts of sample space and population
along with their significance. The notion of probability was introduced with help of random
experiments. Various applications of probability in real life are presented in the chapter.
Certain important concepts related to probability such as sample space, events, sample points, and
random experiments are described in the chapter. The basic difference between the sample,
population, sample points, and events has been emphasized. The types of events, such as
disjoint events, mutually exclusive events, and exhaustive events, have been explained. Further, the
concept of the Venn diagram is also presented in the chapter. The notion of probability by
using classical and relative definition has been introduced. Later the properties of probabilities
are also discussed in the chapter.
1.6 GLOSSARY
5. Random Experiment: Any process of observation or measurement that has more than
one possible outcome and for which there is uncertainty about which outcome will
actually materialize. Such an experiment is referred to as ‘random experiment’.
6. Sample Point or Event: Each member or outcome of sample space or population is
called Sample Point. It is also called an element of sample space.
7. Mutually Exclusive: Events are said to be mutually exclusive if the occurrence of one
event prevents the occurrence of another event at the same time. Such events are also
referred to as disjoint events since they have no element in common.
9. Equally Likely: Two events are said to be equally likely if one event is as likely to
occur as the other.
9. Collectively Exhaustive: The events are collectively exhaustive if the events exhaust
all possible outcomes of an experiment.
10. De Morgan’s Law: The complement of the union of two sets A and B is equal to the
intersection of the complement of the sets A and B. This is De Morgan’s first law.
1.7 ANSWERS TO IN-TEXT QUESTIONS
1. Mutually Exclusive
2. False
3. Both
3 The sample space S; {LLL, RLL, LRL, LLR, LRR, RLR, RRL, RRR}
The event that exactly one of the three vehicles turns right: A
The elements in event A: {RLL, LRL, LLR}
The event that at most one of the vehicles turns right: B
The elements in the event B: {LLL. RLL, LRL, LLR}
In the event that all three vehicles turn in the same direction: C
The elements in the event C: {LLL, RRR}
4. E1 = {1,2}, E2 = {5,6}. The two events are mutually exclusive. True
5. Disjoint
6. ¼
7. 3/8
8. ¾
9. 7/8
10. ½
11. ½
12. Total number of possible outcomes = 6= n(S)
(i) a multiple of 3
Number of favorable outcomes = 2 {3 and 6}
Hence P (getting multiple of 3) = 2/6 = 1/3
ii) a number less than 5
Number of favorable outcomes = 4 {1, 2, 3, 4}
Hence, P (getting number less than 5) = 4/6 = 2/3
iii) an even prime number
Number of favorable outcomes = 1 {2}
Hence, P (getting an even prime number) = 1/6
iv) a prime number
Number of favorable outcomes = 3 {2,3,5}
Hence the P (getting a prime number) = 3/6 = 1/2
v) a factor of 6
Number of favorable outcomes= 4 {1, 2, 3, 6}
Hence, P (getting a factor of 6) = 4/6 = 2/3
13. a) 1/2 b) 1/3 c) 1/3
14. a) 1/4 b) 7/36 c) 5/6 d) 1/6 e) 1/3 f) 11/36 g) ½
1. Two six-faced dice are rolled together, or dice is rolled twice. The total number of
possible outcomes are 36.
2. (i) Prove that the probability of null event is zero, P (∅) = 0.
(ii) Prove that for any two events A and B
P(AUB) = P(A) +P(B) - P(AB)
1.9 REFERENCES
• Devore, J. (2012). Probability and statistics for engineers, 8th ed. Cengage Learning.
• John A. Rice (2007). Mathematical Statistics and Data Analysis, 3rd ed. Thomson
Brooks/Cole
• Larsen, R., Marx, M. (2011). An introduction to mathematical statistics and its
applications. Prentice Hall.
• Miller, I., Miller, M. (2017). J. Freund’s mathematical statistics with applications, 8th
ed. Pearson.
• Kantarelis, D. and Asadoorian, M. O. (2009). Essentials of Inferential Statistics, 5th
edition, University Press of America.
• Hogg, R., Tanis, E., Zimmerman, D. (2021) Probability and Statistical inference,
10TH Edition, Pearson
1.10 SUGGESTED READINGS
LESSON 2
CENTRAL LIMIT THEOREM AND SAMPLING DISTRIBUTION
STRUCTURE
2.1 Learning Objectives
2.2 Introduction
2.3 Central Limit Theorem
2.4 Sampling Distribution
2.5 Summary
2.6 Answer to Intext Questions
2.7 Self Assessment Questions
2.8 References
2.1 LEARNING OBJECTIVES
After reading this chapter you will be familiar with the following topics.
1. Central limit theorem
2. Distribution of x̄ and ∑ Xᵢ
3. Sampling distribution of x̄ and S²
2.2 INTRODUCTION
This chapter will familiarize you with the most celebrated statistical theorem, the central limit
theorem. It will also help you to learn the importance of large sample sizes: if the sample
size is large enough then, irrespective of the distribution of the population, the distribution of
x̄ tends to a normal distribution. Similarly, it is important to understand the sampling
distribution, as we can estimate the population parameters from the samples. Different
methods of drawing samples, i.e. with replacement and without replacement, are discussed in
the chapter. The mean and standard deviation of the sampling distribution are also explained.
2.3 CENTRAL LIMIT THEOREM
According to the central limit theorem, when the sample size n is large enough, the distribution
of the sample mean tends to a normal distribution irrespective of the distribution of the
population. Let us assume random samples are obtained by random sampling from the
population, each of sample size n; if n is large enough, then x̄ is approximately normally
distributed with mean µ and variance σ²/n. As the sample size increases, the precision of the
estimates increases.
For a large sample size n, if X₁, X₂, …, Xₙ are independently and identically distributed random
samples with identical mean µ and variance σ², then we can find the distribution of x̄ and ∑ Xᵢ.
2.3.1 Distribution of x̄
To find the distribution of x̄ we must know the mean and variance of x̄. For a large sample size
n, if X₁, X₂, X₃, …, Xₙ are independently and identically distributed random samples with
identical mean µ and variance σ², then the mean and variance of x̄ are obtained as follows.

x̄ = (Sum of observations)/(Total number of observations) = (X₁ + X₂ + X₃ + … + Xₙ)/n

Mean of x̄ = E(x̄) = E[(X₁ + X₂ + X₃ + … + Xₙ)/n]

Recall that E(aX) = a·E(X), where 'a' is a constant and X is a random variable, and similarly
E(X + Y) = E(X) + E(Y). Therefore,

E(x̄) = [E(X₁) + E(X₂) + E(X₃) + … + E(Xₙ)]/n

Since all the random samples X₁, X₂, X₃, …, Xₙ are independently and identically distributed
random variables with mean µ, each of E(X₁), E(X₂), E(X₃), …, E(Xₙ) equals µ. There are n
samples, each with mean µ, so E(X₁) + E(X₂) + E(X₃) + … + E(Xₙ) = nµ.

E(x̄) = (µ + µ + … + µ)/n = nµ/n = µ

So we have proved that the mean of x̄ is E(x̄) = µ.
Variance of x̄ = V(x̄) = V[(X₁ + X₂ + X₃ + … + Xₙ)/n]

Recall that V(aX) = a²V(X). By independence,

V(x̄) = V(X₁/n) + V(X₂/n) + V(X₃/n) + … + V(Xₙ/n)
     = (1/n²)[V(X₁) + V(X₂) + V(X₃) + … + V(Xₙ)]

Since V(X₁), V(X₂), V(X₃), …, V(Xₙ) each have the identical variance σ² and there are n
samples, the total σ² + σ² + σ² + … + σ² equals nσ².

V(x̄) = (1/n²)(σ² + σ² + … + σ²) = (1/n²)·nσ² = σ²/n

Therefore, the mean and variance of x̄ are µ and σ²/n respectively. Standardizing the
distribution of x̄ with this mean and variance we get

Z = (x̄ − µ)/√(σ²/n) = (x̄ − µ)/(σ/√n) ~ N(0, 1)
Example: The birth rate in a country is believed to be 1.57 per woman. Assume the population
standard deviation is 0.4. If a random sample of 160 women is selected, what is the probability
that the sample mean will fall between 1.52 and 1.62?
Solution: Let X denote the birth rate per woman, and let ∑Xᵢ = X₁ + X₂ + … + Xₙ. Then
E(∑Xᵢ) = E(X₁ + X₂ + … + Xₙ) = E(X₁) + E(X₂) + … + E(Xₙ) = µ + µ + … + µ
So, adding µ n times gives the result E(∑Xᵢ) = nµ.  … (1)
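A sketch of the remaining computation for this example, assuming SciPy is available for the normal CDF: the sample mean is standardised with standard error σ/√n and the probability is read from N(0, 1).

```python
from math import sqrt
from scipy.stats import norm  # assumption: SciPy is available

mu, sigma, n = 1.57, 0.4, 160
se = sigma / sqrt(n)                        # standard error of the sample mean
z_low = (1.52 - mu) / se
z_high = (1.62 - mu) / se
prob = norm.cdf(z_high) - norm.cdf(z_low)   # P(1.52 < x-bar < 1.62)
print(round(z_low, 2), round(z_high, 2), round(prob, 3))  # about -1.58, 1.58, 0.886
```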
Example: The mean cost of repairing a car after an accident is $6,200, with a standard deviation
of $650. A study was carried out on 65 vehicles that had been involved in accidents. Calculate
the probability that the total repair bill for the vehicles exceeded $400,000.
Let the total repair bill for the cars be approximately normally distributed, with n = 65 cars and
a mean cost of $6,200 per car. The total cost of repair then has mean 65 × 6200 = 403,000 and
variance 65 × 650², i.e. standard deviation ≈ 5,240.
To calculate the probability that the total cost exceeds 400,000:
P(T > 400,000) = P(Z > (400,000 − 403,000)/5,240)
P(Z > −0.572) = P(Z < 0.572) = 0.7163
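The same calculation in code (again assuming SciPy), confirming that the probability that the total bill exceeds $400,000 is roughly 0.716.

```python
from math import sqrt
from scipy.stats import norm  # assumption: SciPy is available

n, mean_cost, sd_cost = 65, 6200, 650
mean_total = n * mean_cost             # 403,000
sd_total = sd_cost * sqrt(n)           # about 5,240.7
z = (400_000 - mean_total) / sd_total  # about -0.572
print(round(1 - norm.cdf(z), 3))       # P(T > 400,000) ≈ 0.716
```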
IN-TEXT QUESTION
1. Question. Consider a random sample of size 30 taken from a normal distribution with
mean 60 and variance 25. Let the sample mean be denoted by X̄. Calculate the
probability that X̄ assumes a value greater than 62.
2. Question. In a large population the distribution of a variable has mean 165 and standard
deviation 25 units. If a random sample of size 35 is chosen, find the approximate
probability that the sample mean lies between 162 and 170.
2.4 SAMPLING DISTRIBUTION
From a population of N observations, k samples may be drawn, with or without replacement,
each with sample size n. The sample statistic for each sample can be obtained, and this gives
the sampling distribution of that statistic. The population parameters can then be estimated
from the sampling distribution.
If the samples are drawn with replacement from the population with sample size n, then the total
number of samples obtained will be k = Nⁿ.
If the samples are drawn without replacement from the population with sample size n, then the
total number of samples obtained will be k = ᴺCₙ.
If k samples are drawn with replacement, then the distribution of x̄ can be obtained.
Steps required to construct the sampling distribution of x̄:
1. Draw the k samples from the population.
2. Obtain the sample statistic x̄ from each of the obtained samples.
3. Determine the probability of occurrence of each x̄.
4. Represent each x̄ with its associated probability in tabular form.
5. Hence, the sampling distribution of x̄ is obtained.
Example: XYZ insurance company deals in term life insurance policies and sells three tenures
of insurance policy: 25 years, 40 years and 65 years. 20% of all purchasers select the 25-year
policy, 50% select the 40-year policy and the remaining 30% choose the 65-year policy. Let x₁
and x₂ denote the insurance tenures selected by two independently selected policyholders.
Find the sampling distribution of the sample mean x̄.
Solution: Since no information is given in the question regarding the selection procedure of
the samples, we will assume a with-replacement random sample.
The total number of samples obtained is Nⁿ = 3² = 9, since there are three types of policy
available and we have to select a sample of 2, i.e. the sample size is 2.
So the samples are (25,25), (25,40), (25,65), (40,40), (40,25), (40,65), (65,65), (65,25), (65,40).
By obtaining the mean of each sample we get
Samples Mean of samples probability
(25,25) 25 0.2*0.2
(25,40) 32.5 0.2*0.5
(25,65) 45 0.2*0.3
(40,40) 40 0.5*0.5
(40,25) 32.5 0.5*0.2
(40,65) 52.5 0.5*0.3
(65,65) 65 0.3*0.3
(65,25) 45 0.3*0.2
(65,40) 52.5 0.3*0.5
Some numbers are repeated in the distribution; by adding the respective probabilities we obtain
the sampling distribution as follows.
x P(x)
25 0.04
32.5 0.20
40 0.25
45 0.12
52.5 0.30
65 0.09
E(x̄) = Σ x̄·P(x̄)
E(x̄) = 25×0.04 + 32.5×0.20 + 40×0.25 + 45×0.12 + 52.5×0.30 + 65×0.09
E(x̄) = 44.5
The mean of x̄ equals the population mean µ (= 44.5).
The standard deviation of the sampling distribution of x̄ is known as the standard error of the
distribution, represented by σ_x̄, and is calculated as σ/√n.
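A sketch (values taken from the insurance example above) that rebuilds the sampling distribution of x̄ by enumeration and checks that its mean equals the population mean of 44.5 years.

```python
from itertools import product

tenures = {25: 0.2, 40: 0.5, 65: 0.3}   # policy tenure and its selection probability

# All with-replacement samples of size 2, with the probability of each sample mean
dist = {}
for (x1, p1), (x2, p2) in product(tenures.items(), repeat=2):
    xbar = (x1 + x2) / 2
    dist[xbar] = dist.get(xbar, 0.0) + p1 * p2

mu = sum(v * p for v, p in tenures.items())          # population mean = 44.5
mean_xbar = sum(x * p for x, p in dist.items())      # E(x-bar)
print({k: round(v, 2) for k, v in dist.items()})     # {25.0: 0.04, 32.5: 0.2, ...}
print(mu, round(mean_xbar, 2))                       # 44.5 44.5
```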
Similarly sampling distribution of 𝑆 2 can be obtained from the samples.
IN-TEXT QUESTION
3. XYZ insurance company deals in term life insurance policies and sells three tenures of
insurance policy: 25 years, 40 years and 65 years. 20% of all purchasers select the 25-year
policy, 50% select the 40-year policy and the remaining 30% choose the 65-year policy. Let
x₁ and x₂ denote the insurance tenures selected by two independently selected policyholders.
Obtain the sampling distribution of the sample variance S².
2.5 SUMMARY
This chapter has familiarized you with the basic concepts related to the sampling distribution
and the central limit theorem. It has introduced the distributions of x̄ and ∑xᵢ, and explained
the concept of the sampling distribution with and without replacement. The mean of x̄ and the
standard error of x̄ have been discussed. We have tried to explain the concepts with examples
to create a better understanding, which will help you to think about economic theories in a
much better way.
2.6 ANSWERS TO INTEXT QUESTION
1. n = 30, µ = 60, σ² = 25
X̄ ~ N(µ, σ²/n), i.e. X̄ ~ N(60, 25/30)
P(X̄ > 62) = P(Z > (62 − 60)/(5/√30)) = P(Z > 2.19) ≈ 0.014
2. P(162 < X̄ < 170) ≈ 0.639
As the sample size increases, the distribution of X̄ tends to a normal distribution. For the
approximation to the normal distribution to be reasonable, the sample size should be at least
30, i.e. n ≥ 30. As the sample size increases, even a discrete distribution is well approximated
by the normal distribution.
3. Solution: Since no information is given in the question regarding the selection procedure
of the samples, we assume a with-replacement random sample.
The total number of samples obtained is Nⁿ = 3² = 9, since there are three types of policy
available and we have to select a sample of 2, i.e. the sample size is 2.
So the samples are (25,25), (25,40), (25,65), (40,40), (40,25), (40,65), (65,65), (65,25), (65,40).
Samples Variance of samples probability
(25,25) 0 0.2*0.2
(25,40) 112.5 0.2*0.5
(25,65) (25-45)^2+(65-45)^2=800 0.2*0.3
(40,40) 0 0.5*0.5
(40,25) 112.5 0.5*0.2
(40,65) (40-52.5)^2+(65-52.5)^2=312.5 0.5*0.3
(65,65) 0 0.3*0.3
(65,25) 800 0.3*0.2
(65,40) 312.5 0.3*0.5
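Extending the same enumeration to the sample variance (with the n − 1 divisor used in the table above) shows that E(S²) equals the population variance σ² = 212.25; this is a sketch, not part of the original solution.

```python
from itertools import product
from statistics import variance   # sample variance with the n - 1 divisor

tenures = {25: 0.2, 40: 0.5, 65: 0.3}
mu = sum(v * p for v, p in tenures.items())                   # 44.5
sigma2 = sum(p * (v - mu) ** 2 for v, p in tenures.items())   # 212.25

exp_s2 = 0.0
for (x1, p1), (x2, p2) in product(tenures.items(), repeat=2):
    exp_s2 += p1 * p2 * variance([x1, x2])                    # weight each sample's S^2

print(round(sigma2, 2), round(exp_s2, 2))   # 212.25 212.25, so E(S^2) = sigma^2
```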
LESSON 3
CHARACTERISTICS OF ESTIMATORS
STRUCTURE
3.1 Learning Objectives
3.2 Introduction
3.3 Characteristics of Estimators
3.3.1 Unbiasedness
3.3.2 Consistency
3.3.3 Efficiency
3.3.4 Sufficiency
3.4 In-Text Questions
3.5 Summary
3.6 Glossary
3.7 Answer to in-text questions
3.8 References
3.9 Suggested Readings
3.1 LEARNING OBJECTIVES
One of the main objectives of Statistics is to draw inferences about a population from the
analysis of a sample drawn from that population. Two important problems in statistical
inference are
(i) Estimation
(ii) Testing of Hypothesis.
The theory of estimation was founded by Prof. R.A. Fisher in a series of fundamental papers
round about 1930.
3.2 INTRODUCTION
Let us consider a random variable X with p.d.f. f(x, θ). In most common applications, though
not always, the functional form of the population distribution is assumed to be known except
for the value of some unknown parameter(s) θ which may take any value in a set Θ. This is
expressed by writing the p.d.f. in the form f(x, θ), θ ∈ Θ. The set Θ, which is the set of all
possible values of θ, is called the parameter space. Such a situation gives rise not to one
probability distribution but to a family of probability distributions, which we write as
{f(x, θ), θ ∈ Θ}; e.g., if X ∼ N(µ, σ²), then the parameter space is
Θ = {(µ, σ²) : −∞ < µ < ∞; 0 < σ < ∞}.
In particular, for σ² = 1, the family of probability distributions is given by
{N(µ, 1); µ ∈ Θ}, where Θ = {µ : −∞ < µ < ∞}
In the following discussion we shall consider a general family of distributions:
{𝑓(𝑥; 𝜃1 , 𝜃2 , … , 𝜃𝑘 ): 𝜃𝑖 ∈ Θ, 𝑖 = 1,2, … , 𝑘}.
Let us consider a random sample 𝑥1 , 𝑥2 , … , 𝑥𝑛 of size 𝑛 from a population, with probability
function 𝑓(𝑥; 𝜃1 , 𝜃2 , … , 𝜃𝑘 ), where 𝜃1 , 𝜃2 … , 𝜃𝑘 are the unknown population parameters. There
will then always be an infinite number of functions of sample values, called statistics, which
may be proposed as estimates of one or more of the parameters.
Evidently, the best estimate would be one that falls nearest to the true value of the parameter
to be estimated. In other words, the statistic whose distribution concentrates as closely as
possible near the true value of the parameter may be regarded as the best estimate. Hence the
basic problem of estimation in the above case can be formulated as follows:
E(x̄) = µ
and E(s²) ≠ σ², but E(S²) = σ², where
s² = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)²
Hence there is a reason to prefer S² = (1/(n−1)) Σᵢ₌₁ⁿ (xᵢ − x̄)² to the sample variance s².
Note: If E(Tₙ) > θ, Tₙ is said to be positively biased, and if E(Tₙ) < θ, it is said to be
negatively biased, the amount of bias b(θ) being given by
b(θ) = E(Tₙ) − γ(θ), θ ∈ Θ
Example 1. Let x₁, x₂, …, xₙ be a random sample from a normal population N(µ, 1). Show that
t = (1/n) Σᵢ₌₁ⁿ xᵢ² is an unbiased estimator of µ² + 1.
Solution. We are given: E(xᵢ) = µ, V(xᵢ) = 1 ∀ i = 1, 2, …, n
Now E(xᵢ²) = V(xᵢ) + (E(xᵢ))² = 1 + µ².
∴ E(t) = E((1/n) Σᵢ₌₁ⁿ xᵢ²) = (1/n) Σᵢ₌₁ⁿ E(xᵢ²) = (1/n) Σᵢ₌₁ⁿ (1 + µ²) = 1 + µ²
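A Monte Carlo check of this result (sample size, µ and seed are arbitrary): averaging t over many simulated N(µ, 1) samples reproduces µ² + 1.

```python
import random

random.seed(0)
mu, n, reps = 2.0, 50, 20_000

t_values = []
for _ in range(reps):
    sample = [random.gauss(mu, 1.0) for _ in range(n)]   # random sample from N(mu, 1)
    t_values.append(sum(x * x for x in sample) / n)      # t = (1/n) * sum of x_i^2

print(round(sum(t_values) / reps, 3), mu ** 2 + 1)       # both close to 5.0
```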
Hence sample mean (𝑋‾𝑛 ) is always a consistent estimator of the population mean (𝜇).
Note 2. Obviously, consistency is a property concerning the behaviour of an estimator for
indefinitely large values of the sample size n, i.e., as n → ∞. Nothing is said about its
behaviour for finite n. Moreover, if there exists a consistent estimator, say Tₙ, of γ(θ), then
infinitely many such estimators can be constructed, e.g.,
T′ₙ = ((n − a)/(n − b)) Tₙ = ((1 − a/n)/(1 − b/n)) Tₙ → γ(θ) in probability, as n → ∞,
and hence, for different values of a and b, T′ₙ is also consistent for γ(θ).
Invariance Property of Consistent Estimators.
Theorem: If Tₙ is a consistent estimator of γ(θ) and ψ(γ(θ)) is a continuous function of
γ(θ), then ψ(Tₙ) is a consistent estimator of ψ(γ(θ)).
Proof. Since Tₙ is a consistent estimator of γ(θ), Tₙ → γ(θ) in probability as n → ∞, i.e., for
every ε > 0, η > 0, there exists a positive integer m(ε, η) such that
P{|Tₙ − γ(θ)| < ε} > 1 − η, ∀ n ≥ m
Since ψ(·) is a continuous function, for every ε₁ > 0, however small, there exists a positive
number ε such that |ψ(Tₙ) − ψ(γ(θ))| < ε₁ whenever |Tₙ − γ(θ)| < ε, i.e.,
|Tₙ − γ(θ)| < ε ⇒ |ψ(Tₙ) − ψ(γ(θ))| < ε₁
For two events A and B, if A ⇒ B, then
A ⊆ B ⇒ P(A) ≤ P(B), i.e., P(B) ≥ P(A)
So
P[|ψ(Tₙ) − ψ(γ(θ))| < ε₁] ≥ P[|Tₙ − γ(θ)| < ε]
⇒ P[|ψ(Tₙ) − ψ(γ(θ))| < ε₁] ≥ 1 − η; ∀ n ≥ m
Hence ψ(Tₙ) → ψ[γ(θ)] in probability as n → ∞, i.e., ψ(Tₙ) is a consistent estimator of ψ[γ(θ)].
Var(X̄) = Var(T/n) = (1/n²) Var(T) = pq/n → 0 as n → ∞.
= [1/(σ√(2π)) · exp{−(x − µ)²/(2σ²)}]ₓ₌µ = 1/(σ√(2π))
∴ V(Md) = (1/(4n)) · 2πσ² = πσ²/(2n)
median is also an unbiased and consistent estimator of 𝜇. Thus, there is a necessity of some
further criterion which will enable us to choose between the estimators with the common
property of consistency. Such a criterion which is based on the variances of the sampling
distribution of estimators is usually known as efficiency. If, of the two consistent estimators
𝑇1 , 𝑇2 of a certain parameter 𝜃, we have
𝑉(𝑇1 ) < 𝑉(𝑇2 ), for all 𝑛 then 𝑇1 is more efficient than 𝑇2 for all samples sizes.
Since 𝑉(𝑥‾) < 𝑉(𝑀𝑑), we conclude that for normal distribution, sample mean is more efficient
estimator for 𝜇 than the sample median, for large samples at least.
Most Efficient Estimator: If, in a class of consistent estimators for a parameter, there exists
one whose sampling variance is less than that of any other such estimator, it is called the most
efficient estimator. Whenever such an estimator exists, it provides a criterion for measuring
the efficiency of the other estimators.
Efficiency: If T₁ is the most efficient estimator with variance V₁ and T₂ is any other estimator
with variance V₂, then the efficiency E of T₂ is defined as
E = V₁/V₂
Obviously, E cannot exceed unity.
If T, T₁, T₂, …, Tₙ are all estimators of γ(θ) and Var(T) is minimum, then the efficiency Eᵢ of
Tᵢ (i = 1, 2, …, n) is defined as
Eᵢ = Var(T)/Var(Tᵢ); i = 1, 2, …, n
Obviously Eᵢ ≤ 1; i = 1, 2, …, n. For example, in normal samples, since the sample mean x̄ is
the most efficient estimator of µ, the efficiency E of Md for such samples (for large n) is
E = V(x̄)/V(Md) = (σ²/n)/(πσ²/2n) = 2/π = 0.637.
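A simulation sketch of this efficiency comparison (odd sample size and seed chosen arbitrarily): the ratio of the sampling variance of the mean to that of the median approaches 2/π ≈ 0.637.

```python
import math
import random
from statistics import mean, median, pvariance

random.seed(0)
n, reps = 101, 5_000   # odd n so the sample median is a single observation
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(mean(sample))
    medians.append(median(sample))

efficiency = pvariance(means) / pvariance(medians)   # V(x-bar) / V(Md)
print(round(efficiency, 3), round(2 / math.pi, 3))   # both near 0.64
```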
Find λ. Are t₁ and t₂ unbiased? State, giving reasons, which estimator is best among t₁, t₂ and t₃.
Solution. We are given:
E(Xᵢ) = µ, Var(Xᵢ) = σ² (say); Cov(Xᵢ, Xⱼ) = 0, (i ≠ j = 1, 2, …, n)
(i) E(t₁) = (1/5) Σᵢ₌₁⁵ E(Xᵢ) = (1/5) Σᵢ₌₁⁵ µ = (1/5)·5µ = µ ⇒ t₁ is an unbiased estimator of µ.
(ii) E(t₂) = (1/2) E(X₁ + X₂) + E(X₃) = (1/2)(µ + µ) + µ = 2µ
V(t₁) = (1/25){V(X₁) + V(X₂) + V(X₃) + V(X₄) + V(X₅)} = σ²/5
V(t₂) = (1/4){V(X₁) + V(X₂)} + V(X₃) = σ²/2 + σ² = (3/2)σ²
V(t₃) = (1/9){4V(X₁) + V(X₂)} = (1/9)(4σ² + σ²) = (5/9)σ²  (∵ λ = 0)
Since V(t₁) is least, t₁ is the best estimator (in the sense of least variance) of µ.
Example 2. X₁, X₂ and X₃ is a random sample of size 3 from a population with mean value µ
and variance σ². T₁, T₂, T₃ are the estimators used to estimate the mean value µ, where
T₁ = X₁ + X₂ − X₃, T₂ = 2X₁ + 3X₃ − 4X₂, and T₃ = (λX₁ + X₂ + X₃)/3.
Solution. Since X₁, X₂, X₃ is a random sample from a population with mean µ and variance σ²,
E(Xᵢ) = µ, Var(Xᵢ) = σ² and Cov(Xᵢ, Xⱼ) = 0, (i ≠ j = 1, 2, 3).
(i) We have
E(T₁) = E(X₁) + E(X₂) − E(X₃) = µ ⇒ T₁ is an unbiased estimator of µ
E(T₂) = 2E(X₁) + 3E(X₃) − 4E(X₂) = µ ⇒ T₂ is an unbiased estimator of µ.
(ii) We are given: E(T₃) = µ ⇒ (1/3){λE(X₁) + E(X₂) + E(X₃)} = µ
⇒ (1/3)(λµ + µ + µ) = µ ⇒ λ + 2 = 3 ⇒ λ = 1
(iii) With λ = 1, T₃ = (1/3)(X₁ + X₂ + X₃) = X̄. Since the sample mean is a consistent estimator of
the population mean µ, by the Weak Law of Large Numbers, T₃ is a consistent estimator of µ.
(iv) We have
Var(T₁) = Var(X₁) + Var(X₂) + Var(X₃) = 3σ²
Var(T₂) = 4Var(X₁) + 9Var(X₃) + 16Var(X₂) = 29σ²
Var(T₃) = (1/9)[Var(X₁) + Var(X₂) + Var(X₃)] = σ²/3  (∵ λ = 1)
Since Var(T₃) is minimum, T₃ is the best estimator of µ in the sense of minimum variance.
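A quick simulation of Example 2 with λ = 1 (the values of µ, σ and the seed are arbitrary): the empirical variances of T₁, T₂ and T₃ come out near 3σ², 29σ² and σ²/3.

```python
import random
from statistics import pvariance

random.seed(0)
mu, sigma, reps = 10.0, 2.0, 50_000
t1, t2, t3 = [], [], []
for _ in range(reps):
    x1, x2, x3 = (random.gauss(mu, sigma) for _ in range(3))
    t1.append(x1 + x2 - x3)
    t2.append(2 * x1 + 3 * x3 - 4 * x2)
    t3.append((x1 + x2 + x3) / 3)          # T3 with lambda = 1 is the sample mean

s2 = sigma ** 2
print(round(pvariance(t1) / s2, 2),        # about 3
      round(pvariance(t2) / s2, 2),        # about 29
      round(pvariance(t3) / s2, 2))        # about 0.33
```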
Minimum Variance Unbiased (M.V.U.) Estimators:
If a statistic 𝑇 = 𝑇(𝑥1 , 𝑥2 , … , 𝑥𝑛 ), based on sample of size 𝑛 is such that
(i) 𝑇 is unbiased for 𝛾(𝜃), for all 𝜃 ∈ Θ and
(ii) It has the smallest variance among the class of all unbiased estimators of 𝛾(𝜃), then 𝑇
is called the minimum variance unbiased estimator (𝑀𝑉𝑈𝐸) of 𝛾(𝜃).
More precisely, 𝑇 is MVUE of 𝛾(𝜃) if
𝐸𝜃 (𝑇) = 𝛾(𝜃) for all 𝜃 ∈ Θ and Var𝜃 (𝑇) ≤ Var𝜃 (𝑇 ′ ) for all 𝜃 ∈ Θ
where 𝑇 ′ is any other unbiased estimator of 𝛾(𝜃).
E(T) = (1/2){E(T₁) + E(T₂)} = γ(θ)
Var(T) = Var{(1/2)(T₁ + T₂)} = (1/4) Var(T₁ + T₂)   [∵ Var(CX) = C² Var(X)]
= (1/4){Var(T₁) + Var(T₂) + 2 Cov(T₁, T₂)}
= (1/4){Var(T₁) + Var(T₂) + 2ρ√(Var(T₁) Var(T₂))}
= (1/2) Var(T₁)(1 + ρ),
where ρ is Karl Pearson's coefficient of correlation between T₁ and T₂.
Since T₁ is the M.V.U. estimator, Var(T) ≥ Var(T₁)
⇒ (1/2) Var(T₁)(1 + ρ) ≥ Var(T₁) ⇒ (1/2)(1 + ρ) ≥ 1 ⇒ ρ ≥ 1
Since |𝜌| ≤ 1, we must have 𝜌 = 1, i.e., 𝑇1 and 𝑇2 must have a linear relation of the form:
𝑇1 = 𝛼 + 𝛽𝑇2 where 𝛼 and 𝛽 are constants independent of 𝑥1 , 𝑥2 , … , 𝑥𝑛 but may depend on
𝜃, i.e., we may have 𝛼 = 𝛼(𝜃) and 𝛽 = 𝛽(𝜃).
Taking expectation of both sides then we get
𝜃 = 𝛼 + 𝛽𝜃
Also we get Var (𝑇1 ) = Var (𝛼 + 𝛽𝑇2 ) = 𝛽 2 Var (𝑇2 )
⇒ 1 = 𝛽 2 ⇒ 𝛽 = ±1
But since 𝜌(𝑇1 , 𝑇2 ) = +1, the coefficient of regression of 𝑇1 on 𝑇2 must be positive.
∴ 𝛽=1⇒𝛼=0
so we get 𝑇1 = 𝑇2 as desired.
Theorem 2. Let T₁ and T₂ be unbiased estimators of γ(θ) with efficiencies e₁ and e₂
respectively, and let ρ = ρ_θ be the correlation coefficient between them. Then
√(e₁e₂) − √((1 − e₁)(1 − e₂)) ≤ ρ ≤ √(e₁e₂) + √((1 − e₁)(1 − e₂))
Proof. Let T be the minimum variance unbiased estimator of γ(θ). Then we are given:
(ρ/√(e₁e₂) − 1)² − (1/e₁ − 1)(1/e₂ − 1) ≤ 0 ⇒ (ρ − √(e₁e₂))² − (1 − e₁)(1 − e₂) ≤ 0
∴ ρ² − 2√(e₁e₂) ρ + (e₁ + e₂ − 1) ≤ 0
This implies that ρ lies between the roots of the equation
ρ² − 2√(e₁e₂) ρ + (e₁ + e₂ − 1) = 0
which are given by (1/2){2√(e₁e₂) ± 2√(e₁e₂ − (e₁ + e₂ − 1))} = √(e₁e₂) ± √((1 − e₁)(1 − e₂))
= 0, if Σᵢ₌₁ⁿ xᵢ ≠ k
Since this does not depend on 'p', T = Σᵢ₌₁ⁿ xᵢ is sufficient for 'p'.
FACTORIZATION THEOREM (Neyman).
The necessary and sufficient condition for a distribution to admit a sufficient statistic is
provided by the 'factorization theorem' due to Neyman.
Statement 𝑇 = 𝑡(𝑥) is sufficient for 𝜃 if and only if the joint density function 𝐿 (say), of the
sample values can be expressed in the form:
𝐿 = 𝑔𝜃 [𝑡(𝑥)] ⋅ ℎ(𝑥)
where (as indicated) 𝑔𝜃 [𝑡(𝑥)] depends on 𝜃 and 𝑥 only through the value of 𝑡(𝑥) and ℎ(𝑥) is
independent of 𝜃.
Remarks 1. It should be clearly understood that by 'a function independent of θ' we not
only mean that it does not involve θ but also that its domain does not contain θ. For example,
the function
f(x) = 1/(2a), a − θ < x < a + θ; −∞ < θ < ∞
depends on θ.
2. It should be noted that the original sample 𝑋 = (𝑋1 , 𝑋2 , … , 𝑋𝑛 ), is always a sufficient
statistic.
3. The most general form of the distributions admitting a sufficient statistic is Koopman's
form and is given by: L = L(x, θ) = g(x)·h(θ)·exp{a(θ)ψ(x)}, where h(θ) and a(θ) are
functions of the parameter θ only and g(x) and ψ(x) are functions of the sample
observations only.
4. Invariance Property of Sufficient Estimator: If T is a sufficient estimator for the
parameter θ and if ψ(T) is a one-to-one function of T, then ψ(T) is sufficient for ψ(θ).
5. Fisher–Neyman Criterion. A statistic t₁ = t(x₁, x₂, …, xₙ) is a sufficient estimator of the
parameter θ if and only if the likelihood function (joint p.d.f. of the sample) can be
expressed as
L = ∏ᵢ₌₁ⁿ f(xᵢ, θ) = g₁(t₁; θ) · k(x₁, x₂, …, xₙ),
where g₁(t₁; θ) is the p.d.f. of the statistic t₁ and k(x₁, x₂, …, xₙ) is a function of the sample
observations only, independent of θ.
Note that this method requires the working out of the p.d.f. (p.m.f.) of the statistic 𝑡1 =
𝑡(𝑥1 , 𝑥2 , … , 𝑥𝑛 ), which is not always easy.
Example 1. Let x₁, x₂, …, xₙ be a random sample from a uniform population on [0, θ]. Find
a sufficient estimator for θ.
Solution. We are given: f_θ(xᵢ) = 1/θ, 0 ≤ xᵢ ≤ θ; and 0 otherwise.
Let k(a, b) = 1 if a ≤ b, and 0 if a > b. Then f_θ(xᵢ) = k(0, xᵢ) k(xᵢ, θ)/θ, and
L = ∏ᵢ₌₁ⁿ f_θ(xᵢ) = ∏ᵢ₌₁ⁿ [k(0, xᵢ) k(xᵢ, θ)/θ] = k(0, min₁≤ᵢ≤ₙ xᵢ) · k(max₁≤ᵢ≤ₙ xᵢ, θ)/θⁿ = g_θ[t(x)] · h(x)
where
g_θ[t(x)] = k{t(x), θ}/θⁿ, t(x) = max₁≤ᵢ≤ₙ xᵢ and h(x) = k(0, min₁≤ᵢ≤ₙ xᵢ)
Aliter. We have
L = ∏ᵢ₌₁ⁿ f(xᵢ, θ) = 1/θⁿ; 0 < xᵢ < θ

where g_θ[t(x)] = (1/(σ√(2π)))ⁿ exp[−(1/(2σ²)){t₂(x) − 2µ t₁(x) + nµ²}]
Thus t₁(x) = Σxᵢ is sufficient for µ and t₂(x) = Σxᵢ² is sufficient for σ².
Let Y₁, Y₂, …, Yₙ denote the order statistics of the random sample such that Y₁ < Y₂ < ⋯ < Yₙ.
The p.d.f. of the smallest observation Y₁ is given by
g₁(y₁, θ) = n[1 − F(y₁)]ⁿ⁻¹ f(y₁, θ),
where F(·) is the distribution function corresponding to the p.d.f. f(·).
Thus the likelihood function of X₁, X₂, …, Xₙ may be expressed as
L = e^{nθ} exp(−Σᵢ₌₁ⁿ xᵢ) = n exp(−n(y₁ − θ)) · {exp(−Σᵢ₌₁ⁿ xᵢ)/(n exp(−n y₁))}
  = g₁(y₁, θ) · {exp(−Σᵢ₌₁ⁿ xᵢ)/(n exp(−n y₁))}
Hence, by the Fisher–Neyman criterion, the first order statistic Y₁ = min(X₁, X₂, …, Xₙ) is a
sufficient statistic for θ.
3.4 IN-TEXT QUESTIONS
Question: 1
Let X₁, X₂, …, X_N be identically distributed random variables with mean 2 and variance 1.
Let N be a random variable that follows a Poisson distribution with mean 2 and is independent
of the Xᵢ's. Let S_N = X₁ + X₂ + ⋯ + X_N. Then Var(S_N) equals
A. 4
B. 10
C. 2
D. 1
Question: 2
Let 𝐴 and 𝐵 be independent Random Variables each having the uniform distribution on
[0,1].
Let 𝑈 = min{𝐴, 𝐵} and 𝑉 = max{𝐴, 𝐵}, then Cov (𝑈, 𝑉) is equals
A. -1/36
B. 1/36
C. 1
D. 0
Question: 3
Let X₁, X₂, X₃ be a random sample from uniform(0, θ²), θ > 1. Then the maximum likelihood
estimator (MLE) of θ is
A. X₍₁₎²
B. √X₍₃₎
C. √X₍₁₎
D. 𝛼𝑋(1) + (1 − 𝛼)𝑋(3) ; 0 < 𝛼 < 1
Question: 4
For the discrete variate with density:
f(x) = (1/8) I₍₋₁₎(x) + (6/8) I₍₀₎(x) + (1/8) I₍₁₎(x).
W = √2 (X₁ + X₂) / √((X₂ − X₁)² + (Y₂ − Y₁)²)
A. 1/3
B. 1/6
C. 1/2
D. 5/6
Question: 7
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from
the U(θ − 1/2, θ + 1/2) distribution, where θ ∈ ℝ. If
X₍₁₎ = min{X₁, X₂, …, Xₙ} and
X₍ₙ₎ = max{X₁, X₂, …, Xₙ}, define
T₁ = (1/2)(X₍₁₎ + X₍ₙ₎),  T₂ = (1/4)(3X₍₁₎ + X₍ₙ₎ + 1)
and T₃ = (1/2)(3X₍ₙ₎ − X₍₁₎ − 2), each
an estimator for 𝜃, then which of the following is/are
TRUE?
A. 𝑇1 and 𝑇2 are MLE for 𝜃 but 𝑇3 is not MLE for 𝜃
A. (Sₙ − 3n)/√(3n) ∼ N(0, 1) for all n ≥ 1
B. For all ε > 0, P(|Sₙ/n − 3| > ε) → 0 as n → ∞
C. Sₙ/n → 1 with probability 1
D. Both A and B
Question : 10
Let 𝑋, 𝑌 are i.i.d Binomial (𝑛, 𝑝) random variables. Which of the following are true?
A. 𝑋 + 𝑌 ∼ Bin (2𝑛, 𝑝)
Question : 13
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from Exp (𝜃 ) distribution, where 𝜃 ∈ (0, ∞).
If X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ, then a 95% confidence interval for θ is
A. (0, χ²₂ₙ,₀.₉₅ / (n X̄)]
B. [χ²₂ₙ,₀.₉₅ / (n X̄), ∞)
C. (0, χ²₂ₙ,₀.₉₅ / (2n X̄)]
D. [χ²₂ₙ,₀.₉₅ / (2n X̄), ∞)
Question: 14
𝑋𝑖 , 𝑖 = 1,2, …
be independent random variables all distributed according to the PDF 𝑓𝑥 (𝑥) = 1,0 ≤ 𝑥 ≤ 1.
Define
𝑌𝑛 = 𝑋1 𝑋2 𝑋3 … 𝑋𝑛 , for some integer n. Then Var (𝑌𝑛 ) is equal to
A. n/12
B. 1/3ⁿ − 1/2²ⁿ
C. 1/12ⁿ
D. 1/12
Question : 15
Let 𝑋1 , 𝑋2 , … , 𝑋4
be i.i.d random variables having continuous distribution.
Then
Question : 16
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from
the U(θ − 1/2, θ + 1/2) distribution, where θ ∈ ℝ. If
𝑋(1) = min{𝑋1 , 𝑋2 , … , 𝑋𝑛 } and
𝑋(𝑛) = max{𝑋1 , 𝑋2 , … , 𝑋𝑛 }
A. 1 only
B. 2 only
C. Both 1 and 2
D. Neither 1 nor 2
Question: 17
Let Fₙ be a sequence of DFs defined by
Fₙ(x) = 0 for x < 0; 1 − 1/n for 0 ≤ x ≤ n; and 1 for n ≤ x,
and let limₙ→∞ Fₙ(x) = F(x).
Then which of the following is/are TRUE?
A. F(x) = 0 for x < 0 and F(x) = 1 for x ≥ 0
A. (X̄ − 1/2) is an unbiased estimate of θ
B. (X₍₁₎ + X₍ₙ₎)/2 − 1/2 is an unbiased estimate of θ
Question: 19
Let {Xₙ, n ≥ 1} be i.i.d. uniform(−1, 2) random variables and let Sₙ = Σₖ₌₁ⁿ Xₖ.
Then, as n → ∞
A. Sₙ/n → 1/2 in probability
B. Sₙ/n → 1/2 in distribution
C. P(𝑆𝑛 ≤ 𝑛) → 1 as n → ∞
D. all options are correct
Question : 20
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛
denote random sample of size n from a uniform population with probability density function
f(x, θ) = 1; θ − 1/2 ≤ x ≤ θ + 1/2, −∞ < θ < ∞
Define Tₙ = (X₍ₙ₎ + X₍₁₎)/2.
A. 𝑇𝑛 is consistent for 𝜃
B 𝑇𝑛 is MLE for 𝜃
C. 𝑇𝑛 is unbiased consistent for 𝜃
D. all options are correct
Question : 21
The cumulative distribution function of a random variable X given by
F(x) = 0 if x < 0; 4/9 if 0 ≤ x < 1; 8/9 if 1 ≤ x < 2; and 1 if x ≥ 2.
Which of the following statements is (are) TRUE?
A. The random variable X takes positive probability at at least two points
B. P(1 ≤ X ≤ 2) = 5/9
C. E(X) = 2/3
D. P(0 < X < 1) = 4/9
Question: 22
Let A and B be events in a sample space S such that
P(A) = 1/2 = P(B) and P(Aᶜ ∩ Bᶜ) = 1/3. Which of the following is correct?
A. P(A ∪ Bᶜ) = 5/6
B. P(A ∪ Bᶜ) ≤ 5/6

A. f(x, y) = 1/(π²(1 + x²)(1 + y²)); −∞ < (x, y) < ∞
B. f(x) = (1/π)·1/(1 + x²); −∞ < x < ∞
Question : 25
If X₁, X₂, …, Xₙ is a random sample from a population with density
f(x, θ) = θx^(θ−1) if 0 < x < 1; and 0 otherwise,
where θ > 0 is an unknown parameter, what is a 100(1 − α)% confidence interval for θ?
A. [χ²_{α/2}(2n) / (2 Σᵢ₌₁ⁿ ln Xᵢ), χ²_{1−α/2}(2n) / (2 Σᵢ₌₁ⁿ ln Xᵢ)]
B. [χ²_{α/2}(n) / (−2 Σᵢ₌₁ⁿ ln Xᵢ), χ²_{1−α/2}(n) / (−2 Σᵢ₌₁ⁿ ln Xᵢ)]
C. [χ²_{α/2}(2n) / (−2 Σᵢ₌₁ⁿ ln Xᵢ), χ²_{1−α/2}(2n) / (−2 Σᵢ₌₁ⁿ ln Xᵢ)]
D. [χ²_{α/2}(n) / (2 Σᵢ₌₁ⁿ ln Xᵢ), χ²_{1−α/2}(n) / (2 Σᵢ₌₁ⁿ ln Xᵢ)]
Question : 26
Suppose that r balls are drawn one at a time without replacement from a bag containing n white
and m black balls. Let Sᵣ be the number of black balls drawn. Then Var(Sᵣ) is equal to
A. [mnr / ((m + n)²(m + n + 1))] (m + n − r)
B. [mnr / ((m + n)²(m + n))] (m + n − r)
C. [mnr / ((m + n)²(m + n − 1))] (m + n − r)
D. [mnr / ((m + n)²(m − n))] (m + n − r)
Question: 27
Let Fₙ be a sequence of DFs defined by
Fₙ(x) = 0 for x < 0; 1 − 1/n for 0 ≤ x ≤ n; and 1 for n ≤ x,
and let limₙ→∞ Fₙ(x) = F(x).
A. F(x) = 0 for x < 0 and F(x) = 1 for x ≥ 0
C. Xₙ converges in probability to 0
Question : 28
The Cumulative distribution function of a random variable 𝑋 is given by
F(x) = 0 for x < 2; (1/10)(x² − 7/3) for 2 ≤ x < 3; and 1 for x ≥ 3.
Question: 29
Let 𝑋 and 𝑌 be two independent standard normal random variables. Then the probability
density function of Z = |X|/|Y| is
A. f(z) = (√(1/2)/√π) e^(−z/2) z^(−1/2) if z > 0; 0 otherwise
B. f(z) = (2/√π) e^(−z²/2) if z > 0; 0 otherwise
C. f(z) = e^(−z) if z > 0; 0 otherwise
D. f(z) = (2/π)·1/(1 + z²) if z > 0; 0 otherwise
Question: 30
If the joint moment generating function of the random variables X and Y is
M(s, t) = e^(s + 3t + 2s² + 18t² + 12st)
Which of the following is/are correct?
A. 𝐸(𝑋) < 𝐸(𝑌)
B. Corr (𝑋, 𝑌) > 0
C. Cov (𝑋, 𝑌) = 12
D. all of the above
3.5 SUMMARY
The main points covered in this lesson are what an estimator is, what the consistency,
efficiency and sufficiency of an estimator mean, and how to obtain the best estimator.
3.6 GLOSSARY
Motivation: These problems are very useful in real life, and we can use them in data science,
economics as well as social science.
Attention: Think about how the best estimators are useful in real-world problems.
Answer 2 : B
Explanation:
If A and B are independent random variables each having the uniform distribution on [0, 1],
and U = min{A, B}, V = max{A, B}, then
E(U) = 1/3, E(V) = 2/3, and UV = AB, U + V = A + B.
Thus Cov(U, V) = E(UV) − E(U)E(V) = E(AB) − E(U)E(V)
= E(A)·E(B) − E(U)·E(V) = 1/4 − 2/9 = 1/36
Answer 3 : B
Explanation:
Xᵢ ∼ U(0, θ²), so f(x) = 1/θ²; 0 < xᵢ < θ²
X₍₃₎ ≤ θ² ⇒ θ̂ ∈ [√X₍₃₎, ∞)
L(X, θ) = ∏ᵢ₌₁³ f(xᵢ, θ) = 1/θ⁶
⇒ ∂L/∂θ < 0; therefore the likelihood is decreasing in θ, so θ̂ = √X₍₃₎
Answer 4 : C
Explanation:
X takes the values −1, 0, 1 with probabilities 1/8, 6/8, 1/8.
E(X) = −1 × 1/8 + 0 × 6/8 + 1 × 1/8 = 0
E(X²) = 1 × 1/8 + 0 × 6/8 + 1 × 1/8 = 1/4
V(X) = E(X²) − {E(X)}² = 1/4 ⇒ σ_X = 1/2
P{|X − µ_X| ≥ 2σ_X} ≤ 1/4  [by Chebyshev's inequality]
Answer 5 : B
Explanation:
Let Xᵢ, Yᵢ (i = 1, 2) be an i.i.d. random sample of size 2 from a standard normal distribution.
Then W = √2 (X₁ + X₂) / √((X₂ − X₁)² + (Y₂ − Y₁)²) ∼ t(2).
Answer 7 : A
Explanation:
X₁, X₂, …, Xₙ is a random sample from U(θ − 1/2, θ + 1/2), so
f(x) = 1; θ − 1/2 < xᵢ < θ + 1/2
θ̂ ∈ [X₍ₙ₎ − 1/2, X₍₁₎ + 1/2]
Since the likelihood is free of the parameter on this interval,
θ̂ = λ(X₍ₙ₎ − 1/2) + (1 − λ)(X₍₁₎ + 1/2); 0 < λ < 1
Taking λ = 1/2, 1/4 and 3/4, the MLEs of θ obtained are
(1/2)(X₍₁₎ + X₍ₙ₎); (1/4)(3X₍₁₎ + X₍ₙ₎ + 1); (1/4)(X₍₁₎ + 3X₍ₙ₎ − 1) respectively.
∫₋∞^∞ ∫₋∞^∞ f(x, y) dx dy = 1 ⇒ k = 1/π²
Since X and Y are independent, X ∼ f(x) = (1/π)·1/(1 + x²); −∞ < x < ∞
Answer 9 : D
Explanation:
Clearly, X₁, X₂, …, Xₙ are i.i.d. G(3, 1) random variables. Then E(Xᵢ) = 3 and Var(Xᵢ) = 3, i = 1, 2, …
Let Sₙ = X₁ + X₂ + ⋯ + Xₙ; then E(Sₙ) = 3n and Var(Sₙ) = 3n.
Using the CLT, (Sₙ − 3n)/√(3n) ∼ N(0, 1) for all n ≥ 1.
limₙ→∞ E(Sₙ/n) = limₙ→∞ 3n/n = 3; limₙ→∞ V(Sₙ/n) = limₙ→∞ 3n/n² = 0
For all ε > 0, P(|Sₙ/n − 3| > ε) → 0 as n → ∞
(B) When more than two variables are included, the observations lead to a multinomial
distribution.
(X, Y) does not follow Multinomial(2n; p, p).
P(X > 0) = 1/2
P(X > 0)·P(Y < 0) = 1/2 × 1/2 = 1/4
X ∼ U[θ, θ + 1] ⇒ f(x) = 1; θ ≤ Xᵢ ≤ θ + 1
E(X̄ − 1/2) = (2θ + 1)/2 − 1/2 = θ
so (X̄ − 1/2) is an unbiased estimate of θ.
For option (B):
E((X₍₁₎ + X₍ₙ₎)/2 − 1/2) = (2θ + 1)/2 − 1/2 = θ
so (X₍₁₎ + X₍ₙ₎)/2 − 1/2 is an unbiased estimate of θ.
limₙ→∞ E(Sₙ/n) → 1/2 and limₙ→∞ V(Sₙ/n) → 0
This implies Sₙ/n → 1/2 in probability.
If Sₙ/n → 1/2 in probability, then Sₙ/n → 1/2 in distribution.
Using the CLT: (Sₙ − n/2)/√(3n/4) ∼ N(0, 1) as n → ∞
P((Sₙ − E(Sₙ))/√(V(Sₙ)) ≤ (n − E(Sₙ))/√(V(Sₙ))) → 1 as n → ∞

θ̂ = λ(X₍ₙ₎ − 1/2) + (1 − λ)(X₍₁₎ + 1/2)
Take λ = 1/2 ⇒ θ̂ = Tₙ = (X₍ₙ₎ + X₍₁₎)/2, and also
E((X₍ₙ₎ + X₍₁₎)/2) = θ
By the properties of MLEs, Tₙ = (X₍ₙ₎ + X₍₁₎)/2 is consistent for θ.
From the above, Tₙ is unbiased, consistent and also the MLE for θ.
Answer 24 : C
Explanation:
Since the sample multiple correlation coefficient lies between 0 and 1,
0 ≤ r₁.₂₃…ₙ ≤ 1,
only option C satisfies this condition.
Answer 26 : C
Explanation:
Define Xₖ = 1 if the kth ball drawn is black, and 0 if the kth ball drawn is white, k = 1, 2, …, r.
Then Sᵣ = X₁ + X₂ + ⋯ + Xᵣ.
Also, P(Xₖ = 1) = m/(m + n) and P(Xₖ = 0) = n/(m + n).
Thus E(Xₖ) = m/(m + n) and V(Xₖ) = mn/(m + n)².
For j ≠ k, XⱼXₖ = 1 if the jth and kth balls drawn are both black, and 0 otherwise.
Thus E(XⱼXₖ) = P(Xⱼ = 1, Xₖ = 1) = [m/(m + n)]·[(m − 1)/(m + n − 1)]
and Cov(Xⱼ, Xₖ) = −mn/((m + n)²(m + n − 1)).
Thus E(Sᵣ) = Σₖ₌₁ʳ E(Xₖ) = mr/(m + n) and V(Sᵣ) = [mnr/((m + n)²(m + n − 1))](m + n − r).
We have F(2) = (1/10)(4 − 7/3) = 1/6 and F(2⁻) = 0. Since F(2) ≠ F(2⁻), the function F is not
continuous at 2.
It is direct to see that F is increasing on x ∈ [2, 3) without any jump.
P(X = 2) = F(2) − F(2⁻) = 1/6
Since F is continuous at x = 5/2, we have P(X = 5/2) = 0, and therefore
P(X = 5/2 | 2 ≤ X ≤ 3) = P(X = 5/2, 2 ≤ X ≤ 3)/P(2 ≤ X ≤ 3) = P(X = 5/2)/P(2 ≤ X ≤ 3) = 0
Answer 29 : D
Explanation:
f(z) = (2/π)·1/(1 + z²) if z > 0; 0 otherwise
Answer 30 : D
Explanation:
M(s, t) = e^(s + 3t + 2s² + 18t² + 12st)
E(Y) = [∂M/∂t]₍₀,₀₎ = 3; E(X) = [∂M/∂s]₍₀,₀₎ = 1
E(XY) = [∂²M/∂s∂t]₍₀,₀₎ = 15
E(X) < E(Y)
Cov(X, Y) = E(XY) − E(X)·E(Y) = 15 − 1 × 3 = 12
Cov(X, Y) > 0, which implies Corr(X, Y) > 0
LESSON 4
METHODS OF POINT ESTIMATION
STRUCTURE
4.1 Learning Objectives
4.2 Introduction
4.3 Methods of Point Estimation
4.3.1 Method of Moments
4.3.2 Method of Maximum Likelihood
4.4 In-Text Questions
4.5 Summary
4.6 Glossary
4.7 Answer to In-Text Questions
4.8 References
4.9 Suggested Readings
4.1 LEARNING OBJECTIVES
Point estimators are functions that are used to find an approximate value of a population
parameter from random samples of the population. They use the sample data of a population
to calculate a point estimate or a statistic that serves as the best estimate of an
unknown parameter of a population.
4.2 INTRODUCTION
Most often, the existing methods of finding the parameters of large populations are unrealistic.
For example, when finding the average age of kids attending kindergarten, it will be impossible
to collect the exact age of every kindergarten kid in the world. Instead, a statistician can use
the point estimator to make an estimate of the population parameter.
Properties of Point Estimators
The following are the main characteristics of point estimators:
1. Bias
The bias of a point estimator is defined as the difference between the expected value of the
estimator and the value of the parameter being estimated. When the estimated value of the
parameter and the value of the parameter being estimated are equal, the estimator is considered
unbiased.
Also, the closer the expected value of a parameter is to the value of the parameter being
measured, the lesser the bias is.
2. Consistency
Consistency tells us how close the point estimator stays to the value of the parameter as the
sample size increases. The point estimator requires a large sample size for it to be more
consistent and accurate.
You can also check if a point estimator is consistent by looking at its corresponding expected
value and variance. For the point estimator to be consistent, the expected value should move
toward the true value of the parameter.
3. Most efficient and unbiased
The most efficient point estimator is the one with the smallest variance of all the unbiased and
consistent estimators. The variance measures the level of dispersion from the estimate, and the
smallest variance should vary the least from one sample to the other.
Generally, the efficiency of the estimator depends on the distribution of the population. For
example, in a normal distribution, the mean is considered more efficient than the median, but
the same does not apply in asymmetrical distributions.
Point Estimation and Interval Estimation
The two main types of estimators in statistics are point estimators and interval estimators. Point
estimation is the opposite of interval estimation. It produces a single value while the latter
produces a range of values.
A point estimator is a statistic used to estimate the value of an unknown parameter of a
population. It uses sample data when calculating a single statistic that will be the best estimate
of the unknown parameter of the population.
On the other hand, interval estimation uses sample data to calculate the interval of the
possible values of an unknown parameter of a population. The interval of the parameter is
selected in a way that it falls within a 95% or higher probability, also known as the confidence
interval.
The confidence interval is used to indicate how reliable an estimate is, and it is calculated from
the observed data. The endpoints of the intervals are referred to as the upper and lower
confidence limits.
4.3 METHODS OF POINT ESTIMATORS
The following are some of the methods of point estimators.
1. Method of Moments
2. Method of Maximum Likelihood
We shall now briefly explain these methods one by one.
4.3.1 Method of Moments :
The method of moments essentially amounts to equating sample moments and corresponding
population moments and solving the resulting equations for the parameters to be determined.
In most of the problems the parameter to be estimated is some known function of a given
(finite) number of population moments.
Suppose F(x, θ), θ ∈ Θ (θ may be vector valued, i.e., θ = (θ₁, θ₂, …, θₖ)) is the distribution
function of a random variable X with unknown parameter θ. Let the rth moment of the
population about the origin be denoted by
µ′ᵣ = E(Xʳ), r = 1, 2, …, k.
It is supposed to exist for 1 ≤ r ≤ k.
Let X₁, X₂, …, Xₙ be a random sample of size n from F(x, θ). Let the rth sample moment about
the origin be
m′ᵣ = (1/n) Σᵢ₌₁ⁿ Xᵢʳ, r = 1, 2, …, k.
Equating the moments of the population to the corresponding moments of the sample, we have
µ′ᵣ = m′ᵣ, r = 1, 2, …, k.
The solution of these k equations will give us the required estimators. We may also equate the
central moments of the population to the central moments of the sample to ease the problem.
Thus, to estimate the k parameters (θ₁, θ₂, …, θₖ), we first obtain the mean (first moment about
the origin) and the next (k − 1) moments (central or raw).
In other words, let us suppose θ = g(µ′₁, µ′₂, …, µ′ₖ), where g is some known numerical
function; then the method of moments consists in estimating θ by the statistic
T(X₁, X₂, …, Xₙ) = g((1/n)ΣXᵢ, (1/n)ΣXᵢ², …, (1/n)ΣXᵢᵏ) = g(m′₁, m′₂, …, m′ₖ), say.
Note 1. The number of equations μ′_r = m′_r is taken equal to the number of unknown parameters. If we must estimate k parameters (θ₁, θ₂, …, θₖ), we equate the population and sample moments so as to obtain enough equations to provide unique solutions for θⱼ, j = 1, 2, …, k.
According to the method of moments, we equate population and sample moment, i.e., we set
𝜇1′ = 𝑚1
which gives
𝜆 = 𝑋‾ ⇒ 𝜆ˆ = 𝑋‾
Hence 𝑋‾ is the required method of moments estimator for 𝜆.
Numerical. λ̂ = X̄ = (8 + 9 + 10 + 12 + 15)/5 = 10.8.
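For readers who want to check such computations, the following short Python sketch (not part of the original text; the sample values are the ones used in the numerical illustration above) applies the method of moments to a Poisson sample.

import numpy as np

def poisson_moment_estimate(sample):
    # For Poisson(lambda), E(X) = lambda, so equating the first population
    # moment to the first sample moment gives lambda_hat = sample mean.
    return np.mean(sample)

sample = [8, 9, 10, 12, 15]
print(poisson_moment_estimate(sample))   # 10.8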
Example 2. Obtain the estimator for parameter 𝑝 in Binomial distribution by the method of
moments.
Solution. The probability mass function of the Binomial distribution is given by
f(x; n, p) = ⁿCₓ pˣ qⁿ⁻ˣ,  x = 0, 1, 2, …, n
The first moment about origin is
Let X₁, X₂, …, X_m be a random sample of size m from f(x; n, p). The corresponding sample moment about the origin is
m′₁ = (1/m) Σ_{i=1}^{m} Xᵢ = X̄.
For the normal distribution N(μ, σ²), the method of moments similarly equates the first two moments; the second equation gives
σ² = (1/n) Σ Xᵢ² − μ².
Solving for μ and σ² we have
μ̂ = X̄  and  σ̂² = (1/n) Σ Xᵢ² − X̄² = (1/n) Σ_{i=1}^{n} (Xᵢ − X̄)².
𝐿 gives the relative likelihood that the random variables assume a particular set of values
𝑥1 , 𝑥2 , … , 𝑥𝑛 , For a given sample 𝑥1 , 𝑥2 , … , 𝑥𝑛 , 𝐿 becomes a function of the variable 𝜃, the
parameter.
The principle of maximum likelihood consists in finding an estimator for the unknown
parameter 𝜃 = (𝜃1 , 𝜃2 , … , 𝜃𝑘 ), say, which maximises the likelihood function 𝐿(𝜃) for
variations in parameter, i.e., we wish to find.
𝜃ˆ = (𝜃ˆ1 , 𝜃ˆ2 , … , 𝜃ˆ𝑘 ) so that
𝐿(𝜃ˆ) > 𝐿(𝜃) ∀𝜃 ∈ Θ, 𝑖. 𝑒, 𝐿(𝜃ˆ) = Sup 𝐿(𝜃)∀𝜃 ∈ Θ
Thus, if there exists a function 𝜃ˆ = 𝜃ˆ(𝑥1 , 𝑥2 , … , 𝑥𝑛 ) of the sample values which maximises 𝐿
for variations in 𝜃, then 𝜃ˆ is to be taken as an estimator of 𝜃. 𝜃ˆ is usually called Maximum
Likelihood Estimator (M.L.E.). Thus 𝜃ˆ is the solution, if any, of
∂L/∂θ = 0  and  ∂²L/∂θ² < 0
Since 𝐿 > 0, and log 𝐿 is a non-decreasing function of 𝐿; 𝐿 and log 𝐿 attain their extreme
values (maxima or minima) at the same value of 𝜃ˆ. The first of the two equations can be written
as
(1/L)·(∂L/∂θ) = 0  ⇒  ∂log L/∂θ = 0,
a form which is much more convenient from practical point of view.
If θ is a vector-valued parameter, then θ̂ = (θ̂₁, θ̂₂, …, θ̂ₖ) is given by the solution of the simultaneous equations:
∂/∂θᵢ log L = ∂/∂θᵢ log L(θ₁, θ₂, …, θₖ) = 0;  i = 1, 2, …, k
The above equations are usually referred to as the Likelihood Equations for estimating the
parameters.
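As an illustration of how a likelihood equation can be solved in practice, here is a minimal Python sketch. It assumes a hypothetical exponential sample with density f(x; θ) = θe^{−θx}, for which the likelihood equation gives θ̂ = 1/x̄, and checks that a numerical maximiser of log L agrees with that closed form.

import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.6])   # hypothetical sample

def neg_log_likelihood(theta):
    # log L(theta) = n*log(theta) - theta*sum(x) for the exponential density
    return -(len(x) * np.log(theta) - theta * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded")
print(res.x, 1 / x.mean())   # the numerical maximiser agrees with theta_hat = 1/x_bar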
Properties of Maximum Likelihood Estimators. We make the following assumptions,
known as the Regularity Conditions :
(i) The first and second order derivatives ∂log L/∂θ and ∂²log L/∂θ² exist and are continuous functions of θ in a range R (including the true value θ₀ of the parameter) for almost all x. For every θ in R, |∂log L/∂θ| < F₁(x) and |∂²log L/∂θ²| < F₂(x), where F₁(x) and F₂(x) are integrable functions over (−∞, ∞).
(ii) The third order derivative ∂³log L/∂θ³ exists such that |∂³log L/∂θ³| < M(x), where E[M(x)] < K, a positive quantity.
(iii) For every θ in R,
E(−∂²log L/∂θ²) = ∫_{−∞}^{∞} (−∂²log L/∂θ²) L dx = I(θ) is finite and non-zero.
(iv) The range of integration is independent of 𝜃. But if the range of integration depends on 𝜃,
then 𝑓(𝑥, 𝜃) vanishes at the extremes depending on 𝜃.
This assumption is to make the differentiation under the integral sign valid.
Under the above assumptions M.L.E. possesses a number of important properties, which will
be stated in the form of theorems.
Theorem 1 (Cramer-Rao Theorem). "With probability approaching unity as n → ∞, the likelihood equation ∂log L/∂θ = 0 has a solution which converges in probability to the true value θ₀." In other words, M.L.E.'s are consistent.
Note. MLE's are always consistent estimators but need not be unbiased. For example in
sampling from 𝑁(𝜇, 𝜎 2 ) population.
MLE (𝜇) = 𝑥‾ (sample mean), which is both unbiased and consistent estimator of 𝜇.
MLE (𝜎 2 ) = 𝑠 2 (sample variance), which is consistent but not unbiased estimator of 𝜎 2 .
Theorem 2 (Huzurbazar's Theorem). Any consistent solution of the likelihood equation
provides maximum of the likelihood with probability tending to unity as the sample size (n)
tends to infinity.
Theorem 3 (Asymptotic normality of MLEs). A consistent solution of the likelihood equation is asymptotically normally distributed about the true value θ₀. Thus, θ̂ is asymptotically N(θ₀, 1/I(θ₀)) as n → ∞.
Note. The variance of the M.L.E. is given by: V(θ̂) = 1/I(θ) = 1/E(−∂²log L/∂θ²).
Theorem 4. If the M.L.E. exists, it is the most efficient in the class of such estimators.
Theorem 5. If a sufficient estimator exists, it is a function of the Maximum Likelihood Estimator.
Proof. If 𝑡 = 𝑡(𝑥1 , 𝑥2 , … , 𝑥𝑛 ) is a sufficient estimator of 𝜃, then Likelihood Function can be
written as
𝐿 = 𝑔(𝑡, 𝜃)ℎ(𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛 ∣ 𝑡), where 𝑔(𝑡, 𝜃) is the density function of 𝑡 and 𝜃 and
ℎ(𝑥1 , 𝑥2 , … , 𝑥𝑛 ∣ 𝑡) is the density function of the sample, given 𝑡, and is independent of 𝜃.
∴ log 𝐿 = log 𝑔(𝑡, 𝜃) + log ℎ(𝑥1 , 𝑥2 , … , 𝑥𝑛 ∣ 𝑡)
Differentiating w.r.t. θ, we get: ∂log L/∂θ = ∂/∂θ log g(t, θ) = ψ(t, θ), say.
L = (1/(σ√(2π))) e^{−(1/2)((x₁−μ)/σ)²} ⋯ (1/(σ√(2π))) e^{−(1/2)((xₙ−μ)/σ)²}
  = (1/(σ√(2π)))ⁿ e^{−(1/(2σ²)) Σ_{i=1}^{n}(xᵢ − μ)²}
L(p) = ∏_{i=1}^{m} f(xᵢ, p) = ∏_{i=1}^{m} ⁿC_{xᵢ} p^{xᵢ} q^{n−xᵢ}
Example 3. Find the M.L.E. for the parameter λ of a Poisson distribution from n = 6 sample values, where the six observed values are 2, 8, 0, 6, 2 and 3.
Solution. For a Poisson sample, log L(λ) = −nλ + (Σxᵢ) log λ − Σ log(xᵢ!). Differentiating w.r.t. λ and setting the derivative equal to zero, we get the likelihood equation −n + (Σxᵢ)/λ = 0, so that λ̂ = x̄ = (2 + 8 + 0 + 6 + 2 + 3)/6 = 3.5.
Example 4. Let 𝑋 be a random variable with the probability density function 𝑓(𝑥, 𝛽) = (𝛽 +
1)𝑥 𝛽 for 0 < 𝑥 < 1. 𝛽 > −1. Obtain the m.l.e. of 𝛽 based on a sample 𝑋1 , … . , 𝑋𝑛 from
𝑓(𝑥, 𝛽). State the invariance property of the m.l.e. and use it to write the m.l.e. of 2𝛽 2 + 1.
Solution. Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 denote a random sample from this distribution. Then the
likelihood function is
L(β) = (1 + β)x₁^β · (1 + β)x₂^β ⋯ (1 + β)xₙ^β = (1 + β)ⁿ (∏_{i=1}^{n} xᵢ)^β,
so that
log L(β) = n log(1 + β) + β Σ_{i=1}^{n} log xᵢ
Differentiating w.r.t. 𝛽 and setting the derivative equal to zero, we get the likelihood
equation
∂log L(β)/∂β = n/(1 + β) + Σ_{i=1}^{n} log xᵢ = 0
or  n + (1 + β) Σ_{i=1}^{n} log xᵢ = 0,
so that
β̂ = −(n + Σ log xᵢ)/(Σ log xᵢ) = −(1 + 1/log G),
where G is the geometric mean of the sample values (log G = (1/n) Σ log xᵢ). Since 0 < x < 1, log xᵢ ≤ 0 for all i. By the invariance property of the m.l.e., the m.l.e. of 2β² + 1 is 2β̂² + 1.
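A minimal Python sketch of this result follows; the sample values are hypothetical, chosen only to lie in (0, 1) as the model requires.

import numpy as np

x = np.array([0.9, 0.4, 0.7, 0.8, 0.6])     # hypothetical values in (0, 1)
log_g = np.mean(np.log(x))                  # log of the geometric mean G
beta_hat = -(1 + 1 / log_g)                 # MLE of beta from Example 4
mle_of_2b2_plus_1 = 2 * beta_hat**2 + 1     # invariance property of the MLE
print(beta_hat, mle_of_2b2_plus_1)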
4.4 IN-TEXT QUESTIONS
Self Assesement questions: MCQ’s Problems
Question: 1
Let 𝑋 and 𝑌 be two independent 𝑁(0,1) random variables. Then 𝑃(0 < 𝑋 2 + 𝑌 2 < 4) equals
A. 1 − 𝑒 −2
B. 1 − 𝑒 −4
C. 1 − 𝑒 −1
D. 𝑒 −2
77 | P a g e
Question: 2
Let X₁, X₂, …, Xₙ denote a random sample from a normal distribution with variance σ² > 0. If the first percentile of the statistic W = Σ_{i=1}^{n} (Xᵢ − X̄)²/σ² is 1.24, and P(χ²₇ ≤ 1.24) = 0.01 and P(χ²₇ > 1.24) = 0.99, where X̄ denotes the sample mean, what is the sample size n?
A. 7
B. 8
C. 6
D. 5
Question: 3
Consider the sample linear regression model 𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜖𝑖 , 𝑖 = 1,2, … , 𝑛
Where 𝜖𝑖′ 𝑠 are i.i. d random variables with mean 0 and variance 𝜎 2 ∈ (0, ∞)
Suppose that we have a data set
(𝑥1 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ) with n = 20,
∑𝑛𝑖=1 𝑥𝑖 = 100, ∑𝑛𝑖=1 𝑦𝑖 = 50, ∑𝑛𝑖=1 𝑥𝑖2 = 600, ∑𝑛𝑖=1 𝑦𝑖2 = 500 and ∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 = 400
Then the least square estimates of 𝛼 and 𝛽 are respectively,
A. 5 and 3/2
B. −5 and 3/2
C. 5 and −3/2
D.-5 and −3/2
Question: 4
If X₁, X₂, …, Xₙ is a random sample from a population with density
f(x, θ) = θ x^{θ−1} if 0 < x < 1; 0 otherwise,
Where 𝜃 > 0 is an unknown parameter, what is 100(1 − 𝛼)% confidence interval for 𝜃 ?
A. [χ²_{α/2}(2n) / (2 Σ_{i=1}^{n} ln Xᵢ),  χ²_{1−α/2}(2n) / (2 Σ_{i=1}^{n} ln Xᵢ)]
B. [χ²_{α/2}(n) / (−2 Σ_{i=1}^{n} ln Xᵢ),  χ²_{1−α/2}(n) / (−2 Σ_{i=1}^{n} ln Xᵢ)]
C. [χ²_{α/2}(2n) / (−2 Σ_{i=1}^{n} ln Xᵢ),  χ²_{1−α/2}(2n) / (−2 Σ_{i=1}^{n} ln Xᵢ)]
D. [χ²_{α/2}(n) / (2 Σ_{i=1}^{n} ln Xᵢ),  χ²_{1−α/2}(n) / (2 Σ_{i=1}^{n} ln Xᵢ)]
Question: 5
Let 𝑋1 , … , 𝑋𝑛 be a random sample from a 𝑁(2𝜃, 𝜃 2 ) population, 𝜃 > 0. A consistent
estimator for 𝜃 is
A. (1/n) Σ_{i=1}^{n} Xᵢ
B. ((5/n) Σ_{i=1}^{n} Xᵢ²)^{1/2}
C. (1/(5n)) Σ_{i=1}^{n} Xᵢ²
D. ((1/(5n)) Σ_{i=1}^{n} Xᵢ²)^{1/2}
Question: 6
What is the arithmetic mean of the data set: 4, 5, 0, 10, 8, and 3?
A. 4
B. 5
C. 6
D. 7
Question: 7
Which of the following cannot be the probability of an event?
A. 0.0
B. 0.3
C. 0.9
D. 1.2
79 | P a g e
Question: 8
If a random variable X has a normal distribution, then eX has a _____ distribution.
A. lognormal
B. exponential
C. poisson
D. binomial
Question : 9
What is the geometric mean of: 1, 2, 8, and 16?
A. 4
B. 5
C. 6
D. 7
Question: 10
Which test is applied to Analysis of Variance (ANOVA)?
A. t test
B. z test
C. F test
D. χ2 test
Question : 11
The arithmetic mean of all possible outcomes is known as
A. expected value
B. critical value
C. variance
D. standard deviation
Question: 12
Which of the following cannot be the value of a correlation coefficient?
80 | P a g e
A. –1
B. –0.75
C. 0
D. 1.2
Question: 13
Var (X) = ?
A. E[X2]
B. E[X2] – E[X]
C. E[X2] + E[X]2
D. E[X2] – E[X]2
Question: 14
Var (X + Y) = ?
A. E[X/Y] + E[Y]
B. E[Y/X] + E[X]
C. Var(X) + Var(Y) + 2 Cov(X, Y)
D. Var(X) + Var(Y) – 2 Cov(X, Y)
Question: 15
What is variance of the data set: 2, 10, 1, 9, and 3?
A. 15.5
B. 17.5
C. 5.5
D. 7.5
Question: 16
In a module, quiz contributes 10%, assignment 30%, and final exam contributes 60% towards
the final result. A student obtained 80% marks in quiz, 65% in assignment, and 75% in the
final exam. What are average marks?
A. 64.5%
B. 68.5%
81 | P a g e
C. 72.5%
D. 76.5%
Question: 17
In a university, average height of students is 165 cm. Now, consider the following Table,
HEIGHT 160-162 162-164 164-166 166-168 168-170
STUDENTS 16 20 24 20 16
Question: 18
What is the average of 3%, 7%, 10%, and 16% ?
A. 8%
B. 9%
C. 10%
D. 11%
Question: 19
The error of rejecting the null hypothesis when it is true is known as
A. Type-I error
B. Type-II error
C. Type-III error
D. Type-IV error
Question: 20
The mean and variance of Poisson distribution with parameter lamda are both
82 | P a g e
A. 0
B. 1
C. λ
D. 1/λ
Questions : 21
Which of the following statements is(are) TRUE?
A. The marginal distribution of 𝑋 is Poisson with mean 1/2
B. The random variable 𝑋 and 𝑌 are independent
1
C. The conditional distribution of X given Y = 5 is Bin (6, 2)
D. 𝑃(𝑌 = 𝑛) = (𝑛 + 1)𝑃(𝑌 = 𝑛 + 2) for 𝑛 = 0,1,2, …
Question 22.
Consider the trinomial distribution with the probability mass function
P(X = x, Y = y) = [2!/(x! y! (2 − x − y)!)] (1/6)ˣ (2/6)ʸ (3/6)^{2−x−y},  x ≥ 0, y ≥ 0, and 0 < x + y ≤ 2. Then Corr(X, Y) is equal to…
(correct up to two decimal places)
A) -0.31
B) 0.31
C) 0.35
D) 0.78
Question 23.
Let 𝑥1 = 1.1, 𝑥2 = 0.5, 𝑥3 = 1.4, 𝑥4 = 1.2 be the observed values of a random sample of size
four from a distribution with the probability density function
𝑒 𝜃−𝑥 , if 𝑥 ≥ 𝜃, 𝜃 ∈ (−∞, ∞)
𝑓(𝑥 ∣ 𝜃) = {
0, otherwise
Then the maximum likelihood estimate of 𝜃 2 + 𝜃 + 1 is equal (up to decimal place).
A) 1.75
B) 1.89
83 | P a g e
C) 1.74
D) 0.87
Question : 24
Let 𝑈 ∼ 𝐹5,8 and 𝑉 ∼ 𝐹8,5. If 𝑃[𝑈 > 3.69] = 0.05, then the value of C such that
𝑃[𝑉 > 𝑐] = 0.95 equals… (round off two decimal places)
A) 0.27
B) 1.27
C) 2.27
D) 2.29
Question 25.
Let P be a probability function that assigns the same weight to each of the points of the sample
space Ω = {1,2,3,4}. Consider the events E = {1,2}, F = {1,3} and G = {3,4}. Then which of
the following statement(s) is (are) TRUE?
1. E and F are independent
2. E and G are independent
3. E, F and G are independent
Select the correct answer using code given below:
A. 1 only
B. 2 only
C. 1 and 2 only
D. 1,2 and 3
Question : 26
Let 𝑋1 , 𝑋2 , … , 𝑋4 and 𝑌1 , 𝑌2 , … , 𝑌5 be two random samples of size 4 and 5 respectively,
from a standard normal population. Define the statistic
T = (5/4) · (X₁² + X₂² + X₃² + X₄²)/(Y₁² + Y₂² + Y₃² + Y₄² + Y₅²)
then which of the following is TRUE?
A. Expectation of 𝑇 is 0.6
B. Variance of T is 8.97
84 | P a g e
Question : 30
Let X be a random variable with the cumulative distribution function
F(x) = 0 for x < 0;  (1 + x²)/10 for 0 ≤ x < 1;  (3 + x²)/10 for 1 ≤ x < 2;  1 for x ≥ 2.
Which of the following statements is (are) TRUE?
3
A. 𝑃(1 < 𝑋 < 2) = 10
31
B. 𝑃(1 < 𝑋 ≤ 2) = 5
11
C. 𝑃(1 ≤ 𝑋 < 2) = 2
41
D. 𝑃(1 ≤ 𝑋 ≤ 2) = 5
86 | P a g e
(x) Minimum variance unbiased estimators are unique under certain general
conditions. (True)
(xi) The minimum variance unbiased estimator for θ does not exist in the Cauchy distribution
f(x, θ) = (1/π) · 1/(1 + (x − θ)²),  −∞ < x < ∞  (True)
(v) The importance of the method of minimum variance over other methods is that
it gives also then variance of T is …… (Ans. Lambda) [Ans. Variance]
3. In each of the following questions, four alternative answers are given in which
only one is correct. Select the correct answer and write the letter (a), (b) (c) or (d):
(i) The method of moments for determining point estimators of the population
parameters was discovered by
(a) Karl Pearson
(b) R.A. Fisher
(c) Cramer-Rao
(d) Rao-Blackwell
Ans. (a)
(ii) Let x₁, x₂, …, xₙ be a random sample from f(x, β) = (2/β²)(β − x), 0 ≤ x ≤ β.
(a) 𝑙<0
(b) 𝑙>0
(c) 𝑙=0
(d) 𝑙 ≠ 0 Ans. (a)
(vii) The necessary and sufficient condition for the existence of the minimum variance unbiased estimator T of ψ(θ) is ∂log L/∂θ = k(θ, n)[T − ψ(θ)]
89 | P a g e
f(x) = 1/θ for 0 < x < θ; 0 otherwise.
The minimum variance unbiased estimator for θ is
(a) Max(X₁, X₂, …, Xₙ)
(b) (X₁ + X₂ + ⋯ + Xₙ)/n
(a) 𝜃ˆ = 𝑓(𝑋1 , 𝑋2 , … … , 𝑋𝑛 )
(b) 𝜃ˆ is a function of 𝑡
(c) 𝐵ˆ is independent of 𝑡
90 | P a g e
Ans. (b)
(xiv) A minimum variance unbiased estimator is said to be unique if for any other
estimator 𝑇𝑛′
(a) Var (𝑇𝑛 ) = Var (𝑇𝑛 )
(b) Var (𝑇𝑛 ) ≤ Var (𝑇𝑛 )
(c) both (a) and (b)
(d) neither (a) nor (b)
Ans. (a)
4.5 SUMMARY
The main points covered in this lesson are what an estimator is, what consistency, efficiency and sufficiency of an estimator mean, and how to obtain the best estimator.
4.6 GLOSSARY
Motivation: These problems are very useful in real life, and we can use them in data science, economics as well as social science.
Attention: Think how the Methods of Estimation are useful in real world problems.
4.7 ANSWER TO IN-TEXT QUESTIONS
Answer 1 : A
Explanation:
Since X² + Y² ∼ χ²₍₂₎, and a χ²₍₂₎ random variable has the same distribution as an exponential random variable with mean 2, we have
P(0 < X² + Y² < 4) = ∫₀⁴ (1/2) e^{−t/2} dt = 1 − e^{−2}
Answer 2 : B
Explanation:
0.01 = P(W ≤ 1.24) = P(Σ_{i=1}^{n} (Xᵢ − X̄)²/σ² ≤ 1.24) = P(χ²_{n−1} ≤ 1.24)
Thus, from the given value P(χ²₇ ≤ 1.24) = 0.01 we get n − 1 = 7, and hence the sample size n is 8.
Answer 3 : B
Explanation:
yᵢ = α + βxᵢ + εᵢ,  i = 1, 2, …, n
x̄ = 100/20 = 5,  ȳ = 50/20 = 5/2
β̂ = (Σ xᵢyᵢ − n x̄ ȳ)/(Σ xᵢ² − n x̄²) = (400 − 20 × 5 × 5/2)/(600 − 20 × 5²) = 150/100 = 3/2
α̂ = ȳ − β̂ x̄ = 5/2 − (3/2) × 5 = −5
Hence option B is correct.
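The same least-squares computation can be reproduced directly from the given summary statistics, as in the following short Python sketch.

# Summary statistics taken from the question above
n, sum_x, sum_y, sum_x2, sum_xy = 20, 100, 50, 600, 400

x_bar, y_bar = sum_x / n, sum_y / n
beta_hat = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar**2)
alpha_hat = y_bar - beta_hat * x_bar
print(alpha_hat, beta_hat)   # -5.0 and 1.5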
Answer 4 : C
Explanation:
We use the random variable Q = −2θ Σ_{i=1}^{n} ln Xᵢ ∼ χ²(2n).
Answer 5 : D
Explanation:
We have E(Xᵢ²) = V(Xᵢ) + (E(Xᵢ))² = 5θ²;  i = 1, 2, …, n.
Then X₁²/5, X₂²/5, … is a sequence of i.i.d. random variables with E(X₁²/5) = θ². Using the WLLN, we get that
(1/n) Σ_{i=1}^{n} Xᵢ²/5 = (1/(5n)) Σ_{i=1}^{n} Xᵢ² converges in probability to θ² as n → ∞, which implies that ((1/(5n)) Σ_{i=1}^{n} Xᵢ²)^{1/2} converges in probability to θ as n → ∞. Thus ((1/(5n)) Σ_{i=1}^{n} Xᵢ²)^{1/2} is a consistent estimator for θ.
Hence option D is the correct choice.
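A small simulation makes the consistency argument concrete. The sketch below uses simulated data with θ chosen arbitrarily as 2.0, and shows the estimate ((1/(5n)) Σ Xᵢ²)^{1/2} approaching θ as n grows.

import numpy as np

rng = np.random.default_rng(0)
theta = 2.0
for n in (100, 10_000, 1_000_000):
    x = rng.normal(loc=2 * theta, scale=theta, size=n)   # sample from N(2*theta, theta^2)
    estimate = np.sqrt(np.sum(x**2) / (5 * n))
    print(n, estimate)    # the estimates approach theta = 2.0 as n grows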
Answer 6 : B
Explanation :
Here the total number of observations is 6.
So, AM = (4 + 5 + 0 + 10 + 8 + 3)/6 = 30/6 = 5
Answer 7 : D
Explanation :
The probability of an event is always between 0 and 1 (including 0 and 1 ). So, 1.2 cannot be
the probability of an event.
Answer 8 : A
Explanation :
A lognormal distribution is a probability distribution of a random variable whose logarithm is
normally distributed. So, if X is lognormal then Y = ln (X) is normal. Similarly, if Y is
normal then X = eY is lognormal.
Answer 9 : A
Explanation :
There are 4 numbers in total. So, by using the formula for calculating geometric mean, we
have
𝐺. 𝑀 = (1 × 2 × 8 × 16)1/4
= (256)1/4
= (44 )1/4 ∵ 44 = 256
=4
Answer 10 : C
In anova we use F test because we use ratio of two chi square statistics .
Answer 11 : A
Explanation :
Expectation (or expected value) is the arithmetic mean of all possible outcomes of a random
variable.
Answer 12 : D
Explanation :
The value of a correlation coefficient is always between −1 and 1, including −1 and 1.
Answer 13 : D
Explanation :
By definition, Var (𝑋) = 𝐸[𝑋 2 ] − 𝐸[𝑋]2
Answer 14 : C
Explanation :
By definition, Var (𝑋 + 𝑌) = Var (𝑋) + Var (𝑌) + 2Cov (𝑋, 𝑌)
Answer 15 : B
Explanation :
First calculate the mean:
Mean = (2 + 10 + 1 + 9 + 3)/5 = 25/5 = 5
Now calculate the (sample) variance, dividing by n − 1 = 4:
Variance = [(2 − 5)² + (10 − 5)² + (1 − 5)² + (9 − 5)² + (3 − 5)²]/4 = 70/4 = 17.5
Answer 16 : C
Explanation :
By using formula for calculating weighted average, we have
Weighted Average = 𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑤3 𝑥3
= 0.1 × 0.8 + 0.3 × 0.65 + 0.6 × 0.75
= 0.725 = 72.5%
Answer 17 : A
Explanation :
Answer 18 : B
Explanation :
3%, 7%, 10%, 16%
Average = (3 + 7 + 10 + 16)/4 % = 36/4 % = 9%
Answer 19 : A
Explanation :
Type I error = P(reject H0| when H0 is true )
Answer 20 : C
Explanation :
We know that if random variable x follows Poisson distribution with parameter lamda then
E(X) = V(X)= lamda
Question 21.
Let the discrete random variables X and Y have the joint probability mass function
P(X = m, Y = n) = e⁻¹ / ((n − m)! m! 2ⁿ) for m = 0, 1, 2, …, n; n = 0, 1, 2, …; and 0 otherwise.
Answer 21 : A
Explanation :
The marginal probability mass function of X is given by
P(X = m) = Σ_{n=m}^{∞} P(X = m, Y = n) = e^{−1/2} (1/2)^m / m!,  m = 0, 1, 2, …
Thus the marginal distribution of X is Poisson with mean 1/2.
The marginal probability mass function of Y is given by
P(Y = n) = Σ_{m=0}^{n} P(X = m, Y = n) = e^{−1}/n!,  n = 0, 1, 2, …
Thus the marginal distribution of Y is Poisson with mean 1.
Since P(X = m, Y = n) ≠ P(X = m) P(Y = n), X and Y are not independent.
P(X = m | Y = 5) = P(X = m, Y = 5)/P(Y = 5) = [5!/(m!(5 − m)!)] (1/2)⁵,  m = 0, 1, 2, …, 5
Thus the conditional distribution of X given Y = 5 is Bin(5, 1/2).
Also P(Y = n)/P(Y = n + 1) = (n + 1) for n = 0, 1, 2, …
Answer 22 : 𝑨
Explanation :
The trinomial distribution of two r.v.'s X and Y is given by
f_{X,Y}(x, y) = [n!/(x! y! (n − x − y)!)] pˣ qʸ (1 − p − q)^{n−x−y}
for x, y = 0, 1, 2, …, n and x + y ≤ n, where p + q ≤ 1. Here n = 2, p = 1/6 and q = 2/6.
Var(X) = n p₁(1 − p₁) = 2 × (1/6)(1 − 1/6) = 10/36;  Var(Y) = n p₂(1 − p₂) = 2 × (2/6)(1 − 2/6) = 16/36
Cov(X, Y) = −n p₁ p₂ = −2 × (1/6) × (2/6) = −4/36
Corr(X, Y) = Cov(X, Y)/(√Var(X) √Var(Y)) = −4/(4√10) = −0.31
Hence −0.31 is the correct answer.
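The correlation can also be verified by brute-force enumeration of the trinomial support, as in this short Python sketch.

from math import factorial

n, p, q = 2, 1/6, 2/6
pmf = {}
# enumerate the six possible (x, y) pairs with x + y <= 2
for x in range(n + 1):
    for y in range(n + 1 - x):
        pmf[(x, y)] = (factorial(n) / (factorial(x) * factorial(y) * factorial(n - x - y))
                       * p**x * q**y * (1 - p - q)**(n - x - y))

ex  = sum(x * pr for (x, y), pr in pmf.items())
ey  = sum(y * pr for (x, y), pr in pmf.items())
exy = sum(x * y * pr for (x, y), pr in pmf.items())
vx  = sum((x - ex)**2 * pr for (x, y), pr in pmf.items())
vy  = sum((y - ey)**2 * pr for (x, y), pr in pmf.items())
corr = (exy - ex * ey) / (vx * vy) ** 0.5
print(corr)   # about -0.316, i.e. -1/sqrt(10), matching option A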
Answer 23 : 𝐀
Explanation :
Let x₁ = 1.1, x₂ = 0.5, x₃ = 1.4, x₄ = 1.2 and
f(x | θ) = e^{θ−x} if x ≥ θ, θ ∈ (−∞, ∞); 0 otherwise.
The likelihood is positive only for θ ∈ (−∞, X(1)], and since dL/dθ > 0 for all θ ∈ (−∞, X(1)], the likelihood is maximised at θ̂ = X(1) = 0.5. By the invariance property, the maximum likelihood estimate of θ² + θ + 1 is (0.5)² + 0.5 + 1 = 1.75.
Answer 24 : 𝐀
Explanation :
If X ∼ F(m, n), then 1/X ∼ F(n, m).
P[U > 3.69] = 0.05 ⇒ P[U < 3.69] = 0.95 ⇒ P[1/U > 1/3.69] = 0.95.
Since V = 1/U ∼ F(8, 5), c = 1/3.69 = 0.27.
Hence c = 0.27 is the correct answer.
Answer 25 : C
Explanation :
Clearly, P({𝜔}) = 1/4 ∀𝜔 ∈ Ω = {1,2,3,4}. We have E = {1,2}, F = {1,3} and G = {3,4}
= (1/2)(1/2)^{k−1},  k = 1, 2, …,
which is the pmf of a geometric distribution with parameter 1/2.
E(Y) = Σ_{k=1}^{∞} k (1/2)(1/2)^{k−1} = 2
Hence option B is correct.
Answer 30 : A
Explanation :
P(1 < X < 2) = F(2) − F(1) − P(X = 2) = 3/10
P(1 < X ≤ 2) = F(2) − F(1) = 3/5
P(1 ≤ X < 2) = F(2) − F(1) − P(X = 2) + P(X = 1) = 1/2
P(1 ≤ X ≤ 2) = F(2) − F(1) + P(X = 1) = 4/5
4.8 REFERENCES
• Devore, J. (2012). Probability and statistics for engineers, 8th ed. Cengage Learning.
• John A. Rice (2007). Mathematical Statistics and Data Analysis, 3rd ed. Thomson
Brooks/Cole
• Larsen, R., Marx, M. (2011). An introduction to mathematical statistics and its
applications. Prentice Hall.
• Miller, I., Miller, M. (2017). J. Freund’s mathematical statistics with applications, 8th
ed. Pearson.
• Kantarelis, D. and Asadoorian, M. O. (2009). Essentials of Inferential Statistics, 5th ed. University Press of America.
4.9 SUGGESTED READINGS
• S. C Gupta , V.K Kapoor, Fundamentals of Mathematical Statistics,Sultan Chand
Publication, 11th Edition.
• B.L Agarwal, Programmed Statistics ,New Age International Publishers, 2nd Edition.
LESSON 5
CRAMER RAO INEQUALITY
STRUCTURE
5.1 Learning Objectives
5.2 Introduction
5.3 Cramer Rao Inequality
5.3.1 Simple form of C-R Inequality
5.3.2 Regularity Condition
5.3.3 Alternative form of C-R Inequality
5.3.4 Equality Sign in C-R Inequality
5.3.5 Uses of C-R Inequality
5.4 In-Text Questions
5.5 Summary
5.6 Glossary
5.7 Answer to In-Text Questions
5.8 References
5.9 Suggested Readings
5.1 LEARNING OBJECTIVES
The Cramér–Rao inequality gives a lower bound for the variance of an unbiased estimator of
a parameter. It is named after work by Cramér (1946) and Rao (1945). The inequality and the
corresponding lower bound in the inequality are stated for various situations.
5.2 INTRODUCTION
Point estimation is the use of a statistic to estimate the value of some parameter of a population
having a particular type of density. The statistic we use is called the point estimator and its
value is the point estimate. A desirable property for a point estimator T for a parameter 𝜃 is
that the expected value of T is 𝜃. If T is a random variable with density 𝑓 and values 𝜃ˆ, this is
equivalent to saying
E[T] = ∫_{−∞}^{∞} θ̂ f(θ̂) dθ̂ = θ
Here, 𝜇T̂ is the mean of the random variable Θ̂, which is 𝜃 in the case of an unbiased estimator.
Choosing the estimator with the smaller variance is a natural thing to do, but by no means is it
the only possible choice. If two estimators have the same expected value, then while their
average values will be equal the estimator with greater variance will have larger fluctuations
about this common value.
An estimator with a smaller variance is said to be relatively more efficient because it will tend
to have values that are concentrated more closely about the correct value of the parameter;
thus, it allows us to be more confident that our estimate will be as close to the actual value as
we would like. Furthermore, the quantity
var(T̂₁)/var(T̂₂) is used as a measure of the efficiency of T̂₂ relative to T̂₁. We hope to maximize efficiency by
minimizing variance.
In our example, the mean of the population has variance 𝜎 2 /𝑚 = 𝜎 2 /(2𝑛 + 1). If the
population median is 𝜇˜, that is 𝜇˜ is such that.
∫_{−∞}^{μ̃} f(x; μ, σ²) dx = 1/2,
then the sampling distribution of the median is approximately normal with mean μ̃ and variance 1/(8n·f(μ̃)²).
Since the normal distribution of our example is symmetric, we must have 𝜇˜ = 𝜇, which makes
it easy to show that 𝑓(𝜇˜) = 1/√2𝜋𝜎 2 . The variance of the sample median is therefore
𝜋𝜎 2 /4𝑛.
Certainly, in our example, the mean has the smaller variance of the two estimators, but we
would like to know whether an estimator with smaller variance exists. More precisely, it would
be very useful to have a lower bound on the variance of an unbiased estimator. Clearly, the
variance must be non-negative 1 , but it would be useful to have a less trivial lower bound. The
Cramér-Rao Inequality is a theorem that provides such a bound under very general conditions.
It does not, however, provide any assurance that any estimator exists that has the minimum
variance allowed by this bound.
5.3 CRAMER-RAO INEQUALITY
The following are some of the main points about Cramer- Rao Inequality.
1. Simple form of C-R inequality
2. Regularity Condition
3. Alternative form of C-R Inequality
4. Equality Sign in C-R Inequality
5. Uses of C-R Inequality
II. For almost all x₁, x₂, …, xₙ (and for all θ ∈ Θ), ∂L/∂θ exists, the exceptional set, if any, being independent of θ (the case of random sampling from R(0, θ), for example, is excluded). That is, the range of x is independent of θ.
III. Differentiation under the sign of integral is valid. This means, among other things, that the domain of positive pdf does not depend upon θ. Thus
(i) (∂/∂θ) ∫_A L dx₁ dx₂ … dxₙ = ∫_A (∂L/∂θ) dx₁ dx₂ … dxₙ, where A denotes the domain of positive pdf.
(ii) (∂/∂θ) ∫_A t L dx₁ dx₂ … dxₙ = ∫_A t (∂L/∂θ) dx₁ dx₂ … dxₙ. (This condition makes the result applicable to a certain class of estimates.)
(iii) The first two derivatives of L with respect to θ exist, i.e., E[∂log L/∂θ]² or E[∂²log L/∂θ²] exists and is positive for every θ ∈ Θ.
In other words, the regularity conditions are:
1. The range of the distribution is independent of θ.
2. The first two derivatives of L with respect to θ exist.
3. Conditions of uniform convergence are fulfilled so that differentiation under the sign of integral is valid.
Proof. We have
1 = ∫_A L dx₁ dx₂ … dxₙ  for all θ ∈ Θ  …(1)
Differentiating (1) with respect to θ on both sides, we get
0 = (∂/∂θ) ∫_A L dx₁ dx₂ … dxₙ
⇒ 0 = ∫_A (∂L/∂θ) dx₁ dx₂ … dxₙ  (regularity condition)
⇒ 0 = ∫_A (1/L)(∂L/∂θ) · L dx₁ dx₂ … dxₙ  (dividing and multiplying by L)
⇒ 0 = ∫_A (∂log L/∂θ) · L dx₁ dx₂ … dxₙ
⇒ 0 = E[∂log L/∂θ]
Note 1. The R.H.S. in (5) gives the lower bound of the variance of T and is sometimes known as the Minimum Variance Bound (MVB).
Note 2. Iₙ(θ) or I(θ) or I = E(∂log L/∂θ)² has been called by Fisher the amount of information about θ in the random sample, and its reciprocal the information limit to the variance of T.
Note 3. E[∂log L/∂θ]² = −E(∂²log L/∂θ²)
Under the assumption that 1 = ∫_A L dx₁ dx₂ … dxₙ = ∫_A L dx, say, is differentiable with respect to θ under the integral sign twice, we get
0 = ∫_A (∂log L/∂θ) L dx
⇒ 0 = ∫_A [(∂²log L/∂θ²) L + (∂log L/∂θ)(∂L/∂θ)] dx
⇒ 0 = ∫_A [(∂²log L/∂θ²) L + (∂log L/∂θ)² L] dx
⇒ 0 = E(∂²log L/∂θ²) + E(∂log L/∂θ)²
⇒ E(∂log L/∂θ)² = −E(∂²log L/∂θ²)
Note 4. E(∂log L/∂θ)² = n E(∂log f(x, θ)/∂θ)²
L.H.S. = E(∂log ∏ᵢ f(xᵢ, θ)/∂θ)² = E[∂ Σᵢ log f(xᵢ, θ)/∂θ]²
= E[Σᵢ {∂log f(xᵢ, θ)/∂θ}² + Σ_{i≠j} {∂log f(xᵢ, θ)/∂θ · ∂log f(xⱼ, θ)/∂θ}]
= Σᵢ E[∂log f(x, θ)/∂θ]² + 0
{∵ x₁, x₂, …, xₙ are independent and identically distributed and E(∂log f(x, θ)/∂θ) = 0}
= n E[∂log f(x, θ)/∂θ]² = R.H.S.
Remark. Since E(∂log L/∂θ)² = −E(∂²log L/∂θ²) = n E[∂log f(x, θ)/∂θ]² = −n E[∂²log f(x, θ)/∂θ²], we have
Note 5. The minimum variance bound for the variance of an unbiased estimator of θ is given by
Var(T) ≥ 1/E(∂log L/∂θ)²
Var(T) ≥ 1/(−E(∂²log L/∂θ²))
Var(T) ≥ 1/(n E(∂log f(x, θ)/∂θ)²)
Var(T) ≥ 1/(−n E(∂²log f(x, θ)/∂θ²))
∂𝛾(𝜃)
𝛾 ′ (𝜃) =
∂𝜃
and
𝐿 = ∏𝑛𝑖=1 𝑓(𝑥𝑖 , 𝜃), joint 𝑝𝑑𝑓 of (𝑥1 , 𝑥2 , … . , 𝑥𝑛 )
Note 1. E(∂log L/∂θ) = 0 ⇒ E(T) = γ(θ).
Note 2. In another form, we can write
∂log L/∂θ = (T − γ(θ))/λ(θ)
or ∂log L/∂θ = k(θ, n)(T − γ(θ)),
where λ(θ) or k(θ, n) or K(θ) is a constant depending on θ, independent of (x₁, x₂, …, xₙ), but it may depend on n.
Note 3. ∂log L/∂θ = ∂log ∏ᵢ f(xᵢ, θ)/∂θ = ∂ Σᵢ log f(xᵢ, θ)/∂θ = Σᵢ ∂log f(xᵢ, θ)/∂θ = A(θ)[T − γ(θ)]
Var(T) = |γ′(θ)|² / (Var(T)/[λ(θ)]²)
⇒ [Var(T)]² = [γ′(θ)]² [λ(θ)]²
∴ Var(T) = γ′(θ) · λ(θ)
Example 1. Let x₁, x₂, …, xₙ be a random sample from N(μ, σ²). Show that the sample mean x̄ is the minimum variance bound estimator (MVBE) of μ.
Solution.
[Use of the C-R Inequality: Var(t) ≥ [γ′(θ)]² / E(∂log L/∂θ)²]
Here γ(θ) = μ, so γ′(θ) = ∂μ/∂μ = 1. Also X ∼ N(μ, σ²), so E(X) = μ and Var(X) = σ².
We know that
Var(x̄) = Var((1/n) Σᵢ xᵢ) = (1/n²) Σᵢ Var(xᵢ) = (1/n²) · nσ² = σ²/n
f(x, θ) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}
L = ∏ᵢ f(xᵢ, θ) = (1/(σⁿ(2π)^{n/2})) e^{−(1/(2σ²)) Σᵢ (xᵢ − μ)²}
log L = −n log σ − (n/2) log(2π) − (1/(2σ²)) Σᵢ (xᵢ − μ)²
∂log L/∂μ = (1/σ²) Σᵢ (xᵢ − μ) = (n/σ²)(x̄ − μ)
(∂log L/∂μ)² = (n²/σ⁴)(x̄ − μ)²
E(∂log L/∂μ)² = (n²/σ⁴) E(x̄ − μ)² = (n²/σ⁴) Var(x̄) = (n²/σ⁴)(σ²/n) = n/σ²   (∵ E(x̄) = μ)
Thus, by the C-R Inequality, Var(x̄) ≥ 1/(n/σ²), i.e., Var(x̄) ≥ σ²/n. Since this lower bound on the variance equals the actual variance of x̄, x̄ is the MVB estimator of μ.
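A quick simulation (hypothetical values of μ, σ and n) illustrates that the variance of the sample mean indeed sits at the Cramer-Rao bound σ²/n.

import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 25, 200_000          # hypothetical settings
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(means.var())        # simulated variance of x_bar ...
print(sigma**2 / n)       # ... is close to the MVB sigma^2/n = 0.16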
Example 2. For the distribution N(μ, σ² = θ), 0 < θ < ∞, find the efficiency of T = nS²/(n − 1), where S² is the variance of a random sample of size n > 1.
Solution.
f(x, θ) = (1/√(2πθ)) e^{−(x−μ)²/(2θ)},  −∞ < x < ∞,  −∞ < μ < ∞,  θ > 0
E(X) = μ, Var(X) = θ.
Let X₁, X₂, …, Xₙ be a random sample of size n > 1. Then
X̄ = (1/n) Σᵢ Xᵢ,  S² = (1/n) Σᵢ (Xᵢ − X̄)²,
T = nS²/(n − 1) = (1/(n − 1)) Σᵢ (Xᵢ − X̄)²
⇒ X̄² − 1/n is an unbiased estimator of γ(μ) = μ².
Var(X̄² − 1/n) = Var(X̄²) = E(X̄⁴) − (E X̄²)²
= (3/n² + 6μ²/n + μ⁴) − (1/n + μ²)²
{∵ μ′₄ = μ₄ + 4μ₃ μ′₁ + 6μ₂ (μ′₁)² + (μ′₁)⁴ = 3(1/√n)⁴ + 0 + 6·(1/n)·μ² + μ⁴}
= 3/n² + 6μ²/n + μ⁴ − 1/n² − 2μ²/n − μ⁴
= 2/n² + 4μ²/n
(c) The efficiency of T is given by η = MVB/Var(T) = (4μ²/n) / (4μ²/n + 2/n²) < 1
Solution. Let X₁, X₂, …, Xₙ be a random sample of size n from f(x, θ). Then the joint pdf of (X₁, X₂, …, Xₙ) is given by
L = ∏_{i=1}^{n} f(xᵢ, θ) = (1/π)ⁿ ∏_{i=1}^{n} 1/(1 + (xᵢ − θ)²)
The R.H.S. of the likelihood equation cannot be expressed in the form (T − γ(θ))/λ(θ).
Hence the MVB estimator does not exist for θ in the Cauchy distribution; that is, the Cramer-Rao lower bound is not attainable by the variance of any unbiased estimator of θ.
f(x, θ) = (1/π) · 1/(1 + (x − θ)²),  −∞ < x < ∞
log f(x, θ) = −log π − log[1 + (x − θ)²]
∂log f(x, θ)/∂θ = −2(x − θ)(−1)/(1 + (x − θ)²) = 2(x − θ)/(1 + (x − θ)²)
Further, [∂log f(x, θ)/∂θ]² = 4(x − θ)²/[1 + (x − θ)²]²
E(∂log f(x, θ)/∂θ)² = E[4(x − θ)²/[1 + (x − θ)²]²] = ∫_{−∞}^{∞} 4(x − θ)²/[1 + (x − θ)²]² · (1/π) · 1/(1 + (x − θ)²) dx
∂log L/∂θ = n/(1 + θ) − 2 Σ_{i=1}^{n} 1/(xᵢ + θ)
∂²log L/∂θ² = −n/(1 + θ)² + 2 Σ_{i=1}^{n} 1/(xᵢ + θ)²
−E(∂²log L/∂θ²) = n/(1 + θ)² − 2n E{1/(x + θ)²}
= n/(1 + θ)² − 2n ∫₁^∞ [1/(x + θ)²] · [(1 + θ)/(x + θ)²] dx
= n/(1 + θ)² − 2n(1 + θ) ∫₁^∞ dx/(x + θ)⁴
= n/(1 + θ)² − 2n(1 + θ) [−1/(3(x + θ)³)]₁^∞
= n/(1 + θ)² − (2n(1 + θ)/3) · 1/(1 + θ)³
= n/(1 + θ)² − 2n/(3(1 + θ)²) = n/(3(1 + θ)²)
The MVB by the C-R Inequality is given by
MVB = [τ′(θ)]² / (−E(∂²log L/∂θ²)) = 1/(n/(3(1 + θ)²))  {∵ τ(θ) = θ}
= 3(1 + θ)²/n.
SELF ASSESEMENT (CONCEPTUAL QUESTIONS)
1. Comment on the following statements :
(i) In the case of Poisson distribution with parameter 𝜆, 𝑥‾ is sufficient for 𝜆.
(ii) If (𝑋1 , 𝑋2 , … 𝑋𝑛 ) be a sample of independent observation from the uniform
distribution on (𝜃, 𝜃 + 1), then the maximum likelihood estimator of 𝜃 is
unique.
(ii) If 𝑡 is the maximum likelihood estimator for 𝜃, state the condition under
which 𝑓(𝑡) will be the maximum likelihood estimator for 𝑓(𝜃).
(iii) Write down the condition for the Cramer-Rao lower bound for the variance of
the estimator to be attained.
(iv) Write down the general form of the distribution admitting sufficient statistic.
5. A random variable X takes the values 1, 2, 3 and 4, each with probability 1/4. A random sample of three values of X is taken; x̄ is the mean and m is the median of this sample. Show that both x̄ and m are unbiased estimators of the mean of the population, but x̄ is more efficient than m. Compare their efficiencies.
6. Give an example of estimates which are (i) Unbiased and efficient, (ii) Unbiased and
inefficient.
7. Mark the correct alternative:
(i) Let 𝑇𝑛 be an estimator, based on a sample 𝑥1 , 𝑥2 , … , 𝑥𝑛 , of the parameter 𝜃.
Then 𝑇 is a consistent estimator of 𝜃 if
(a) P(𝑇𝑛 − 𝜃 > 𝜀) = 0∀𝜀 > 0,
(b) 𝑃(|𝑇𝑛 − 𝜃| < 𝜀) = 0,
(c) lim𝑛→∞ 𝑃(|𝑇𝑛 − 𝜃| > 𝜀) = 0∀𝜀 > 0,
(d) lim𝑛→− 𝑃(𝑇𝑛 − 𝜃 > 𝜀) = 0 ∀𝜀 > 0
(ii) Let 𝐸(𝑇1 ) = 𝜃 = 𝐸(𝑇2 ), where 𝑇1 and 𝑇2 are the linear functions of the
sample observations. If 𝑉(𝑇1 ) ≤ 𝑉(𝑇2 ) then:
(a) 𝑇1 is an unbiased linear estimator.
(b) Γ1 is the best linear unbiased estimator.
(c) 𝑇1 is a consistent linear unbiased estimator.
(d) 𝑇1 is a consistent best linear unbiased estimator.
(iii) Let 𝑋 be a random variable with 𝐸(𝑋) = 𝜇 and 𝑉(𝑋) = 𝜎 2 . Let 𝑋‾ be the
sample mean based on a random sample of size 𝑛, then 𝑋‾ is:
(a) the best linear unbiased estimator of 𝜇.
(b) an unbiased and consistent estimator of 𝜇.
(c) an unbiased and linear estimator of 𝜇.
(d) the best linear consistent estimator of 𝜇.
(iv) Let 𝜃 be an unknown parameter and 𝑇1 be an unbiased estimator of 𝜃. If
Var (𝑇1 ) ≤ Var (𝑇2 ), for 𝑇2 to be any other unbiased estimator, then 𝑇1 is
known as:
(a) minimum variance unbiased estimator.
(b) unbiased and efficient estimator.
(c) consistent and efficient estimator.
(d) unbiased, consistent and minimum variance estimator.
116 | P a g e
(i) The most important of all the methods of estimation is the method of
maximum likelihood. It generally yields good estimator as judged from
various criteria. Which one of the following statements about maximum
likelihood estimates is not, true?
(a) Maximum likelihood estimates are consistent.
(b) If maximum likelihood estimate exists, it is most efficient in the
class of estimates.
(c) Maximum likelihood estimates are sufficient.
(d) Maximum likelihood estimates are unbiased.
(vi) The maximum likelihood estimates, which are obtained by maximizing the
function of joint density of random variables, are generally :
(a) unbiased and inconsistent,
(b) unbiased and consistent,
(c) consistent and invariant, and
(d) invariant and unbiased.
1. Write True or False :
(i) The variance of the 𝑀𝑉𝑈𝐸 is always given by the Cramer'-Rao Bound. (False)
(ii) The equation E(∂log L/∂θ)² = n E[∂log f(x, θ)/∂θ]² is not satisfied in the case of f(x, θ) = 1/θ, 0 ≤ x ≤ θ. (True)
(iii) Cramer-Rao inequality for the variance of an estimator provides lower bound
to the variance (True)
117 | P a g e
(ii) If the variance of an estimator attains the Cramer-Rao lower bound, the
estimator is most.......... [Ans. efficient]
3. In each of the following questions four alternative answers are given in which only
one is correct. Select the correct answer and write (a), (b), (c) or (d) accordingly :
(i) If 𝑋1 , 𝑋2 , … , 𝑋𝑛 is a random sample of size 𝑛 from the Poisson distribution
with mean 𝜆, the Cramer-Rao lower bound to the variance of any unbiased
estimator of 𝜆 is given by
𝑛
(a) 𝑒 −𝜆
𝜆
(b) 𝑛
√𝜆
(c) 𝑛
𝜆
−
(d) 𝑒 𝑛
Ans. (b)
∂log 𝑓(𝑥,𝜃) 2 1
(ii) The value of 𝐸 ( ) in case of 𝑓(𝑥, 𝜃) = 𝜃 , 0 ≤ 𝑥 ≤ 𝜃 is
∂𝜃
𝑛2
(a) 𝜃2
𝑛
(b)
𝜃2
𝑛2
(c) 𝜃
(d) none of these.
Ans. (b)
(iii) The necessary and sufficient conditions for Cramer-Rao lower bound to be
attainable is
∂ log 𝐿
(a) = 𝐴(𝜃)[𝑇 − 𝛾(𝜃)]
∂𝜃
∂2 log 𝐿
(b) = 𝐴(𝜃)[𝑇 − 𝛾(𝜃)]
∂2 𝜃
∂log 𝐿 𝑇
(c) = 𝐴(𝜃) [𝛾(𝜃)]
∂𝜃
∂2 log 𝐿 𝑇
(d) = 𝐴(𝜃) [ ]
∂𝜃2 𝛾(𝜃)
Ans. (a)
(iv) Suppose L(θ) is the likelihood function and t is an unbiased estimator of θ. If
V₁ = E(t − θ)²,  V₂ = E[∂log L(θ)/∂θ]²,
which one of the following is true?
(a) V₁ ≤ V₂
(b) V₁ ≥ V₂
(c) V₁ = 1/V₂
(d) V₁ ≥ 1/V₂.
Ans. (d)
119 | P a g e
Question 4.
Let f: [0, 3] → ℝ be defined by
f(x) = 0 if 0 ≤ x < 1;  e^{x²} − e if 1 ≤ x < 2;  e^{x²} + 1 if 2 ≤ x ≤ 3.
Now define F: [0, 3] → ℝ by F(0) = 0 and F(x) = ∫₀ˣ f(t) dt for 0 < x ≤ 3.
Then
A. F is differentiable at x = 1 and F′(1) = 0
B. F is differentiable at x = 1 and F′(2) = 0
C. F is not differentiable at x = 1
D. F is differentiable at x = 1 and F′(2) = 1
Question 5.
sin (2(𝑥 2 +𝑦 2 )) 4
3𝑥sin ( )
2
Let 𝑓/∈ ℝ → ℝ be define by 𝑓(𝑥, 𝑦) = { 𝑒 𝑦 , if (𝑥, 𝑦) ≠ (0,0)
𝑥 2 +𝑦 2
𝛼, if (𝑥, 𝑦) = (0.0)
Where 𝑎 is a real constant? If 𝑓 is continuous at (0,0), then a is equal to
A. 1
B. 2
C. 3
D. 4
Question 6
Which probability distribution is used to model the time elapsed between events?
A. Exponentia
B. Poisson
C. Normal
D. Gamma
Question 7.
Let 𝑓: ℝ × ℝ → ℝ be define by
𝑓(𝑥, 𝑦) = 𝑥 2 + 𝑥𝑦 + 𝑦 2 − 𝑥 − 100
1
A. ∑𝑛𝑖=1 𝑋𝑖
𝑛
5 1/2
B. (𝑛 ∑𝑛𝑖=1 𝑋𝑖2 )
1
C. ∑𝑛𝑖=1 𝑋𝑖2
5𝑛
1 1/2
D. (5𝑛 ∑𝑛𝑖=1 𝑋𝑖2 )
Question 12.
Consider the trinomial distribution with the probability mass function
P(X = x, Y = y) = [7!/(x! y! (7 − x − y)!)] (0.6)ˣ (0.2)ʸ (0.2)^{7−x−y},  x ≥ 0, y ≥ 0, and x + y ≤ 7.
Then 𝐸(𝑌 ∣ 𝑋 = 3) is equal to ⋯
A) 2
B) 3
C) 4
D) 5
Question 13.
Let 𝑋1 , … , 𝑋𝑛 be a random sample of size 𝑛(≥ 2) from a uniform distribution with
probability density function
f(x, θ) = 1/θ for 0 < x < θ; 0 otherwise,
where θ ∈ (0, ∞). If X(1) = min{X₁, …, Xₙ} and X(n) = max{X₁, …, Xₙ},
then the conditional expectation E[(1/θ)(X(n) + θ/(n + 1)) | X₁ − X₂ = 5] = ……
A) 1
B) 2
C) 3
D) 34
Question 14.
If a constant value 100 is subtracted from each observation of a set, the mean of the set is:
123 | P a g e
A. increased by 50
B. decreased by 100
C. is not affected
D. zero
Question 15.
Extreme values in a distribution have no effect on:
A. average
B. median
C. geometric mean
D. harmonic mean
Question 16.
Line of regression of Y on X and X on Y can’t be
A. Parallel
B. Perpendicular
C. Coincide
D. None of these
Question 17.
Sum of square of deviation is minimum, when the deviation is taken from
A. Mean
B. Median
C. Mode
D. None of these
Question 18.
All value in a sample are same, then their mean is
A. Zero
B. One
C. Sum of Variance
D. None of these
124 | P a g e
Question 19.
Variance of a Random variable is always
A. Positive
B. Non-negative
C. Zero
D. Can’t Say
Question 20.
X ∼ 𝑃(1.412) then mean of random variable X
A. 1.412
B. 1.512
C. 1.96
D. None of these
Question 21.
1
Moment generating function of x where 𝑥 ∼ Bin (2, 2) is
2
1+𝑒 𝑡
A. ( )
2
2
(1+𝑒 2 )
B. 2
C. (1 − 𝑒 2 𝑡)2
D. (1 + 𝑒 𝑡 )
Question 22.
x ∼ Bin(2, 1/3). Find E(e^{2x}) = ⋯
A. ((2 + e²)/3)²
B. ((2 − e²)/3)²
C. (2 − e)/3
D. ((2 + e²)/2)²
Question 23.
X~𝑃(𝜆)
Then 𝐸(X 2 ) will 𝑏𝑒
A. λ(λ + 1)
B. 𝜆(𝜆 − 1)
C. (𝜆 − 1)
D. (𝜆 + 1)
Question 24.
In Poisson Distribution mean is……variance
A. Equal to
B. Greater than
C. Less than
D. None of these
Question 25.
If 𝑎: 𝑏: 𝑐 = 3: 4: 7, then the ratio (𝑎 + 𝑏 + 𝑐): 𝑐 is equal to
A. 2:1
B. 14:3
C. 7:2
D. 1:2
Question 26.
Quartile deviation or semi inter-quartile deviation is given by the formula:
𝑄3 +𝑄1
A. Q.D. = 2
B. Q.D. = 𝑄3 − 𝑄1
C. Q.D. = (𝑄3 − 𝑄1 )/2
D. Q.D.= (𝑄3 − 𝑄1 )/4
126 | P a g e
Question: 27
The moment generating function of a random variable X is
given by
M_X(t) = 1/6 + (1/3)eᵗ + (1/3)e^{2t} + (1/6)e^{3t},  −∞ < t < ∞
Then P(X ≤ 2) equals
A. 1/3
B. 1/6
C. 1/2
D. 5/6
Question: 28
Let X₁, X₂, …, Xₙ be a random sample from a U(θ − 1/2, θ + 1/2) distribution, where θ ∈ ℝ. If X(1) = min{X₁, X₂, …, Xₙ} and X(n) = max{X₁, X₂, …, Xₙ}, define
T₁ = (1/2)(X(1) + X(n)),  T₂ = (1/4)(3X(1) + X(n) + 1)  and  T₃ = (1/2)(3X(n) − X(1) − 2)
as estimators for θ. Then which of the following is/are TRUE?
A. T₁ and T₂ are MLEs for θ but T₃ is not an MLE for θ
127 | P a g e
1
A. k = 𝜋2
1 1
B. 𝑓(𝑥) = 𝜋 1+𝑥 2 ; −∞ < 𝑥 < ∞
C. P(X = Y) = 0
D. all of the above
Question: 30
Let X₁, X₂, …, Xₙ be a sequence of independently and identically distributed random variables with the probability density function
f(x) = (1/2) x² e^{−x} if x > 0; 0 otherwise,
and let Sₙ = X₁ + X₂ + ⋯ + Xₙ.
Then which of the following statements is/are TRUE?
A. (Sₙ − 3n)/√(3n) ∼ N(0, 1) for all n ≥ 1
B. For all ε > 0, P(|Sₙ/n − 3| > ε) → 0 as n → ∞
C. Sₙ/n → 1 with probability 1
D. Both A and B
Question: 31
Let 𝑋, 𝑌 are i.i.d Binomial (𝑛, 𝑝) random variables. Which of the following are true?
A. 𝑋 + 𝑌 ∼ Bin (2𝑛, 𝑝)
128 | P a g e
5.5 SUMMARY
The main points covered in this lesson are what an estimator is, what consistency, efficiency and sufficiency of an estimator mean, and how to obtain the best estimator.
5.6 GLOSSARY
Motivation: These problems are very useful in real life, and we can use them in data science, economics as well as social science.
Attention: Think how the Cramer Rao Inequality is useful in real world problems.
5.7 ANSWER TO IN-TEXT QUESTIONS
Answer 1 : C
Explanation :
The marginal pdf of X₁ is g(x₁) = ∫₀^∞ (x₁ e^{−x₁x₂}/2) dx₂ = 1/2,  1 < x₁ < 3
h(x₂ | x₁) = f(x₁, x₂)/g(x₁) = x₁ e^{−x₁x₂},  x₂ > 0
So X₂ | X₁ ∼ Exp(x₁) with mean 1/x₁.
Therefore Var(X₂ | X₁ = 2) = 1/4 = 0.25
Hence 0.25 is the correct answer.
Answer 2 : A
Explanation :
T/n + T(T − 1)/(n(n − 1)) is the UMVUE of θ(1 + θ):
E(T/n + T(T − 1)/(n(n − 1))) = θ(1 + θ), where T = Σ_{i=1}^{n} Xᵢ.
Therefore, T/n + T(T − 1)/(n(n − 1)) = 3/6 + 3(3 − 1)/(6(6 − 1)) = 21/30 = 0.70
129 | P a g e
Answer 3 :
Explanation :
f_{X(1)}(x) = n e^{n(θ−x)} for x > θ; 0 otherwise.
P(X(1) − (2/n) logₑ5 ≤ θ ≤ X(1)) = P(θ ≤ X(1) ≤ θ + (2/n) logₑ5) = 0.96
Hence 0.96 is the correct answer.
Answer 4 : A
Explanation :
f is discontinuous at x = 2. So we consider the interval [0,1.5].
Clearly, f is continuous on [0,1.5]. By the fundamental theorem of calculus,
we have that F is differentiable on [0,1.5], and so is at 𝑥 = 1
with 𝐹 ′ (1) = 𝑓(1) = 𝑒 − 𝑒 = 0
Hence option A is the correct choice.
Answer 5 : B
Explanation :
Given that f is continuous at (0, 0), lim_{(x,y)→(0,0)} f(x, y) = f(0, 0) = α.
Putting x = 0 and taking y → 0, we get
α = lim_{y→0} f(0, y) = lim_{y→0} sin(2y²)/y² = 2 lim_{y→0} sin(2y²)/(2y²) = 2
Hence option B is the correct choice.
Answer 6 : A
Explanation :
Exponential distribution is used to find the elapsed between events.
Answer 7 : A
Explanation :
The first order partial derivatives of f are given by
∂f/∂x = 2x + y − 1 and ∂f/∂y = x + 2y
For critical points, we equate the first order partial derivatives to zero, and on solving the resulting equations we get x = 2/3 and y = −1/3.
The second order partial derivatives of f evaluated at (2/3, −1/3) are given by
r = [∂²f/∂x²] = 2,  s = [∂²f/∂x∂y] = 1 and t = [∂²f/∂y²] = 2
Since rt − s² = 3 > 0 and r > 0, we conclude that f has a local minimum at (2/3, −1/3).
Since there is only one critical point, we do not have any point of local maximum.
Hence option A is correct.
Answer 8 : A
Explanation :
Correlation measures the linear relation between two random variables X and Y.
Correlation coefficient r = Cov(X, Y)/(σₓ σᵧ)
Answer 9 : C
Explanation :
Let A = [[1, 1, 2], [2, 3, −1], [4, 7, c]].
The condition for no solution is that rank(A : B) ≠ rank(A). This would be satisfied if c + 7 = 0, which implies that c = −7.
Hence option C is correct.
Answer 10 : C
Explanation :
Since sample multiple correlation lies between 0 to 1 0 ≤ 𝑟1.23,…,𝑛 ≤ 1
So option C only hold this condition
Hence option C is correct.
Answer 11 : D
Explanation :
We have E(Xᵢ²) = V(Xᵢ) + (E(Xᵢ))² = 5θ²;  i = 1, 2, …, n.
Then X₁²/5, X₂²/5, … is a sequence of i.i.d. random variables with E(X₁²/5) = θ². Using the WLLN, we get that (1/n) Σᵢ Xᵢ²/5 = (1/(5n)) Σᵢ Xᵢ² converges in probability to θ² as n → ∞, which implies that ((1/(5n)) Σᵢ Xᵢ²)^{1/2} converges in probability to θ as n → ∞.
Thus ((1/(5n)) Σᵢ Xᵢ²)^{1/2} is a consistent estimator for θ.
Hence option D is the correct choice.
Answer 12 : A
Explanation :
The trinomial distribution of two r.v.'s X and Y is given by
f_{X,Y}(x, y) = [n!/(x! y! (n − x − y)!)] pˣ qʸ (1 − p − q)^{n−x−y}
for x, y = 0, 1, 2, …, n and x + y ≤ n, where p + q ≤ 1. Then Y | X = x ∼ B(n − x, q/(1 − p)). Here n = 7, p = 0.6 and q = 0.2, so
E(Y | X = 3) = (7 − 3) × 0.2/(1 − 0.6) = 4 × 0.2/0.4 = 2
Hence A is the correct answer.
Answer 13 : A
Explanation :
f(x, θ) = 1/θ for 0 < x < θ; 0 otherwise.
E(X(n)) = nθ/(n + 1)
E[(1/θ)(X(n) + θ/(n + 1)) | X₁ − X₂ = 5] = E[(1/θ)(X(n) + θ/(n + 1))] = 1
Hence correct answer is A.
Answer 14 : B
Explanation :
Mean X̄ = Σfᵢxᵢ/N. If each observation is replaced by xᵢ′ = xᵢ − 100, then the new mean is X̄′ = Σfᵢ(xᵢ − 100)/N = X̄ − 100, i.e. the mean is decreased by 100.
Answer 21 : A
Explanation -
M(t) = (1/2 + (1/2)eᵗ)² = ((1 + eᵗ)/2)²
Answer 22 : A
Explanation -
x ∼ Bin(2, 1/3)
E(e^{tx}) = (2/3 + (1/3)eᵗ)²
So E(e^{2x}) = (2/3 + (1/3)e²)²
Answer 23 : A
Explanation -
X ∼ P(λ). Then E(X) = λ and V(X) = λ, so E(X²) = V(X) + [E(X)]² = λ + λ² = λ(λ + 1).
134 | P a g e
Answer 24 : A
Explanation -
𝑥 ∼ 𝑃(𝜆)
Then 𝐸(𝑥) =𝜆
𝑉(𝑥) =λ
So mean = variance
Answer 25 : A
Explanation -
𝑎: 𝑏: 𝑐
3: 4: 7
3𝑥: 4𝑥: 7𝑥 ⇒ 14𝑥
∴ 𝑎 + 𝑏 + 𝑐 = 14𝑥
𝑐 = 7𝑥
∴ (𝑎 + 𝑏 + 𝑐): 𝑐
= 14𝑥: 7𝑥
= 2: 1
Answer 26 : C
Explanation :
Semi inter-quartile deviation = Q.D. = (𝑄3 − 𝑄1 )/2
Answer 27 : D
Explanation:
Let X be a random variable with M_X(t) = E(e^{tX}) = Σ e^{tx} P(X = x).
Then P(X = x) = 1/6 for x = 0; 1/3 for x = 1; 1/3 for x = 2; 1/6 for x = 3.
Hence P(X ≤ 2) = 1/6 + 1/3 + 1/3 = 5/6.
135 | P a g e
Answer 28 : A
Explanation:
X₁, X₂, …, Xₙ is a random sample from U(θ − 1/2, θ + 1/2), so
f(x) = 1 for θ − 1/2 < xᵢ < θ + 1/2.
The likelihood is maximised by any θ̂ ∈ [X(n) − 1/2, X(1) + 1/2]; hence every
θ̂ = λ(X(n) − 1/2) + (1 − λ)(X(1) + 1/2),  0 < λ < 1,
is an MLE of θ. Taking λ = 1/2 and λ = 1/4 we obtain the MLEs
(1/2)(X(1) + X(n)) and (1/4)(3X(1) + X(n) + 1), i.e. T₁ and T₂, whereas T₃ cannot be written in this form with 0 < λ < 1.
Answer 30 : D
Explanation:
Clearly, 𝑋1 , 𝑋2 , … , 𝑋𝑛
are i.i.d 𝐺(3,1) random variables. Then, 𝐸(𝑋𝑖 ) = 3 and Var (𝑋𝑖 ) = 3, 𝑖 = 1,2, …
Let 𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 , then E(𝑆𝑛 ) = 3𝑛 and Var (𝑆𝑛 ) = 3𝑛
For all ε > 0, P(|Sₙ/n − 3| > ε) → 0 as n → ∞ (weak law of large numbers), so statement B is true.
For option (C), Sₙ/n → 3 with probability 1 (strong law of large numbers), so the almost-sure limit is 3, not 1.
(B) When more than two variables are included, the observations lead to a multinomial distribution; (X, Y) does not follow Multinomial(2n; p, p).
LESSON 6
INTERVAL ESTIMATION
STRUCTURE
6.1 Learning Objectives
6.2 Introduction
6.3 Interval Estimation
6.3.1 One Sided and Two Sided Confidence interval
6.3.2 Cental and Non Central Confidence Interval
6.3.3 Pivotal Method to find the Confidence interval
6.4 In-Text Questions
6.5 Summary
6.6 Glossary
6.7 Answer to In-Text Questions
6.8 References
6.9 Suggested Readings
6.1 LEARNING OBJECTIVES
In statistics, interval estimation is the evaluation of a parameter, for example the mean (average) of a population, by computing an interval, or range of values, within which the parameter is most likely to be located. Intervals are commonly chosen such that the parameter falls within them with a 95 or 99 percent probability, called the confidence coefficient. Hence, the intervals are called confidence intervals; the end points of such an interval are called the upper and lower confidence limits.
The interval containing a population parameter is established by calculating that statistic from
values measured on a random sample taken from the population and by applying the
knowledge (derived from probability theory) of the fidelity with which the properties of a
sample represent those of the entire population.
The probability tells what percentage of the time the assignment of the interval will be correct
but not what the chances are that it is true for any given sample. Of the intervals computed
from many samples, a certain percentage will contain the true value of the parameter being
sought.
In newspaper stories during election years, confidence intervals are expressed as proportions
or percentages. For instance, a survey for a specific presidential contender may indicate that
they are within three percentage points of 40% of the vote (if the sample is large enough). The
pollsters would be 95% certain that the actual percentage of voters who supported the candidate
would be between 37% and 43% because election polls are frequently computed with a 95%
of the confidence level.
Stock market investors are most interested in knowing the actual percentage of equities that
rise and fall each week. The percentage of American households with personal computers is
relevant to companies selling computers. Confidence intervals may be established for the
weekly percentage change in stock prices and the percentage of American homes with personal
computers.
In data analysis, calculating the confidence interval is a typical step that may be easily carried out for populations with normally distributed data using the well-known formula x̄ ± t·s/√n. The confidence interval, however, is not always easy to determine when working with data that is not normally distributed, and there are fewer and less easily available references for such data in the literature.
6.2 INTRODUCTION
Let 𝑇 be an estimator whose value 𝑡 is a point estimate of some unknown parameter 𝜃.
Even if the estimator 𝑇 satisfies the desirable properties of point estimators, it is clear that 𝑡
ordinarily will not be equal to the value 𝜃 because of sampling error. Thus, it becomes
necessary to indicate the general magnitude of this error. This allows us to estimate the
unknown parameter 𝜃 from the observed sample values, as follows.
𝜃 = 𝑡 ± error
Then our problem is to know, "What is the magnitude of this error?" and "How sure are we
that we are right?"
If sampling error is small the true value is likely to be covered by a small, estimated range of
values. On the other hand, if sampling error is large, the true value is likely to be covered by a
large estimated range of values. The answer to this problem is interval estimation based on
confidence intervals.
Method for Confidence Interval :
The method of interval estimation using the concept of confidence intervals can better be
explained as given below :
Let the random variable 𝑋 have a probability (mass/density) function 𝑓(𝑥, 𝜃) with the
parameter 𝜃 which we wish to estimate by means of a random sample of observations
𝑥1 , 𝑥2 , … , 𝑥𝑛 .
Then T₁ is called a one-sided lower confidence limit for τ(θ), and the interval (T₁, ∞) is known as a one-sided confidence interval on the right-hand tail with confidence coefficient (1 − α₁), as illustrated in Fig. 6.1 [for X ∼ N(μ, σ²)].
Similarly, let T₂ = t₂(x₁, x₂, …, xₙ) be another statistic for which
P[τ(θ) < T₂] = 1 − α₂, or P[−∞ < τ(θ) ≤ T₂] = 1 − α₂;
then T₂ is called a one-sided upper confidence limit for τ(θ), and the interval (−∞, T₂] is known as a one-sided confidence interval on the left-hand tail with confidence coefficient 1 − α₂ [for X ∼ N(μ, σ²)].
In brief, when we delete the area on either left side or right side only, then the confidence
interval is said to be one sided.
Combining the two one sided confidence intervals we get two sided confidence interval (or
simply confidence interval) where 𝛼 = 𝛼1 + 𝛼2
Here [𝑇1 , 𝑇2 ] is 100(1 − 𝛼) percent confidence interval for 𝜏(𝜃) as shown
below (for 𝑋∼ 𝑁(𝜇, 𝜎 2 )] :
6.3.2: Central and Noncentral Confidence Intervals :
The interval 𝑇1 to 𝑇2 (in case of two sided confidence interval) can be obtained in various ways
:
We can construct different confidence intervals with the same confidence coefficient by
deleting areas equal to 𝛼2 at the right end and 𝛼1 at the left end of the curve such that 𝛼1 +
𝛼2 = 𝛼, say and 𝛿 = 1 − 𝛼.
When we take α₁ = α₂ = α/2, then the confidence interval is often called a central confidence interval.
When 𝛼1 ≠ 𝛼2 , then (𝑇1 , 𝑇2 ) is Non-central Confidence Interval.
The length of the confidence Interval
The difference 𝑇2 − 𝑇1 is known as the length of the confidence interval.
P[T₁ ≤ τ(θ) ≤ T₂] = 1 − α
We prefer that interval for which the length is shortest. The confidence interval with shortest
length is known as Shortest Confidence Interval.
6.3.3 Pivotal Method to find the Confidence interval:
Pivotal Quantity:
Let 𝑥1 , 𝑥2 , … . . , 𝑥𝑛 , be a random sample from pmf/pdf 𝑓(𝑥, 𝜃). Let 𝑄 = 𝑞(𝑥1 , 𝑥2 , … , 𝑥𝑛 , 𝜃) be
a function of 𝑥1 , 𝑥2 , … , 𝑥𝑛 and 𝜃. If the distribution of 𝑄 does not depend on 𝜃, then 𝑄 is
called a pivot or Pivotal Quantity.
For example,
Let X ∼ N(θ, σ² = 9). Then
(i) Q = x̄ − θ is a pivotal quantity, since the distribution of Q = x̄ − θ ∼ N(0, 9/n) is independent of θ.
(ii) Q = (x̄ − θ)/(3/√n) is a pivotal quantity, since the distribution of Q ∼ N(0, 1) is independent of θ.
(iii) Q = x̄/θ is not a pivotal quantity, because the distribution of Q = x̄/θ ∼ N(1, 9/(nθ²)) depends on θ.
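The defining feature of a pivot, that its distribution does not change with θ, can be seen in a short simulation. The sketch below (simulated data, arbitrary values of θ) uses the pivot of case (ii).

import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 100_000
for theta in (0.0, 10.0, -3.5):                          # arbitrary parameter values
    x_bar = rng.normal(theta, 3.0, size=(reps, n)).mean(axis=1)
    q = (x_bar - theta) / (3.0 / np.sqrt(n))             # the pivot of case (ii)
    print(theta, round(q.mean(), 3), round(q.std(), 3))  # always about 0 and 1, whatever theta is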
Pivotal Method
Let a random variable which is a function of 𝑥1 , 𝑥2 … … , 𝑥𝑛 and 𝜃 denoted by 𝑄 or 𝜓(𝑇, 𝜃) and
whose distribution is independent of 𝜃 be taken as a pivot ; For each 𝜃, 𝜓(𝑇, 𝜃) is a statistic
and 𝑇 is a point estimate of 𝜃. For any fixed 0 < 𝛿 < 1, there will exist 𝑞1 and 𝑞2 depending
on δ such that
P[q₁ < Q < q₂] = δ,
and for each possible set of sample values x₁, x₂, …, xₙ this event can be rewritten as t₁ ≤ τ(θ) ≤ t₂
for functions t₁ and t₂ not depending on θ; then [t₁, t₂] is a 100δ% confidence interval for τ(θ) (or, in particular, θ).
When the lower and upper confidence limits in a 100𝛿% confidence interval for
𝜃, 𝑃(𝑡1 ≤ 𝜃 ≤ 𝑡2 ) = 𝛿 depend on the point estimate 𝑇 for 𝜃 and also on the sampling
distribution of the pivotal quantity (or the statistic 𝑇 itself), then the method of obtaining the
confidence interval is often called a General Method or Pivotal Method.
Example 1. Obtain an expression for confidence Interval 100(1 − 𝛼)% for the Mean where
variance is known in case of 𝑁(𝜇, 𝜎 2 ) .
Solution. Suppose a random sample x₁, x₂, …, xₙ is drawn from a normal population with mean μ and variance σ². Since the most efficient point estimator for the population mean μ is the sample mean X̄, we can establish a confidence interval for μ by considering the sampling distribution of X̄.
We know that if X ∼ N(μ, σ²), then X̄ ∼ N(μ, σ²/n), and the corresponding standard normal variate (i.e., pivotal quantity) is
Z = (X̄ − μ)/(σ/√n) ∼ N(0, 1),
i.e. f(z) = (1/√(2π)) e^{−z²/2},  −∞ < z < ∞.
Let Z_{α/2} be the value of Z such that
P(Z ≥ Z_{α/2}) = ∫_{Z_{α/2}}^{∞} f(z) dz = α/2,
and Z_{1−α/2} = −Z_{α/2} be the value of Z such that
P(Z ≤ −Z_{α/2}) = ∫_{−∞}^{−Z_{α/2}} f(z) dz = α/2.
Then clearly we have (see Figure 6.4)
P(−Z_{α/2} ≤ Z ≤ Z_{α/2}) = 1 − α = δ, say
⇒ P(−Z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ Z_{α/2}) = 1 − α
⇒ P(−Z_{α/2} σ/√n ≤ X̄ − μ ≤ Z_{α/2} σ/√n) = 1 − α
[multiplying each term in the inequality by σ/√n]
or
P(X̄ − Z_{α/2} σ/√n ≤ μ ≤ X̄ + Z_{α/2} σ/√n) = 1 − α
[subtracting X̄ from each term in the inequality and multiplying by −1].
Thus, the (1 − α)100% confidence interval for μ in a normal population when σ² is known is
X̄ − Z_{α/2} σ/√n ≤ μ ≤ X̄ + Z_{α/2} σ/√n,
where X̄ is the mean of a random sample of size n from a normal population with mean μ and known variance σ², and Z_{α/2} is the value of the standard normal variate having an area of α/2 to its right.
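A minimal Python sketch of this interval follows; the sample summary values are hypothetical, and scipy is used only to obtain Z_{α/2}.

import math
from scipy.stats import norm

x_bar, sigma, n, alpha = 52.3, 4.0, 36, 0.05      # hypothetical sample summary
z = norm.ppf(1 - alpha / 2)                       # Z_{alpha/2} = 1.96 for 95% confidence
half_width = z * sigma / math.sqrt(n)
print(x_bar - half_width, x_bar + half_width)     # the 95% confidence interval for mu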
Example 2. Obtain the 95% confidence interval for the mean of a normal distribution N(μ, σ²) where the variance σ² is known. What is the length of this confidence interval? (From Example 1 with α = 0.05 and Z_{0.025} = 1.96, the interval is X̄ ± 1.96 σ/√n, and its length is 2 × 1.96 σ/√n = 3.92 σ/√n.)
⇒ P[−1.88 σ/√n ≤ x̄ − μ ≤ 2.05 σ/√n] = 0.95
⇒ P[−1.88/4 ≤ x̄ − μ ≤ 2.05/4] = 0.95   {∵ σ = 1, n = 16, √n = 4}
⇒ P[−0.47 < x̄ − μ < 0.5125] = 0.95
⇒ P[−x̄ − 0.47 < −μ < −x̄ + 0.5125] = 0.95
⇒ P[x̄ − 0.5125 < μ < x̄ + 0.47] = 0.95
Example 4. Obtain an expression for confidence interval for the mean when variance is not
known in case of 𝑁(𝜇, 𝜎 2 ).
Solution. Case 1. Sample size n > 30, variance unknown. In this case we estimate the variance σ² by
S² = (1/n) Σ(xᵢ − x̄)²; then
Z = (x̄ − μ)/(S/√n) ∼ N(0, 1)
and the (1 − α)100% confidence interval for μ is given by
P[−Z_{α/2} ≤ (x̄ − μ)/(S/√n) ≤ Z_{α/2}] = 1 − α
⇒ P[x̄ − Z_{α/2} S/√n ≤ μ ≤ x̄ + Z_{α/2} S/√n] = 1 − α.
Case 2. Sample size is small, n ≤ 30, variance unknown. In this case we estimate the variance σ² by
s² = (1/(n − 1)) Σ(xᵢ − x̄)²; then the statistic (i.e. pivotal quantity)
t = (x̄ − μ)/(s/√n) ∼ Student's t with (n − 1) d.f.
and the (1 − α)100% confidence interval for μ is given by
P[−t_{α/2} ≤ t ≤ t_{α/2}] = ∫_{−t_{α/2}}^{t_{α/2}} f_T(t) dt = 1 − α
⇒ P[−t_{α/2} ≤ (x̄ − μ)/(s/√n) ≤ t_{α/2}] = 1 − α
⇒ P[x̄ − t_{α/2} s/√n ≤ μ ≤ x̄ + t_{α/2} s/√n] = 1 − α.
Hence the (1 − α)100% confidence interval for μ is (x̄ − t_{α/2} s/√n, x̄ + t_{α/2} s/√n).
Example 5. A sample of 10 individuals has a mean of 53; and the sum of the squares of
deviations from the mean 81. Find 90% confidence interval for the mean 𝜇, assuming that the
population is normal with variance unknown.
Solution. The (1 − α)100% confidence interval for μ is given by P{x̄ − t_{α/2} s/√n ≤ μ ≤ x̄ + t_{α/2} s/√n} = 1 − α.
Here 1 − α = 0.90, α = 0.10, α/2 = 0.05, n = 10, x̄ = 53, Σ(xᵢ − x̄)² = 81,
s² = (1/(n − 1)) Σ(xᵢ − x̄)² = 81/9 = 9, so s = 3.
The value of t_{α/2} (i.e., t_{0.05}) for 9 d.f. is 1.833 (using the table of t values).
Hence the 90% confidence interval for μ is
53 − 1.833 × 3/√10 ≤ μ ≤ 53 + 1.833 × 3/√10,
i.e. 51.26 ≤ μ ≤ 54.74, taking √10 = 3.162.
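The same interval can be re-checked with scipy's t distribution, as in the following sketch using the numbers of Example 5.

import math
from scipy.stats import t

n, x_bar, s, alpha = 10, 53.0, 3.0, 0.10
t_crit = t.ppf(1 - alpha / 2, df=n - 1)           # about 1.833 for 9 d.f.
half_width = t_crit * s / math.sqrt(n)
print(x_bar - half_width, x_bar + half_width)     # about (51.26, 54.74)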
Example 6. Construct a 90% confidence interval for σ² on the basis of a random sample of size 10 with standard deviation s = √((1/(n − 1)) Σ(xᵢ − x̄)²) = 3.2, in the case of N(μ, σ²), μ being unknown.
Solution. The (1 − α)100% confidence interval for σ² based on the χ²-statistic is given by
P((n − 1)s²/χ²_{α/2} ≤ σ² ≤ (n − 1)s²/χ²_{1−α/2}) = 1 − α.
Here n = 10, s² = (3.2)² = 10.24, 1 − α = 0.90, α = 0.10, α/2 = 0.05,
(n − 1)s² = (10 − 1) × 10.24 = 9 × 10.24 = 92.16,
and from the table of χ² values, χ²_{0.05}(9) = 16.912 and χ²_{0.95}(9) = 3.325.
Hence the 90% confidence interval for σ² is 92.16/16.912 ≤ σ² ≤ 92.16/3.325, i.e. 5.45 ≤ σ² ≤ 27.72.
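The chi-square interval of Example 6 can likewise be re-checked numerically, as in this short sketch.

from scipy.stats import chi2

n, s2, alpha = 10, 10.24, 0.10
lower = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)   # divide by chi^2_{0.05}(9)
upper = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)       # divide by chi^2_{0.95}(9)
print(lower, upper)   # about (5.45, 27.72)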
(iv) If a random sample of size n is drawn from N(μ, σ²) with unknown μ and σ², then a 100(1 − α)% confidence interval for σ² will be
nS²/χ²_{n−1}(α/2) ≤ σ² ≤ nS²/χ²_{n−1}(1 − α/2)
(True)
𝑥‾
[ Ans. 𝑥‾ ± 1.96√ ]
𝑛
(a) 0.01
(b) 0.001
(c) 0.99
(d) none of these
[Ans. (c)]
(ii) Assume that you take a random sample from 𝑁(𝜇, 𝜎 2 ) and calculate 𝑥‾ as 100 .
You then calculate the upper limit of a 90 percent confidence interval for 𝜇; its
value is 112 . The lower limit for the confidence interval will be :
(a) 88
(b) 92
(c) 100
(d) 124
150 | P a g e
[Ans, (a)]
(iii) A sample of 400 units has a mean 4.6 and standard deviation 3-42. If the
population is normal with unknown mean 𝜇 and 𝜎, the 99.73% confidence
interval for 𝜇 is
(a) 4 ⋅ 6 ∓ 0.413
(b) 4.6 + 1.71
(c) 2.3 ∓ 0.413
(d) 2.3 ∓ 1.71
[Ans. (a)]
(iv) In a town a sample of 100 voters contained 64 persons who favoured a particular
issue. We can be 95% confident that proportion of voters in the community
who favour the issue is contained between the limits of :
(a) 0.50 and 0.70
(b) 0.50 and 0.74
(c) 0.55 and 0.73
(d) 0.55 and 0.70
{Ans. (c)}
(v) A fair coin is tossed repeatedly. If 𝑛 tosses are required in order that the
probability will be at least 0.90 for the proportion of heads to lie between 0.4
and 0.6 then the value of 𝑛 is at least
(a) 200
(b) 220
(c) 250
(d) 280
[Ans. (c)]
(vi) The 95% confidence limits for 𝜇, when sample is drawn from the population
𝑁(𝜇, 𝜎 2 ), 𝜎 2 is known, are given by
(a) −1.96 ≤ (x̄ − μ)/(σ/√n) ≤ 1.96
(b) P[−Z_{α/2} ≤ (x̄ − μ)/(σ/√n) ≤ Z_{α/2}] = 0.95 = 1 − α
(c) x̄ ± 1.96 σ/√n
A. X̄ = ΣX/N
B. X̄ = ΣfX/N
C. X̄ = A + Σdx/N
D. X̄ = A + Σfdx/N
Question 3. The variance of first 𝑛 natural numbers is:
A. (𝑛2 + 1)/12
B. (𝑛 + 1)2 /12
C. (𝑛2 − 1)/12
D. (2𝑛2 − 1)/12
Question 4. If a random variable 𝑋 has mean 3 and standard deviation 5, then the variance of
a variable 𝑌 = 2𝑋 − 5 is:
A. 45
B. 100
C. 15
D. 40
Question 5. For any two events 𝐴 and 𝐵 then 𝑃(𝐴 − 𝐵) is equal to:
A. 𝑃(𝐴) − 𝑃(𝐵)
B. 𝑃(𝐵) − 𝑃(𝐴)
C. 𝑃(𝐵) − 𝑃(𝐴𝐵)
D. 𝑃(𝐴) − 𝑃(𝐴𝐵)
Question 6. If an event 𝐵 has occurred and it is known that 𝑃(𝐵) = 1, the conditional
probability 𝑃(𝐴/𝐵) is equal to:
A. 𝑃(𝐴)
B. 𝑃(𝐵)
C. one
D. zero
Question 10. If 𝑋 is a random variable which can take only non-negative values, then
A. 𝐸(𝑋 2 ) = [𝐸(𝑋)]2
B. 𝐸(𝑋 2 ) ≥ [𝐸(𝑋)]2
C. 𝐸(𝑋 2 ) ≤ [𝐸(𝑋)]2
D. none of the above
Question 11. If 𝑋 is a random variable having its p.d.f. 𝑓(𝑥), the 𝐸(𝑋) is called:
A. arithmetic mean
B. geometric mean
C. harmonic mean
D. first quartile
Question 12. If X is a random variable and f(x) is its p.d.f., E(1/X) is used to find:
A. arithmetic mean
B. harmonic mean
C. geometric mean
D. first central moment
Question 13. If X is a random variable and its p.d.f. is f(x), E(log x) is used to find :
A. arithmetic mean
B. geometric mean
C. harmonic mean
D. logarithmic mean
Question 14. If X ∼ b(3, 1/2) and Y ∼ b(5, 1/2), the probability P(X + Y = 3) is:
A. 7/16
B. 7/32
C. 11/16
D. none of the above
Question 15. If 𝑋 and 𝑌 are two Poisson variates such 𝑋 ∼ 𝑃(1) and 𝑌 ∼ 𝑃(2), then X+Y
follow
A. P(1)
B. P(2)
C. P(4)
D. P(3)
Question 18. Student’s 𝑡 -distribution curve is symmetrical about mean, it means that:
A. odd order moments are zero
B. even order moments are zero
C. both (a) and (b)
D. none of (a) and (b)
Question 19. If 𝑋 ∼ 𝑁(0,1) and 𝑌 ∼ 𝜒 2 /𝑛, the distribution of the variate 𝑋/√𝑌 follows:
A. Cauchy’s distribution
B. Fisher’s 𝑡 -distribution
C. student’s 𝑡 -distribution
D. none of the above
Question 20. The relation between the mean and variance of 𝜒 2 with 𝑛 d.f. is:
A. mean = 2 variance
B. 2 mean = variance
C. mean = variance
D. none of the above
Question 21. Chi-square distribution curve in respect of symmetry is:
A. negatively skew
B. symmetrical
C. positively skew
D. any of the above
Question 23. If 𝑋 and 𝑌 are distributed as 𝜒 2 with d.f. 𝑛1 and 𝑛2 , respectively, the
distribution of the variate 𝑋/𝑌 is:
A. β₁(n₁/2, n₂/2)
B. β₂(n₁/2, n₂/2)
C. 𝜒 2 with d.f. (𝑛1 − 𝑛2 )
D. none of the above
Question 24. If X ∼ χ²(n₁) and Y ∼ χ²(n₂), the distribution of the variate (X + Y) is:
A. 𝜒 2 (𝑛1 − 𝑛2 )
B. 𝜒 2 (𝑛1 𝑛2 )
C. 𝜒 2 (𝑛1 + 𝑛2 )
D. all the above
Question 25. A normal random variable has mean = 2 and variance = 4. Its fourth central
moment 𝜇4 will be:
A. 16
B. 64
C. 80
D. 48
Question 26. If a random variable 𝑋 has mean 3 and standard deviation 5, then, the variance
of the variable 𝑌 = 2𝑋 − 5 is,
A. 25
B. 45
C. 100
D. 50
Question 27. A variable X with moment generating function M_X(t) = (2/3 + (1/3)e^t) is distributed with mean and variance as:
A. mean = 2/3, variance = 2/9
B. mean = 1/3, variance = 2/9
C. mean = 1/3, variance = 2/3
D. mean = 2/3, variance = 1/9
Question 28. If a distribution has moment generating function 𝑀𝑋 (𝑡) = (2 − 𝑒 𝑡 )−3 , then the
distribution is:
A. geometric distribution
B. hypergeometric distribution
C. binomial distribution
D. negative binomial distribution
Question 29. If X is a standard normal variate, then (1/2)X² is a gamma variate with parameters:
A. 1, 1/2
B. 1/2, 1
C. 1/2, 1/2
D. 1, 1
Question 30. Given the joint probability density function of X and Y as f(x, y) = 4xy; 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 and = 0 otherwise, P(0 < x < 1/2; 1/2 ≤ y ≤ 1) is equal to
A. 1/4
B. 5/16
C. 3/16
D. 3/8
Question 31. The type of estimates are
A. point estimate
B. interval estimates
C. estimation of confidence region
D. all the above
Question 37. If the variance of an estimator attains the Cramér-Rao lower bound, the
estimator is:
A. most efficient
B. sufficient
C. consistent
D. admissible
Question 38. Degrees of freedom for statistic- 𝜒 2 in case of contingency table of order
(2 × 2) is
A. 3
B. 4
C. 2
D. 1
Question 40. Standard error of the sample correlation coefficient 𝑟 based on 𝑛 paired values
is:
A. (1 + r²)/√n
B. (1 − r²)/n
C. (1 − r²)/√n
B. 𝑌 = 0.64𝑋 + 10.612
C. 𝑌 = 0.4𝑋 + 12.82
D. none of the above
Question 43. If the two lines of regression in a bivariate distribution are 𝑋 + 9𝑌 = 7 and
𝑌 + 4𝑋 = 16 then 𝜎𝑋 : 𝜎𝑌 is:
A. 3: 2
B. 2: 3
C. 9: 4
D. 4: 9
Question 44. If a constant 5 is added to each observation of a set, the mean is:
A. increased by 5
B. decreased by 5
C. 5 times the original mean
D. not affected
Question 45. Which of the following relations among the location parameters does not hold?
A. Q₂ = median
B. P₅₀ = median
C. D₅ = median
D. D₆ = median
Question 46. For random variables X and Y, we have Var(X)=1, Var(Y)=4, and Var(2X-3Y)
=34, then the correlation between X and Y is:
A. 1/2
B. 1/4
C. 1/3
D. None of the above
Question 47. Let X and Y be independent uniform (0, 1) random variables. Define A=X+Y
and B=X-Y. Then,
A. A and B are independent random variables
B. A and B are uncorrelated random variables
C. A and B are both uniforms (0,1) random variables.
D. None of these
6.5 SUMMARY
The main points covered in this lesson are what interval estimation is and how to obtain the best interval estimator.
6.6 GLOSSARY
Motivation: These problems are very useful in real life and can be applied in data science, economics as well as social science.
Attention: Think about how interval estimation is useful in real-world problems.
6.7 ANSWER TO IN-TEXT QUESTIONS
Answer 1 : A
Explanation-
1
As we know that simple formula for mean is 𝑋‾ = 𝑁 ∑ 𝑥𝑖
Answer 2 : C
Explanation-
As we know that the actual data is X and A is an assumed point; then dx = X − A.
Σdx/N = ΣX/N − ΣA/N = X̄ − A(N/N) = X̄ − A
⇒ X̄ = A + Σdx/N
Answer 3 : C
Explanation-
V(x) = (1/n) Σxᵢ² − (x̄)²
We know x̄ = (1 + 2 + ⋯ + n)/n = n(n+1)/(2n) = (n+1)/2.
V(x) = (1/n)(1² + 2² + ⋯ + n²) − ((n+1)/2)²
= (1/n) · n(n+1)(2n+1)/6 − ((n+1)/2)²
= (n+1)(2n+1)/6 − (n+1)²/4
= (2n² + 3n + 1)/6 − (n² + 2n + 1)/4
= (4n² + 6n + 2 − 3n² − 6n − 3)/12
⇒ V(x) = (n² − 1)/12
Answer 4 : B
Explanation-
𝑋‾= 3, standard deviation = 5
V(y) = V (2 X - 5)
= 2² V(X)
= 2² * variance (x)
= 4 * 25
V(y)= 100
Answer 5 : D
Explanation-
𝑃(𝐴 − 𝐵) = 𝑃(𝐴) − 𝑃(𝐴 ∩ 𝐵)
= 𝑃(𝐴) − 𝑃(𝐴𝐵)
Answer 6 : A
Explanation-
P(A/B) = P(A ∩ B)/P(B) = P(A ∩ B)/1 = P(A ∩ B)
since we know that P(B) = 1, i.e., B = sample space, so A ∩ B = A
⇒ P(A ∩ B) = P(A)
Answer 7 : C
Explanation-
we know that 𝑥and 𝑦 are independent, then
𝐸(𝑥𝑦) = 𝐸(𝑥) ⋅ E(𝑦)
But Converse is not true
Answer 8 : A
Explanation-
As we know that 𝑋 and 𝑌 are non negative Random Variable with
𝑋≤𝑌
then 𝑋 − 𝑌 ≤ 0
𝐸(𝑋 − 𝑌) ≤ 0
𝐸(X) ≤ 𝐸(Y)
Answer 9: A
Explanation- We know that E((x − x̄)(y − ȳ)) = cov(x, y) = 0.
Answer 10 : B
Explanation-
We know that
𝑉(𝑥) = 𝐸(𝑥 2 ) − (𝐸(𝑥))2 ⩾ 0
so 𝐸(𝑥 2 ) ⩾ (𝐸(𝑥))2
Answer 11 : A
Explanation-
We know that,
𝐸(𝑥) = ∑𝑛𝑖=1 𝑥𝑖 𝑝(𝑥𝑖 ) = 𝑥‾
𝐸(𝑥) is arithmetic mean.
Answer 12 : B
Explanation-
We know that,
1/H = ∫ (1/x) f(x) dx
So, 1/H = E(1/X)
⇒ H = 1/E(1/X)
Answer 13 : B
Explanation-
We know that
log G = E(log x) = Σ p(x) log x
so 𝐸(log 𝑥) means log of geometric mean
Answer 14 : B
Explanation-
X ∼ b(3, 1/2), Y ∼ b(5, 1/2)
⇒ X + Y ∼ b(8, 1/2)
So, P(X + Y = 3) = ⁸C₃ (1/2)³ (1/2)⁵ = ⁸C₃ (1/2)⁸
= [8!/(5! 3!)] × (1/2)⁸ = [(8 × 7 × 6)/(3 × 2 × 1)] × 1/(2³ × 2⁵)
= 56/256 = 7/32
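A quick numerical check of this answer, assuming SciPy is available, is sketched below; it compares the direct b(8, 1/2) probability with the convolution of the two binomials.

```python
# Check of Answer 14: X ~ b(3, 1/2) and Y ~ b(5, 1/2) independent, so
# X + Y ~ b(8, 1/2) and P(X + Y = 3) = 7/32.
from scipy.stats import binom

p_direct = binom.pmf(3, 8, 0.5)
p_convolved = sum(binom.pmf(k, 3, 0.5) * binom.pmf(3 - k, 5, 0.5) for k in range(0, 4))
print(p_direct, p_convolved, 7 / 32)   # all three ≈ 0.21875
```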
Answer 15 : D
Explanation-
𝑋 ∼ 𝑃(1), Y ∼ 𝑃(2)
then X + Y ∼ 𝑃(3)
Answer 16 : D
Explanation-
𝑋 ∼ Bin (𝑛, 𝑝)
Then Y = 𝑛 −X will follow Binomial (𝑛, 𝑞)
Answer 17 : C
Answer 18 : A
Explanation -
Symmetrical distribution means odd order moments will be zero.
Answer 19 : B
Explanation -
We know that Fisher's t-variate is defined as
t = (standard normal variate)/√(χ² variate/degrees of freedom)
Answer 20 : B
Explanation -
In Chi Square distribution
E(X) = n , where X~Chi-square(n)
V(X) = 2n
So, V(X) = 2E(X)
Answer 21 : C
Explanation -
Skewness = (mean − mode)/S.D. = [n − (n − 2)]/√(2n) = √(2/n) > 0
Answer 23 : B
Explanation -
X ∼ χ²(n₁), Y ∼ χ²(n₂)
Then X/Y ∼ β₂(n₁/2, n₂/2)
Answer 24 : C
Explanation -
X~𝜒 2 (n₁)
Y~𝜒 2 (n₂)
Then, (𝑋 + 𝑌)~𝜒 2 (𝑛1 + 𝑛2 )
Answer 25 : D
Explanation -
X̄ = 2, σ² = 4
μ₄ = 1 · 3 · σ⁴ = 3σ⁴ = 3 × 16 = 48
Answer 26 : C
Explanation -
X̄ = 3, σ = 5, σ² = 25
V(Y) = V(2X − 5) = 4V(X) = 4 × 25 = 100
Answer 27: B
Explanation -
M_X(t) = 2/3 + (1/3)e^t
⇒ X ∼ Bernoulli(p) with p = 1/3
E(X) = 1/3, V(X) = (1/3) × (2/3) = 2/9
Answer 28 : D
Explanation -
M_X(t) for the negative binomial distribution is of the form (Q − Pe^t)^{−r}.
Answer 29 : A
Explanation -
X ∼ N(0, 1)
Then (1/2)X² ∼ Gamma(1, 1/2)
Answer 30 : C
Explanation -
f(x, y) = 4xy; 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
P(0 < x < 1/2; 1/2 ≤ y ≤ 1)
= ∫ from 0 to 1/2 ∫ from 1/2 to 1 of 4xy dy dx
= ∫ from 0 to 1/2 of 4x [y²/2] evaluated from 1/2 to 1 dx
= ∫ from 0 to 1/2 of 2x(1 − 1/4) dx
= (3/2) ∫ from 0 to 1/2 of x dx
= (3/2) [x²/2] evaluated from 0 to 1/2
= (3/2)(1/8) = 3/16
s² = (1/n) Σᵢ (Xᵢ − X̄)²
And we know that (1/(n−1)) Σᵢ (Xᵢ − X̄)² is an unbiased estimator (U.E.) of σ².
So ns²/(n−1) = (1/(n−1)) Σᵢ (Xᵢ − X̄)² is an U.E. of σ².
Hence ns²/(n−1) is an U.E. of σ².
Answer 37 : A
Explanation-
The Cramér-Rao lower bound means that an estimator which attains this lower bound is called the minimum variance bound (MVB) estimator. So the most efficient estimator is the one that attains the Cramér-Rao lower bound on the variance.
Answer 38 : D
Explanation-
Degree of freedom in 𝜒 2 contingency table is (m-1)(n-1)
Here, m=2 , n=2
So degree of freedom = (2-1)(2-1) = 1
Answer 39 : B
Explanation-
We know that
𝑟 = √𝑏𝑌𝑋 ⋅ 𝑏𝑋𝑌
Correlation between X and Y is Geometric Mean of regression coefficients
This is called fundamental property of regression coefficient.
Answer 40 : C
Explanation-
Standard error of the sample correlation coefficient = (1 − r²)/√n
Explanation-
If 𝑌 = 𝑚𝑋 + 4 Y ; regression line Y on X
Answer 42 : B
Explanation-
μ_X = 9.2, μ_Y = 16.5, σ_X = 2.1
σ_Y = 1.6 and ρ_XY = 0.84
The regression line of Y on X is
(Y − Ȳ) = r (σ_Y/σ_X)(X − X̄)
Y = Ȳ + r (σ_Y/σ_X)(X − X̄)
= 16.5 + 0.84 × (1.6/2.1)(X − 9.2)
= 16.5 + 0.64X − 5.888
⇒ Y = 0.64X + 10.612
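The fitted line can be verified with a small calculation in plain Python, as sketched below (the variable names are illustrative only).

```python
# Check of Answer 42: regression line of Y on X is
# Y = Ybar + r*(sigma_y/sigma_x)*(X - Xbar) with the given summary values.
mu_x, mu_y, sigma_x, sigma_y, rho = 9.2, 16.5, 2.1, 1.6, 0.84

slope = rho * sigma_y / sigma_x          # 0.84 * 1.6 / 2.1 = 0.64
intercept = mu_y - slope * mu_x          # 16.5 - 0.64 * 9.2 = 10.612
print(f"Y = {slope:.2f} X + {intercept:.3f}")   # Y = 0.64 X + 10.612
```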
Answer 43 : A
Explanation-
The two lines are X + 9Y = 7 and Y + 4X = 16.
If we take X = 7 − 9Y as the line of X on Y and Y = 16 − 4X as the line of Y on X, then b_XY = −9, b_YX = −4 and r = √(b_XY · b_YX) = √36 = 6 > 1, which is impossible.
Hence the roles must be interchanged: from X + 9Y = 7 (Y on X), b_YX = −1/9, and from Y + 4X = 16 (X on Y), b_XY = −1/4.
Then r² = b_YX · b_XY = 1/36, r = −1/6, and b_YX = r σ_Y/σ_X ⇒ −1/9 = −(1/6)(σ_Y/σ_X) ⇒ σ_Y/σ_X = 2/3, i.e., σ_X : σ_Y = 3 : 2.
Answer 44 : A
Explanation -
Let x₁, x₂, …, xₙ be the sample values with frequencies f₁, f₂, …, fₙ; then
x̄ = (1/N) Σ fᵢxᵢ
So x̄_new = (1/N) Σ fᵢ(xᵢ + 5)
= (1/N) Σ fᵢxᵢ + (1/N) Σ fᵢ · 5
= x̄ + (5/N) Σ fᵢ
= x̄ + (5/N) · N
⇒ x̄_new = x̄ + 5
Answer 45 : D
Explanation - As we know that median is a point which divide the distribution in two equal
parts
But D₆ is 6th decile so median ≠D₆
Answer 46 : B
Explanation-
Var(2X − 3Y) = 4Var(X) + 9Var(Y) − 12Cov(X, Y)
⇒ 4(1) + 9(4) − 12Cov(X, Y) = 34
∴ Cov(X, Y) = 1/2 and ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y) = (1/2)/(1 × 2) = 1/4
Answer 47 : B
Explanation-
Cov(X + Y, X − Y) = Cov(X, X) − Cov(X, Y) + Cov(Y, X) − Cov(Y, Y) = Var(X) − Var(Y) = 0
6.8 REFERENCES
• Devore, J. (2012). Probability and statistics for engineers, 8th ed. Cengage Learning.
• John A. Rice (2007). Mathematical Statistics and Data Analysis, 3rd ed. Thomson
Brooks/Cole
• Larsen, R., Marx, M. (2011). An introduction to mathematical statistics and its
applications. Prentice Hall.
6.9 SUGGESTED READINGS
• S. C Gupta , V.K Kapoor, Fundamentals of Mathematical Statistics,Sultan Chand
Publication, 11th Edition.
• B.L Agarwal, Programmed Statistics ,New Age International Publishers, 2nd Edition.
LESSON 7
INTERVAL BASED DISTRIBUTION
STRUCTURE
7.1 Learning Objectives
7.2 Introduction
7.3 Interval Based Distribution
7.3.1 Student’s t- Distribution
7.3.2 F- Distribution
7.3.3 Z-Distribution
7.3.4 Chi-Square Distribution
7.4 In-Text Questions
7.5 Summary
7.6 Glossary
7.7 Answer to In-Text Questions
7.8 References
7.9 Suggested Readings
7.1 LEARNING OBJECTIVES
The main objective in discussing these distributions (t, F, Z and chi-square) is to understand sampling distributions, i.e., the distributions of sample-based statistics.
7.2 INTRODUCTION
The entire large sample theory was based on the application of the "Normal Test". However, if the sample size n is small, the distributions of the various statistics, e.g., Z = (x̄ − μ)/(σ/√n) or Z = (X − nP)/√(nPQ), etc., are far from normal, and as such the 'normal test' cannot be applied when n is small.
In such cases exact sample tests, pioneered by W.S. Gosset (1908) who wrote under the pen
name of Student, and later developed and extended by Prof. R.A. Fisher (1926), are used. In
the following sections we shall discuss: (i) 𝑡-test, (ii) 𝐹-test, and (iii) Fisher's 𝑧-transformation.
The exact sample tests can, however, be applied to large samples also, though the converse is not true. In all the exact sample tests, the basic assumption is that the population(s) from which the sample(s) is (are) drawn is (are) normal, i.e., the parent population(s) is (are) normally distributed.
7.3 INTERVAL BASED DISTRIBUTION
We will discuss these distributions in detail.
of two independent χ²-variates with 1 and (n − 1) d.f. respectively, is a β₂(1/2, (n−1)/2) variate and its distribution is given by:
dF(t) = [1/B(1/2, v/2)] · (t²/v)^{1/2 − 1}/(1 + t²/v)^{(v+1)/2} d(t²/v), 0 ≤ t² < ∞   [where v = (n − 1)]
= [1/(√v B(1/2, v/2))] · 1/(1 + t²/v)^{(v+1)/2} dt, −∞ < t < ∞
the factor 2 disappearing since the integral from −∞ to ∞ must be unity. This is the required
probability density function as given in (16 ⋅ 2) of Student's t-distribution with 𝑣 = (𝑛 −
1)𝑑. 𝑓.
Remarks on Student's '𝒕'.
1. Importance of Student's 𝑡-distribution in Statistics. W.S. Gosset, who wrote under
pseudonym (pen-name) of Student defined his 𝑡 in a slightly different way, viz., 𝑡 = (𝑥 −
𝜇)/𝑠 and investigated its sampling distribution, somewhat empirically, in a paper entitled
'The Probable Error of the Mean', published in 1908. Prof. R.A. Fisher, later on defined
his own ' 𝑡 ' and gave a rigorous proof for its sampling distribution in 1926. The salient
feature of ' 𝑡 ' is that both the statistic and its sampling distribution are functionally
independent of 𝜎, the population standard deviation.
The discovery of ' 𝑡 ' is regarded as a landmark in the history of statistical inference. Before
𝑥‾−𝜇
Student gave his ' 𝑡 ', it was customary to replace 𝜎 2 in 𝑍 = 𝜎/ 𝑛, by its unbiased estimate
√
2 𝑥‾−𝜇
𝑆 to give 𝑡 = 𝑆/ and then normal test was applied even for small samples. It has been
√𝑛
found that although the distribution of 𝑡 is asymptotically normal for large 𝑛 , it is far from
normality for small samples. The Student's 𝑡 ushered in an era of exact sample
distributions (and tests) and since its discovery many important contributions have been
made towards the development and extension of small (exact) sample theory.
2. Confidence or Fiducial Limits for μ. If t_{0.05} is the tabulated value of t for v = (n − 1) d.f. at 5% level of significance, i.e., P(|t| > t_{0.05}) = 0.05 ⇒ P(|t| ≤ t_{0.05}) = 0.95, the 95% confidence limits for μ are given by:
|t| ≤ t_{0.05}, i.e., |(x̄ − μ)/(S/√n)| ≤ t_{0.05} ⇒ x̄ − t_{0.05} · S/√n ≤ μ ≤ x̄ + t_{0.05} · S/√n
Thus, 95% confidence limits for μ are: x̄ ± t_{0.05} (S/√n)
Similarly, 99% confidence limits for μ are: x̄ ± t_{0.01} (S/√n)
where 𝑡0.01 is the tabulated value of 𝑡 for 𝑣 = (𝑛 − 1) d.f. at 1% level of significance.
Fisher's ' t ' (Definition). It is the ratio of a standard normal variate to the square root of an independent chi-square variate divided by its degrees of freedom. If ξ is N(0, 1) and χ² is an independent chi-square variate with n d.f., then Fisher's t is given by:
t = ξ/√(χ²/n)
and it follows Student's ' t ' distribution with n degrees of freedom.
Distribution of Fisher's ' t '. Since ξ and χ² are independent, their joint probability differential is given by:
dF(ξ, χ²) = (1/√(2π)) exp(−ξ²/2) · [exp(−χ²/2)(χ²)^{(n/2) − 1}/(2^{n/2} Γ(n/2))] dξ dχ²
Let us transform to new variates t and u by the substitution:
t = ξ/√(χ²/n) and u = χ² ⇒ ξ = t√(u/n) and χ² = u
The Jacobian of the transformation is:
J = ∂(ξ, χ²)/∂(t, u) = |√(u/n)   t/(2√(un)); 0   1| = √(u/n)
The joint p.d.f. g(t, u) of t and u becomes:
g(t, u) = [1/(√(2π) 2^{n/2} Γ(n/2) √n)] exp{−(u/2)(1 + t²/n)} u^{(n/2) − (1/2)}
Since χ² ≥ 0 and −∞ < ξ < ∞, we have u ≥ 0 and −∞ < t < ∞. Integrating out u,
g₁(t) = [1/(√(2π) 2^{n/2} Γ(n/2) √n)] ∫ from 0 to ∞ of exp{−(u/2)(1 + t²/n)} u^{(n−1)/2} du
Constants of t-Distribution. Since f(t) is symmetrical about the line t = 0, all the moments of odd order about the origin vanish, i.e.,
μ′_{2r+1} (about origin) = 0; r = 0, 1, 2, …
In particular, μ′₁ (about origin) = 0 = Mean.
Hence central moments coincide with moments about the origin.
∴ μ_{2r+1} = 0, (r = 0, 1, 2, …)
The moments of even order are given by:
μ_{2r} = μ′_{2r} (about origin) = ∫ from −∞ to ∞ of t^{2r} f(t) dt = 2 ∫ from 0 to ∞ of t^{2r} f(t) dt
= [2/(B(1/2, n/2) √n)] ∫ from 0 to ∞ of t^{2r}/(1 + t²/n)^{(n+1)/2} dt
This integral is absolutely convergent if 2r < n.
Put 1 + t²/n = 1/y ⇒ t² = n(1 − y)/y ⇒ 2t dt = −(n/y²) dy
When t = 0, y = 1 and when t = ∞, y = 0. Therefore,
μ_{2r} = [2/(√n B(1/2, n/2))] ∫ from 1 to 0 of [t^{2r}/(1/y)^{(n+1)/2}] · [−n/(2t y²)] dy
= [n/(√n B(1/2, n/2))] ∫ from 0 to 1 of (t²)^{(2r−1)/2} y^{(n+1)/2 − 2} dy
= [√n/B(1/2, n/2)] ∫ from 0 to 1 of [n(1 − y)/y]^{r − 1/2} y^{(n+1)/2 − 2} dy
= [n^r/B(1/2, n/2)] ∫ from 0 to 1 of y^{(n/2) − r − 1}(1 − y)^{r − 1/2} dy = [n^r/B(1/2, n/2)] B(n/2 − r, r + 1/2), n > 2r
= n^r Γ[(n/2) − r] Γ(r + 1/2)/[Γ(1/2) Γ(n/2)]
= n^r (r − 1/2)(r − 3/2) ⋯ (3/2)(1/2) Γ(1/2) Γ[(n/2) − r]/{Γ(1/2)[(n/2) − 1][(n/2) − 2] ⋯ [(n/2) − r] Γ[(n/2) − r]}
= n^r (2r − 1)(2r − 3) ⋯ 3 · 1/[(n − 2)(n − 4) ⋯ (n − 2r)], n/2 > r
In particular,
μ₂ = n · 1/(n − 2) = n/(n − 2), (n > 2)
μ₄ = n² · 3 · 1/[(n − 2)(n − 4)] = 3n²/[(n − 2)(n − 4)], (n > 4)
Hence
β₁ = μ₃²/μ₂³ = 0 and β₂ = μ₄/μ₂² = 3(n − 2)/(n − 4); (n > 4).
Remarks 1. As n → ∞, β₁ = 0 and β₂ = lim_{n→∞} 3(n − 2)/(n − 4) = 3 lim_{n→∞} [1 − (2/n)]/[1 − (4/n)] = 3
∴ a y₀ · 2 ∫ from 0 to π/2 of cos^{2m+1}θ sin⁰θ dθ = 1 ⇒ a y₀ B(m + 1, 1/2) = 1
⇒ y₀ = 1/[a B(m + 1, 1/2)]
Since the given probability function is symmetrical about the line x = 0, we have, as before,
μ_{2r+1} = μ′_{2r+1} = 0; r = 0, 1, 2, …   [∵ Mean = Origin]
The moments of even order are given by:
μ_{2r} = μ′_{2r} (about origin) = ∫ from −a to a of x^{2r} f(x) dx = y₀ ∫ from −a to a of x^{2r}(1 − x²/a²)^m dx
= 2y₀ ∫ from 0 to a of x^{2r}(1 − x²/a²)^m dx = 2y₀ ∫ from 0 to π/2 of (a sin θ)^{2r} cos^{2m}θ · a cos θ dθ, (x = a sin θ)
= y₀ a^{2r+1} · 2 ∫ from 0 to π/2 of sin^{2r}θ cos^{2m+1}θ dθ = y₀ a^{2r+1} B(r + 1/2, m + 1)   [Using (1)]
= a^{2r} B(r + 1/2, m + 1)/B(m + 1, 1/2) = a^{2r} · Γ(r + 1/2) Γ(m + 3/2)/[Γ(m + r + 3/2) Γ(1/2)]
In particular, μ₂ = a² · (1/2) Γ(1/2) Γ{m + (3/2)}/[{m + (3/2)} Γ{m + (3/2)} Γ(1/2)] = a²/(2m + 3)
⇒ a² = (2m + 3)μ₂
Also μ₄ = a⁴ · [Γ(5/2)/Γ{m + (7/2)}] × [Γ{m + (3/2)}/Γ(1/2)] = 3a⁴/[(2m + 5)(2m + 3)]   (on simplification)
∴ β₂ = μ₄/μ₂² = 3(2m + 3)/(2m + 5) ⇒ m = (9 − 5β₂)/[2(β₂ − 3)]   (on simplification) …
Equations (2), (3) and (4) express the constants y₀, a and m in terms of μ₂ and β₂.
Now put x = at/[2(m + 1) + t²]^{1/2} ⇒ x²/a² = t²/[2(m + 1) + t²]
i.e., 1 − x²/a² = 2(m + 1)/[2(m + 1) + t²] = (1 + t²/n)^{−1}, (n = 2m + 2)
Also dx = a[(n + t²)^{−1/2} − t · (1/2) · 2t(n + t²)^{−3/2}] dt = a(n + t²)^{−1/2}[1 − t²/(n + t²)] dt
= an/(n + t²)^{3/2} dt = (a/√n) · 1/[1 + (t²/n)]^{3/2} dt
Hence the p.d.f. of X transforms to
dF(t) = y₀ [1/(1 + t²/n)^m] · (a/√n) · dt/(1 + t²/n)^{3/2}
= [1/(a B(m + 1, 1/2))] · (a/√n) · dt/(1 + t²/n)^{m + (3/2)}
= [1/(√n B(1/2, n/2))] · dt/(1 + t²/n)^{(n+1)/2}, −∞ < t < ∞
⇒ x₁ = (v/2)[1 + 1/√(1 + n/u²)], x₂ = (v/2)[1 − 1/√(1 + n/u²)]
The Jacobian of the transformation is: J = ∂(x₁, x₂)/∂(u, v) = v/[2√n (1 + u²/n)^{3/2}]
The joint p.d.f. of U and V becomes
g(u, v) = p(x₁, x₂)|J| = [1/(2^{2n−1} Γ(n/2) Γ(n/2) √n)] · e^{−v/2} v^{n−1}/(1 + u²/n)^{(n+1)/2}; −∞ < u < ∞, 0 ≤ v < ∞
Using Legendre's duplication formula, viz.,
Γ(n) = 2^{n−1} Γ(n/2) Γ((n + 1)/2)/√π ⇒ Γ(n/2) = Γ(n)√π/[2^{n−1} Γ((n + 1)/2)], we get
2^{2n−1} Γ(n/2) Γ(n/2) √n = [2^{2n−1} Γ(n)√π/(2^{n−1} Γ((n + 1)/2))] Γ(n/2) √n = 2^n Γ(n) √n B(1/2, n/2)   [∵ √π = Γ(1/2)]
∴ g(u, v) = [(1/(2^n Γ(n))) e^{−v/2} v^{n−1}] · [1/(√n B(1/2, n/2)(1 + u²/n)^{(n+1)/2})]; 0 < v < ∞, −∞ < u < ∞
⇒ g(u, v) = g₁(u) g₂(v),
where g₁(u) = [1/(√n B(1/2, n/2))] · 1/(1 + u²/n)^{(n+1)/2}, −∞ < u < ∞
and g₂(v) = [1/(2^n Γ(n))] e^{−v/2} v^{n−1}, 0 < v < ∞
Example 4. Show that for the t-distribution with n d.f., the mean deviation about the mean is given by √n Γ[(n − 1)/2]/[√π Γ(n/2)].
Solution. E(t) = 0. M.D. (about mean) = ∫ from −∞ to ∞ of |t| f(t) dt = [1/(√n B(1/2, n/2))] ∫ from −∞ to ∞ of |t| dt/(1 + t²/n)^{(n+1)/2}
= [2/(√n B(1/2, n/2))] ∫ from 0 to ∞ of t dt/(1 + t²/n)^{(n+1)/2} = [√n/B(1/2, n/2)] ∫ from 0 to ∞ of dy/(1 + y)^{(n+1)/2},   (t²/n = y)
= [√n/B(1/2, n/2)] ∫ from 0 to ∞ of y^{1−1}/(1 + y)^{(n−1)/2 + 1} dy = [√n/B(1/2, n/2)] · B(1, (n − 1)/2) = √n Γ[(n − 1)/2]/[√π Γ(n/2)]
16.2.5. Limiting Form of t-distribution. As n → ∞, the p.d.f. of the t-distribution with n d.f., viz.,
f(t) = [1/(√n B(1/2, n/2))] (1 + t²/n)^{−(n+1)/2} → (1/√(2π)) exp(−t²/2), −∞ < t < ∞
Proof. lim_{n→∞} 1/[√n B(1/2, n/2)] = lim_{n→∞} Γ[(n + 1)/2]/[√n Γ(1/2) Γ(n/2)] = (1/√π) lim_{n→∞} (1/√n)(n/2)^{1/2} = 1/√(2π)
[∵ Γ(1/2) = √π and Γ(n + k)/Γ(n) ∼ n^k for large n, (c.f. Remark to §16.8)]
Also
lim_{n→∞} f(t) = lim_{n→∞} [1/(√n B(1/2, n/2))] · lim_{n→∞} [(1 + t²/n)^n]^{−1/2} × lim_{n→∞} (1 + t²/n)^{−1/2}
= (1/√(2π)) exp(−t²/2), −∞ < t < ∞
Hence for large d.f. the t-distribution tends to the standard normal distribution.
16.2.6. Graph of t-distribution. The p.d.f. of the t-distribution with n d.f. is:
f(t) = C · (1 + t²/n)^{−(n+1)/2}, −∞ < t < ∞
Since f(−t) = f(t), the probability curve is symmetrical about the line t = 0. As t increases, f(t) decreases rapidly and tends to zero as t → ∞, so that the t-axis is an asymptote to the curve. We have shown that
μ₂ = n/(n − 2), n > 2; β₂ = 3(n − 2)/(n − 4), n > 4
Hence for 𝑛 > 2, 𝜇2 > 1 i.e., the variance of 𝑡-distribution is greater than that of standard
normal distribution and for 𝑛 > 4, 𝛽2 > 3 and thus 𝑡-distribution is more flat on the top than
the normal curve. In fact, for small 𝑛, we have
𝑃(|𝑡| ≥ 𝑡0 ) ≥ 𝑃(|𝑍| ≥ 𝑡0 ), 𝑍 ∼ 𝑁(0,1)
i.e., the tails of the t-distribution have a greater probability (area) than the tails of the standard normal distribution. Moreover, we have also seen [§16.2.5] that for large n (d.f.), the t-distribution tends to the standard normal distribution.
Critical Values of 𝒕. The critical (or significant) values of 𝑡 at level of significance 𝛼 and
𝑑. 𝑓. 𝑣 for two-tailed test are given by the equation :
𝑃[|𝑡| > 𝑡𝑣 (𝛼)] = 𝛼
⇒ 𝑃[|𝑡| ≤ 𝑡𝑣 (𝛼)] = 1 − 𝛼
The values 𝑡𝑣 (𝛼) have been tabulated in Fisher and Yates' Tables, for different values of 𝛼
and 𝑣 and are given in Table I at the end of the chapter.
Since 𝑡-distribution is symmetric about 𝑡 = 0, we get from (16.5)
𝑃(𝑡 > 𝑡𝑣 (𝛼)] + 𝑃[𝑡 < −𝑡𝑣 (𝛼)] = 𝛼 ⇒ 2𝑃[𝑡 > 𝑡𝑣 (𝛼)] = 𝛼
⇒ 𝑃[𝑡 > 𝑡𝑣 (𝛼)] = 𝛼/2 ∴ 𝑃[𝑡 > 𝑡𝑣 (2𝛼)] = 𝛼
𝑡𝑣 (2𝛼) (from the Tables at the end of the chapter) gives the significant value of 𝑡 for a
single-tail test [Right-tail or Left-tail-since the distribution is symmetrical], at level of
significance 𝛼 and 𝑣𝑑. 𝑓.
Hence the significant values of 𝑡 at level of significance ' 𝛼 ' for a single-tailed test can be
obtained from those of two-tailed test by looking the values at level of significance 2𝛼.
For example,
t₈(0.05) for single-tail test = t₈(0.10) for two-tail test = 1.86
t₁₅(0.01) for single-tail test = t₁₅(0.02) for two-tail test = 2.60.
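The single-tail/two-tail relation can be illustrated numerically; the sketch below assumes the SciPy library is available.

```python
# The 5% single-tail critical value for v d.f. equals the 10% two-tail value.
from scipy import stats

v = 8
single_tail_5pct = stats.t.ppf(0.95, df=v)        # P(t > c) = 0.05  -> c ≈ 1.86
two_tail_10pct = stats.t.ppf(1 - 0.10 / 2, df=v)  # P(|t| > c) = 0.10 -> same c
print(single_tail_5pct, two_tail_10pct)           # both ≈ 1.860

print(stats.t.ppf(1 - 0.02 / 2, df=15))           # t_15(0.02) two-tail ≈ 2.60
```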
APPLICATIONS OF t-DISTRIBUTION
The 𝑡-distribution has a wide number of applications in Statistics, some of which are
enumerated below.
(i) To test if the sample mean (𝑥‾) differs significantly from the hypothetical value 𝜇 of
the population mean :
(ii) To test the significance of the difference between two sample means.
(iii) To test the significance of an observed sample correlation coefficient and sample
regression coefficient.
(iv) To test the significance of observed partial correlation coefficient. In the following
sections we will discuss these applications in detail, one by one.
t-Test for Single Mean. Suppose we want to test :
(i) if a random sample 𝑥𝑖 (𝑖 = 1,2, … , 𝑛) of size 𝑛 has been drawn from a normal
population with a specified mean, say 𝜇0 , or
(ii) if the sample mean differs significantly from the hypothetical value 𝜇0 of the
population mean.
Under the null hypothesis, 𝐻0 :
(i) The sample has been drawn from the population with mean 𝜇0 or
(ii) there is no significant difference between the sample mean 𝑥‾ and the population mean
𝜇0 , the statistic
t = (x̄ − μ₀)/(S/√n)
where x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ and S² = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)², follows Student's t-distribution with (n − 1) d.f.
We now compare the calculated value of 𝑡 with the tabulated value at certain level of
significance. If calculated |𝑡| > tabulated 𝑡, null hypothesis is rejected and if calculated |𝑡| <
tabulated 𝑡, 𝐻0 may be accepted at the level of significance adopted.
Remarks 1. On computation of 𝑆 2 for numerical problems. If 𝑥‾ comes out in integers, the
formula (16.6𝑎) can be conveniently used for computing 𝑆 2 . However, if 𝑥‾ comes in fractions
then the formula (16.6𝑎) for computing 𝑆 2 is very cumbersome and is not recommended. In
that case, step deviation method, given below, is quite useful.
If we take 𝑑𝑖 = 𝑥𝑖 − 𝐴, where 𝐴 is any arbitrary number, then
S² = (1/(n − 1)) [Σ(xᵢ − x̄)²] = (1/(n − 1)) [Σxᵢ² − (Σxᵢ)²/n]
= (1/(n − 1)) [Σdᵢ² − (Σdᵢ)²/n], since variance is independent of change of origin.
Also, in this case x̄ = A + Σdᵢ/n.
2. We know the sample variance: s² = (1/n) Σᵢ (xᵢ − x̄)² ⇒ ns² = (n − 1)S²
∴ S²/n = s²/(n − 1)
Hence for numerical problems, the test statistic becomes
t = (x̄ − μ₀)/√(S²/n) = (x̄ − μ₀)/√(s²/(n − 1)) ∼ t_{n−1}
3. Assumption for Student's t-test. The following assumptions are made in the Student's t-
test :
(i) The parent population from which the sample is drawn is normal.
(ii) The sample observations are independent, i.e., the sample is random.
(iii) The population standard deviation 𝜎 is unknown.
Example 5. A machinist is making engine parts with axle diameters of 0.700 inch. A random sample of 10 parts shows a mean diameter of 0.742 inch with a standard deviation of 0.040 inch. Compute the statistic you would use to test whether the work is meeting the specifications. Also state how you would proceed further.
Solution. Here we are given:
μ = 0.700 inch, x̄ = 0.742 inch, s = 0.040 inch and n = 10
Null Hypothesis, H₀: μ = 0.700, i.e., the product is conforming to specifications.
Alternative Hypothesis, H₁: μ ≠ 0.700
Test Statistic. Under H₀, the test statistic is: t = (x̄ − μ)/√(S²/n) = (x̄ − μ)/√(s²/(n − 1)) ∼ t_{(n−1)}
∴ t = √9 (0.742 − 0.700)/0.040 = 3.15
How to proceed further. Here the test statistic ' 𝑡 ' follows Student's 𝑡-distribution with 10 −
1 = 9 d.f. We will now compare this calculated value with the tabulated value of 𝑡 for 9 d.f.
and at certain level of significance, say 5%. Let this tabulated value be denoted by 𝑡0 .
(i) If calculated ' 𝑡 ', viz., 3.15 > 𝑡0, we say that the value of 𝑡 is significant. This implies
that 𝑥‾ differs significantly from 𝜇 and 𝐻0 is rejected at this level of significance and we
conclude that the product is not meeting the specifications.
(ii) If calculated 𝑡 < 𝑡0, we say that the value of 𝑡 is not significant, i.e., there is no
significant difference between 𝑥‾ and 𝜇. In other words, the deviation (𝑥‾ − 𝜇) is just due
to fluctuations of sampling and null hypothesis 𝐻0 may be retained at 5% level of
significance, i.e., we may take the product conforming to specifications.
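A minimal numerical sketch of this test (assuming SciPy for the tabulated value; variable names are illustrative) is given below.

```python
# Example 5: s = 0.040 is the divisor-n standard deviation, so
# t = (xbar - mu0) * sqrt(n - 1) / s.
import math
from scipy import stats

n, xbar, mu0, s = 10, 0.742, 0.700, 0.040
t_calc = (xbar - mu0) * math.sqrt(n - 1) / s     # = 3 * 0.042 / 0.040 = 3.15
t_table = stats.t.ppf(0.975, df=n - 1)           # two-tailed 5% value ≈ 2.262
print(t_calc, t_table, abs(t_calc) > t_table)    # 3.15, 2.262, True -> reject H0
```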
Example 6. The mean weekly sales of soap bars in departmental stores was 146.3 bars per
store. After an advertising campaign the mean weekly sales in 22 stores for a typical week
increased to 153.7 and showed a standard deviation of 17.2 . Was the advertising campaign
successful?
Solution. We are given : 𝑛 = 22, 𝑥‾ = 153 ⋅ 7, 𝑠 = 17 ⋅ 2.
Null Hypothesis. The advertising campaign is not successful, i.e., 𝐻0 : 𝜇 = 146 ⋅ 3
Alternative Hypothesis, 𝐻1 : 𝜇 > 146 ⋅ 3 (Right-tail).
Test Statistic. Under H₀, the test statistic is: t = (x̄ − μ)/√(s²/(n − 1)) ∼ t₂₂₋₁ = t₂₁
∴ t = (153.7 − 146.3)/√((17.2)²/21) = 7.4 × √21/17.2 = 1.97
Conclusion
The tabulated value of t for 21 d.f. at 5% level of significance for a single-tailed test is 1.72. Since the calculated value is greater than the tabulated value, it is significant. Hence we reject the null hypothesis and conclude that the advertising campaign was successful in promoting sales.
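The calculated value of t can be verified numerically; a minimal sketch, assuming SciPy is available, follows.

```python
# Example 6: n = 22, xbar = 153.7, mu0 = 146.3, divisor-n s.d. s = 17.2,
# so t = (xbar - mu0) / sqrt(s^2 / (n - 1)).
import math
from scipy import stats

n, xbar, mu0, s = 22, 153.7, 146.3, 17.2
t_calc = (xbar - mu0) / math.sqrt(s**2 / (n - 1))   # ≈ 1.97
t_table = stats.t.ppf(0.95, df=n - 1)               # one-tailed 5% value ≈ 1.72
print(t_calc, t_table, t_calc > t_table)            # 1.97, 1.72, True -> reject H0
```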
S² = [1/(n₁ + n₂ − 2)] [Σᵢ(xᵢ − x̄)² + Σⱼ(yⱼ − ȳ)²]
where x̄ = (1/n₁) Σᵢ xᵢ and ȳ = (1/n₂) Σⱼ yⱼ, and S² is an unbiased estimate of the common population variance σ²; the statistic follows Student's t-distribution with (n₁ + n₂ − 2) d.f.
Proof. Distribution of t defined in (16.7).
ξ = [(x̄ − ȳ) − E(x̄ − ȳ)]/√V(x̄ − ȳ) ∼ N(0, 1)
But E(x̄ − ȳ) = E(x̄) − E(ȳ) = μ_X − μ_Y
V(x̄ − ȳ) = V(x̄) + V(ȳ) = σ²_X/n₁ + σ²_Y/n₂ = σ²(1/n₁ + 1/n₂)   (by assumption)
[The covariance term vanishes since the samples are independent.]
∴ ξ = [(x̄ − ȳ) − (μ_X − μ_Y)]/√[σ²(1/n₁ + 1/n₂)] ∼ N(0, 1)
Let χ² = (1/σ²) [Σᵢ₌₁^{n₁}(xᵢ − x̄)² + Σⱼ₌₁^{n₂}(yⱼ − ȳ)²]
= [Σᵢ(xᵢ − x̄)²/σ²] + [Σⱼ(yⱼ − ȳ)²/σ²] = n₁s_X²/σ² + n₂s_Y²/σ²
Since n₁s_X²/σ² and n₂s_Y²/σ² are independent χ²-variates with (n₁ − 1) and (n₂ − 1) d.f. respectively, by the additive property of the chi-square distribution, χ² defined in (**) is a χ²-variate with (n₁ − 1) + (n₂ − 1), i.e., n₁ + n₂ − 2 d.f. Further, since the sample mean and sample variance are independently distributed, ξ and χ² are independent random variables. Hence Fisher's t-statistic is given by
t = ξ/√[χ²/(n₁ + n₂ − 2)]
Remarks 1. The sampling distribution of 𝐹-statistic does not involve any population
parameters and depends only on the degrees of freedom 𝑣1 and 𝑣2 .
Remarks 2. A statistic 𝐹 following Snedecor's 𝐹-distribution with (𝑣1 , 𝑣2 ) d.f. will be
denoted by 𝐹 ∼ 𝐹(𝑣1 , 𝑣2 )
dP(F) = [1/B(v₁/2, v₂/2)] · (v₁F/v₂)^{(v₁/2) − 1}/(1 + v₁F/v₂)^{(v₁+v₂)/2} d(v₁F/v₂)
⇒ f(F) = [(v₁/v₂)^{v₁/2}/B(v₁/2, v₂/2)] · F^{(v₁/2) − 1}/(1 + v₁F/v₂)^{(v₁+v₂)/2}, 0 ≤ F < ∞
Constants of F-Distribution.
μ′_r (about origin) = E(F^r) = ∫ from 0 to ∞ of F^r f(F) dF
= [(v₁/v₂)^{v₁/2}/B(v₁/2, v₂/2)] ∫ from 0 to ∞ of F^r · F^{(v₁/2) − 1}/(1 + v₁F/v₂)^{(v₁+v₂)/2} dF
To evaluate the integral, put (v₁/v₂)F = y, so that dF = (v₂/v₁) dy. Then
μ′_r = [(v₁/v₂)^{v₁/2}/B(v₁/2, v₂/2)] ∫ from 0 to ∞ of [(v₂/v₁)y]^{r + (v₁/2) − 1}/(1 + y)^{(v₁+v₂)/2} · (v₂/v₁) dy
= [(v₂/v₁)^r/B(v₁/2, v₂/2)] ∫ from 0 to ∞ of y^{r + (v₁/2) − 1}/(1 + y)^{[(v₁/2) + r] + [(v₂/2) − r]} dy
= (v₂/v₁)^r · [1/B(v₁/2, v₂/2)] · B(r + v₁/2, v₂/2 − r), v₂ > 2r
Aliter: the result could also be obtained by substituting (v₁/v₂)F = tan²θ and using the Beta integral:
2 ∫ from 0 to π/2 of sin^p θ cos^q θ dθ = B((p + 1)/2, (q + 1)/2)
For the mode, write
log f(F) = C + {(v₁/2) − 1} log F − [(v₁ + v₂)/2] log{1 + (v₁/v₂)F}, where C is a constant independent of F.
∂/∂F [log f(F)] = [(v₁/2) − 1] · (1/F) − [(v₁ + v₂)/2] · (v₁/v₂)/(1 + v₁F/v₂)
f′(F) = f(F) ∂/∂F[log f(F)] = 0 ⇒ (v₁ − 2)/(2F) − v₁(v₁ + v₂)/[2(v₂ + v₁F)] = 0
Hence F = v₂(v₁ − 2)/[v₁(v₂ + 2)]
It can be easily verified that at this point f″(F) < 0. Hence mode = v₂(v₁ − 2)/[v₁(v₂ + 2)].
To test if an observed value of ' 𝑟 ' differs significantly from a hypothetical value 𝜌 of the
population correlation coefficient.
H₀: There is no significant difference between r and ρ. In other words, the given sample has been drawn from a bivariate normal population with correlation coefficient ρ. If we take Z = (1/2) log_e{(1 + r)/(1 − r)} and ξ = (1/2) log_e{(1 + ρ)/(1 − ρ)}, then under H₀,
Z ∼ N(ξ, 1/(n − 3)) ⇒ (Z − ξ)/√(1/(n − 3)) ∼ N(0, 1)
Thus if (𝑍 − 𝜉)√(𝑛 − 3) > 1.96, 𝐻0 is rejected at 5% level of significance and if it is
greater than 2.58, 𝐻0 is rejected at 1% level of significance.
Remark.
Z defined in given equation should not be confused with the Z used in Fisher's 𝑧-distribution .
Example 16.29. A correlation coefficient of 0.72 is obtained from a sample of 29 pairs of
observations.
(i) Can the sample be regarded as drawn from a bivariate normal population in which true
correlation coefficient is 0.8 ?
(ii) Obtain 95% confidence limits for 𝜌 in the light of the information provided by the
sample.
Solution. (i) 𝐻0 : There is no significant difference between 𝑟 = 0.72; and 𝜌 = 0.80, i.e., the
sample can be regarded as drawn from the bivariate normal population with 𝜌 = 0.8. Here
Z = (1/2) log_e[(1 + r)/(1 − r)] = 1.1513 log₁₀[(1 + r)/(1 − r)] = 1.1513 log₁₀ 6.14 = 0.907
ξ = (1/2) log_e[(1 + ρ)/(1 − ρ)] = 1.1513 log₁₀[(1 + 0.8)/(1 − 0.8)] = 1.1513 × 0.9541 = 1.1
S.E.(Z) = 1/√(n − 3) = 1/√26 = 0.196
Under H₀, the test statistic is: U = (Z − ξ)/[1/√(n − 3)] ∼ N(0, 1)
∴ U = (0.907 − 1.100)/0.196 = −0.985
Since |𝑈| < 1.96, it is not significant at 5% level of significance and 𝐻0 may be accepted.
Hence the sample may be regarded as coming from a bivariate normal population with 𝜌 =
0 ⋅ 8.
(ii) 95% confidence limits for 𝜌 on the basis of the information supplied by the sample, are
given by:
|U| ≤ 1.96 ⇒ |Z − ξ| ≤ 1.96 × 1/√(n − 3) = 1.96 × 0.196 = 0.384
⇒ |0.907 − ξ| ≤ 0.384 or 0.907 − 0.384 ≤ ξ ≤ 0.907 + 0.384
⇒ 0.523 ≤ ξ ≤ 1.291 or 0.523 ≤ (1/2) log_e[(1 + ρ)/(1 − ρ)] ≤ 1.291
⇒ 0.523 ≤ 1.1513 log₁₀[(1 + ρ)/(1 − ρ)] ≤ 1.291 or 0.523/1.1513 ≤ log₁₀[(1 + ρ)/(1 − ρ)] ≤ 1.291/1.1513
∴ 0.4543 ≤ log₁₀[(1 + ρ)/(1 − ρ)] ≤ 1.1213
Now log₁₀[(1 + ρ)/(1 − ρ)] = 0.4543 ⇒ (1 + ρ)/(1 − ρ) = antilog(0.4543) = 2.846, and log₁₀[(1 + ρ)/(1 − ρ)] = 1.1213 ⇒ (1 + ρ)/(1 − ρ) = antilog(1.1213) = 13.22
∴ ρ = (2.846 − 1)/(2.846 + 1) = 1.846/3.846 = 0.48 and ρ = (13.22 − 1)/(13.22 + 1) = 12.22/14.22 = 0.86
Hence, substituting in (*), we get 0.48 ≤ ρ ≤ 0.86
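Both parts of this example can be reproduced with a short calculation; the sketch below uses only plain Python's math module (the rounded values differ slightly from the hand computation above).

```python
# Example 16.29: Fisher's z-transformation test of r = 0.72 against rho = 0.8
# with n = 29, and the 95% limits for rho (rho = tanh(xi)).
import math

r, rho0, n = 0.72, 0.80, 29
z = 0.5 * math.log((1 + r) / (1 - r))          # ≈ 0.908
xi = 0.5 * math.log((1 + rho0) / (1 - rho0))   # ≈ 1.099
se = 1 / math.sqrt(n - 3)                      # ≈ 0.196
u = (z - xi) / se                              # ≈ -0.97; |U| < 1.96 -> accept H0
lo, hi = z - 1.96 * se, z + 1.96 * se          # limits for xi
rho_lo, rho_hi = math.tanh(lo), math.tanh(hi)  # back-transform to rho
print(round(u, 2), round(rho_lo, 2), round(rho_hi, 2))   # -0.97, 0.48, 0.86
```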
(2) To test the significance of the difference between two independent sample correlation
coefficients. Let 𝑟1 and 𝑟2 be the sample correlation coefficients observed in two independent
samples of sizes n₁ and n₂ respectively; then Z₁ = (1/2) log_e[(1 + r₁)/(1 − r₁)] and Z₂ = (1/2) log_e[(1 + r₂)/(1 − r₂)].
Under the null hypothesis H₀ that the sample correlation coefficients do not differ significantly, i.e., the samples are drawn from the same bivariate normal population or from different populations with the same correlation coefficient ρ (say), the statistic
Z = [(Z₁ − Z₂) − E(Z₁ − Z₂)]/S.E.(Z₁ − Z₂) ∼ N(0, 1)
E(Z₁ − Z₂) = E(Z₁) − E(Z₂) = ξ₁ − ξ₂ = 0   [∵ ξ₁ = ξ₂ = (1/2) log_e[(1 + ρ)/(1 − ρ)] under H₀]
and S.E.(Z₁ − Z₂) = √[V(Z₁) + V(Z₂)] = √[1/(n₁ − 3) + 1/(n₂ − 3)]
[The covariance term vanishes since the samples are independent.]
Under H₀, the test statistic is: Z = (Z₁ − Z₂)/√[1/(n₁ − 3) + 1/(n₂ − 3)] ∼ N(0, 1)
By comparing this value with 1.96 or 2.58, 𝐻0 may be accepted or rejected at 5% and 1%
levels of significance respectively.
(3) To obtain pooled estimate of 𝜌. Let 𝑟1 , 𝑟2 , … , 𝑟𝑘 be observed correlation coefficients in 𝑘-
independent samples of sizes 𝑛1 , 𝑛2 , … , 𝑛𝑘 respectively from a bivariate normal population.
The problem is to combine these estimates of 𝜌 to get a pooled estimate for the parameter. If
we take
Zᵢ = (1/2) log_e[(1 + rᵢ)/(1 − rᵢ)]; i = 1, 2, …, k; then Zᵢ, i = 1, 2, …, k are independent normal variates with variances 1/(nᵢ − 3); i = 1, 2, …, k and common mean ξ = (1/2) log_e[(1 + ρ)/(1 − ρ)].
The weighted mean (say Z̄) of these Zᵢ's is given by:
Z̄ = Σᵢ₌₁ᵏ wᵢZᵢ / Σᵢ₌₁ᵏ wᵢ, where wᵢ is the weight of Zᵢ.
Now Z̄ is also an unbiased estimate of ξ, since
E(Z̄) = (1/Σwᵢ) E(Σᵢ₌₁ᵏ wᵢZᵢ) = (1/Σwᵢ)[Σwᵢ E(Zᵢ)] = (1/Σwᵢ)(Σwᵢ ξ) = ξ
V(Z̄) = [1/(Σwᵢ)²] V(Σwᵢ Zᵢ) = [1/(Σwᵢ)²][Σwᵢ² V(Zᵢ)]
The weights wᵢ (i = 1, 2, …, k) are so chosen that Z̄ has minimum variance. In order that V(Z̄) is minimum for variations in wᵢ, we should have
∂V(Z̄)/∂wᵢ = 0; i = 1, 2, …, k
⇒ [(Σwᵢ)² · 2wᵢV(Zᵢ) − [Σwᵢ²V(Zᵢ)] · 2(Σwᵢ)]/(Σwᵢ)⁴ = 0 or wᵢV(Zᵢ) = Σwᵢ²V(Zᵢ)/Σwᵢ, a constant.
∴ wᵢ ∝ 1/V(Zᵢ) = (nᵢ − 3); i = 1, 2, …, k.
Hence the minimum variance estimate of ξ is given by: Z̄ = Σᵢ₌₁ᵏ(nᵢ − 3)Zᵢ / Σᵢ₌₁ᵏ(nᵢ − 3)
[V(Z̄)]_min = Σ{(nᵢ − 3)² · 1/(nᵢ − 3)}/[Σ(nᵢ − 3)]² = Σ(nᵢ − 3)/[Σ(nᵢ − 3)]² = 1/Σᵢ₌₁ᵏ(nᵢ − 3)
7.3.4 Chi-Square Distribution
The square of a standard normal variate is known as a chi-square variate (with 'chi' pronounced 'ki', like 'sky' without the 's') with 1 degree of freedom (d.f.).
Thus if X ∼ N(μ, σ²), then Z = (X − μ)/σ ∼ N(0, 1) and Z² = [(X − μ)/σ]² is a chi-square variate with 1 d.f.
In general, if Xᵢ (i = 1, 2, …, n) are n independent normal variates with means μᵢ and variances σᵢ² (i = 1, 2, …, n), then
χ² = Σᵢ₌₁ⁿ [(Xᵢ − μᵢ)/σᵢ]² is a chi-square variate with n d.f.
which is the m.g.f. of a Gamma variate with parameters 1/2 and n/2.
Hence, by the uniqueness theorem of m.g.f.'s,
χ² = Σᵢⁿ [(Xᵢ − μᵢ)/σᵢ]² is a Gamma variate with parameters 1/2 and n/2.
∴ dP(χ²) = [(1/2)^{n/2}/Γ(n/2)] · [exp(−χ²/2)] (χ²)^{(n/2) − 1} dχ²
= [1/(2^{n/2} Γ(n/2))] [exp(−χ²/2)] (χ²)^{(n/2) − 1} dχ², 0 ≤ χ² < ∞
J = |∂x₁/∂r  ∂x₂/∂r; ∂x₁/∂θ  ∂x₂/∂θ| = |cos θ  sin θ; −r sin θ  r cos θ| = r
Also, we have r² = x₁² + x₂² and tan θ = x₂/x₁. As x₁ and x₂ range from −∞ to +∞, r varies from 0 to ∞ and θ from 0 to 2π. The joint probability differential of r and θ now becomes
dG(r, θ) = (1/2π) exp(−r²/2) r dr dθ; 0 ≤ r ≤ ∞, 0 ≤ θ ≤ 2π
Integrating over θ, the marginal distribution of r is given by:
dG₁(r) = ∫ from 0 to 2π of dG(r, θ) = r exp(−r²/2) dr [θ/2π] evaluated from 0 to 2π = exp(−r²/2) r dr
⇒ dG₁(r²) = (1/2) exp(−r²/2) dr² = [1/Γ(1)] exp(−r²/2)(r²/2)^{1−1} d(r²/2)
Thus r²/2 = (X₁² + X₂²)/2 is a γ(1) variate and hence r² = X₁² + X₂² is a χ²-variate with 2 d.f. For n variables Xᵢ (i = 1, 2, …, n), we transform (X₁, X₂, …, Xₙ) to (χ, θ₁, θ₂, …, θ_{n−1}) (a one-to-one transformation) by:
x₁ = χ cos θ₁ cos θ₂ ⋯ cos θ_{n−1}
x₂ = χ cos θ₁ cos θ₂ ⋯ cos θ_{n−2} sin θ_{n−1}
x₃ = χ cos θ₁ cos θ₂ ⋯ cos θ_{n−3} sin θ_{n−2}
⋮
x_j = χ cos θ₁ cos θ₂ ⋯ cos θ_{n−j} sin θ_{n−j+1}
xₙ = χ sin θ₁
where χ > 0, −π < θ₁ ≤ π and −(1/2)π < θᵢ < (1/2)π for i = 2, 3, …, (n − 1).
Then 𝑥12 + 𝑥22 + ⋯ + 𝑥𝑛2 = 𝜒 2 and |𝐽| = 𝜒 𝑛−1 cos 𝑛−2 𝜃1 cos𝑛−3 𝜃2 … cos 𝜃𝑛−2 (c.f.
Advanced Theory of Statistics Vol. 1, by Kendall and Stuart.) The joint distribution of
𝑋1 , 𝑋2 , … , 𝑋𝑛 , viz.,
dF(x₁, x₂, …, xₙ) = (1/√(2π))ⁿ exp(−Σᵢ₌₁ⁿ xᵢ²/2) Πᵢ₌₁ⁿ dxᵢ transforms to
dG(χ, θ₁, θ₂, …, θ_{n−1}) = (1/√(2π))ⁿ exp(−χ²/2) χ^{n−1} cos^{n−2}θ₁ cos^{n−3}θ₂ ⋯ cos θ_{n−2} dχ dθ₁ dθ₂ ⋯ dθ_{n−1}
Integrating over 𝜃1 , 𝜃2 , … , 𝜃𝑛−1, we get the distribution of 𝜒 2 as :
𝑑𝑃(𝜒 2 ) = 𝑘exp (−𝜒 2 /2)(𝜒 2 )(𝑛/2)−1 𝑑𝜒 2 , 0 ≤ 𝜒 2 < ∞
The constant 𝑘 is determined from the fact that total probability is unity, i.e.
∫ from 0 to ∞ of dP(χ²) = 1 ⇒ k ∫ from 0 to ∞ of exp(−χ²/2)(χ²)^{(n/2) − 1} dχ² = 1 ⇒ k = 1/(2^{n/2} Γ(n/2))
∴ dP(χ²) = [1/(2^{n/2} Γ(n/2))] exp(−χ²/2)(χ²)^{(n/2) − 1} dχ², 0 ≤ χ² < ∞
Hence (1/2)χ² = (1/2) Σᵢ₌₁ⁿ Xᵢ² is a γ(n/2) variate ⇒ χ² = Σᵢ₌₁ⁿ Xᵢ² is a chi-square variate with n degrees of freedom (d.f.), and (15.2) gives the p.d.f. of the chi-square distribution with n d.f.
Remarks
1. If Xᵢ; i = 1, 2, …, n are n independent normal variates with mean μᵢ and S.D. σᵢ, then Σᵢ₌₁ⁿ [(Xᵢ − μᵢ)/σᵢ]² is a χ²-variate with n d.f.
2. In random sampling from a normal population with mean μ and S.D. σ, x̄ is distributed normally about the mean μ with S.D. σ/√n.
∴ (x̄ − μ)/(σ/√n) ∼ N(0, 1) ⇒ [(x̄ − μ)/(σ/√n)]² is a χ²-variate with 1 d.f.
M(t) = 1 + (n/2)(2t) + [(n/2)(n/2 + 1)/2!](2t)² + ⋯ + [(n/2)(n/2 + 1)(n/2 + 2) ⋯ (n/2 + r − 1)/r!](2t)^r + ⋯
μ′_r = coefficient of t^r/r! in the expansion of M(t)
= 2^r (n/2)(n/2 + 1)(n/2 + 2) ⋯ (n/2 + r − 1)
= n(n + 2)(n + 4) ⋯ (n + 2r − 2)
Hence,
Limiting Form of 𝝌𝟐 Distribution for Large Degrees of Freedom.
If X ∼ χ²(n), then M_X(t) = (1 − 2t)^{−n/2}, |t| < 1/2.
The m.g.f. of the standard χ² variate Z = (X − μ)/σ is: M_{(X−μ)/σ}(t) = e^{−μt/σ} M_X(t/σ)
⇒ M_Z(t) = e^{−nt/√(2n)} (1 − 2t/√(2n))^{−n/2}   (∵ μ = n, σ² = 2n)
∴ K_Z(t) = log M_Z(t) = −t√(n/2) − (n/2) log(1 − t√(2/n))
= −t√(n/2) + (n/2)[t√(2/n) + (t²/2)(2/n) + (t³/3)(2/n)^{3/2} + ⋯]
= −t√(n/2) + t√(n/2) + t²/2 + O(n^{−1/2}) = t²/2 + O(n^{−1/2}),
where O(n^{−1/2}) denotes terms containing n^{1/2} and higher powers of n in the denominator.
∴ lim_{n→∞} K_Z(t) = t²/2 ⇒ M_Z(t) → e^{t²/2} as n → ∞,
which is the m.g.f. of a standard normal variate. Hence, by the uniqueness theorem of m.g.f.'s, Z is asymptotically normal. In other words, the standard χ² variate tends to a standard normal variate as n → ∞. Thus, the χ² distribution tends to the normal distribution for large d.f.
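This limiting behaviour can be illustrated numerically; the sketch below (assuming SciPy) compares a probability for the standardized chi-square variate with the corresponding standard normal value.

```python
# The standardized chi-square variate (X - n)/sqrt(2n) approaches N(0, 1) as n grows.
from scipy import stats

for n in (5, 50, 500):
    # P((X - n)/sqrt(2n) <= 1) = P(X <= n + sqrt(2n)) versus Phi(1)
    p = stats.chi2.cdf(n + (2 * n) ** 0.5, df=n)
    print(n, round(p, 4))
print("Phi(1) =", round(stats.norm.cdf(1.0), 4))   # 0.8413
```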
Question: 4
For the discrete variate with density f(x) = (1/8)I_{−1}(x) + (6/8)I_{0}(x) + (1/8)I_{1}(x), which of the following is TRUE?
A. E(X) = 1/2
B. V(X) = 1/2
C. P{|X − μ_X| ≥ 2σ_X} ≤ 1/4
D. P{|X − μ_X| ≥ 2σ_X} ≥ 1/4
Question: 5
Let Xᵢ, Yᵢ (i = 1, 2) be an i.i.d. random sample of size 2 from a standard normal distribution. What is the distribution of W, where
W = √2(X₁ + X₂)/√[(X₂ − X₁)² + (Y₂ − Y₁)²]?
Question: 7
Let X₁, X₂, …, Xₙ be a random sample from the U(θ − 1/2, θ + 1/2) distribution, where θ ∈ ℝ. Let X₍₁₎ = min{X₁, X₂, …, Xₙ} and X₍ₙ₎ = max{X₁, X₂, …, Xₙ}.
Define T₁ = (1/2)(X₍₁₎ + X₍ₙ₎), T₂ = (1/4)(3X₍₁₎ + X₍ₙ₎ + 1)
and T₃ = (1/2)(3X₍ₙ₎ − X₍₁₎ − 2) as estimators for θ; then which of the following is/are TRUE?
A. T₁ and T₂ are MLEs for θ but T₃ is not an MLE for θ
C. P(X = Y) = 0
D. All of the above
Question: 9
Let X₁, X₂, …, Xₙ be a sequence of independently and identically distributed random variables with the probability density function
f(x) = (1/2)x²e^{−x} if x > 0, and 0 otherwise,
and let Sₙ = X₁ + X₂ + ⋯ + Xₙ; then which of the following statements is/are TRUE?
A. (Sₙ − 3n)/√(3n) ∼ N(0, 1) for all n ≥ 1
B. For all ε > 0, P(|Sₙ/n − 3| > ε) → 0 as n → ∞
C. Sₙ/n → 1 with probability 1
D. Both A and B
Question: 10
Let 𝑋, 𝑌 are i.i.d Binomial (𝑛, 𝑝) random variables. Which of the following are true?
A. 𝑋 + 𝑌 ∼ Bin (2𝑛, 𝑝)
Answer 2: B
Explanation:
If A and B are independent random variables each having the uniform distribution on [0, 1], let U = min{A, B} and V = max{A, B}; then
E(U) = 1/3, E(V) = 2/3, UV = AB and U + V = A + B
Thus Cov(U, V) = E(UV) − E(U)E(V) = E(AB) − E(U)E(V) = E(A)E(B) − E(U)E(V) = 1/4 − 2/9 = 1/36
Answer 3: B
Explanation:
Xᵢ ∼ U(0, θ²), so f(x) = 1/θ²; 0 < xᵢ < θ²
X₍₃₎ ≤ θ² ⇒ θ̂ ∈ [√X₍₃₎, ∞)
L(X, θ) = Πᵢ₌₁³ f(xᵢ, θ) = 1/θ⁶
⇒ ∂L/∂θ < 0, therefore the likelihood is a decreasing function of θ, hence θ̂ = √X₍₃₎
Answer 4: C
Explanation:
The variate takes the values X = −1, 0, 1 with probabilities 1/8, 6/8, 1/8 respectively, so
P{|X − μ_X| ≥ 2σ_X} ≤ 1/4   [by Chebyshev's inequality]
Answer 5: B
Explanation:
Let Xᵢ, Yᵢ (i = 1, 2) be an i.i.d. random sample of size 2 from a standard normal distribution. Then
W = √2(X₁ + X₂)/√[(X₂ − X₁)² + (Y₂ − Y₁)²] ∼ t(2)
Hence option (b) is correct.
Answer 6: D
Explanation:
Let X be a random variable with M_X(t) = E(e^{tX}) = Σ e^{tx} P(X = x).
Then P(X = x) = 1/6 for x = 0; 1/3 for x = 1; 1/3 for x = 2; 1/6 for x = 3.
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 1/6 + 1/3 + 1/3 = 5/6
Answer 7: A
Explanation:
X₁, X₂, …, Xₙ is a random sample from U(θ − 1/2, θ + 1/2), so
f(x) = 1; θ − 1/2 < xᵢ < θ + 1/2
θ̂ ∈ [X₍ₙ₎ − 1/2, X₍₁₎ + 1/2]
The likelihood is free of the parameter on this interval, so
θ̂ = λ(X₍ₙ₎ − 1/2) + (1 − λ)(X₍₁₎ + 1/2); 0 < λ < 1, is an MLE of θ.
Taking λ = 1/2 and λ = 1/4 we obtain the MLEs T₁ = (1/2)(X₍₁₎ + X₍ₙ₎) and T₂ = (1/4)(3X₍₁₎ + X₍ₙ₎ + 1) respectively, whereas T₃ need not lie in this interval and so is not an MLE of θ.
For option (b): for all ε > 0, P(|Sₙ/n − 3| > ε) → 0 as n → ∞ (by the weak law of large numbers).
For option (c): Sₙ/n → 3 with probability 1 (by the strong law of large numbers), so Sₙ/n does not converge to 1.
7.8 REFERENCES
• Devore, J. (2012). Probability and statistics for engineers, 8th ed. Cengage Learning.
• John A. Rice (2007). Mathematical Statistics and Data Analysis, 3rd ed. Thomson
Brooks/Cole
• Larsen, R., Marx, M. (2011). An introduction to mathematical statistics and its
applications. Prentice Hall.
7.9 SUGGESTED READINGS
• S. C Gupta , V.K Kapoor, Fundamentals of Mathematical Statistics,Sultan Chand
Publication, 11th Edition.
• B.L Agarwal, Programmed Statistics ,New Age International Publishers, 2nd Edition.
LESSON 8
STATISTICAL HYPOTHESIS
STRUCTURE
8.1 Learning Objectives
8.2 Introduction
8.3 Statistical Hypothesis
8.3.1 Simple and Composite Hypothesis
8.3.2 Critical Region
8.3.3 Type I and Type II Error
8.3.4 Most Powerful Test
8.3.5 Neymann Pearson Lemma
8.4 In-Text Questions
8.5 Summary
8.6 Glossary
8.7 Answer to In-Text Questions
8.8 References
8.9 Suggested Readings
8.1 LEARNING OBJECTIVES
The main objective is to discuss the testing of hypotheses and how it can be used in real data analysis.
8.2 INTRODUCTION
The main problems in statistical inference can be broadly classified into two areas:
(i) The area of estimation of population parameter(s) and setting up of confidence intervals
for them, i.e., the area of point and interval estimation, and
(ii) Tests of statistical hypothesis.
In Neyman-Pearson theory, we use statistical methods to arrive at decisions in certain
situations where there is lack of certainty on the basis of a sample whose size is fixed in
advance while in Wald's sequential theory the sample size is not fixed but is regarded as a
random variable.
211 | P a g e
that the bulbs manufactured under some standard manufacturing process have an average life
of 𝜇 hours and it is proposed to test a new procedure for manufacturing light bulbs. Thus, we
have two populations of bulbs, those manufactured by standard process and those
manufactured by the new process. In this problem the following three hypotheses may be set
up:
(i) New process is better than standard process.
(ii) New process is inferior to standard process.
(iii) There is no difference between the two processes.
The first two statements appear to be biased since they reflect a preferential attitude to one or
the other of the two processes. Hence the best course is to adopt the hypothesis of no difference,
as stated in (iii). This suggests that the statistician should take up the neutral or null attitude
regarding the outcome of the test. His attitude should be on the null or zero line in which the
experimental data has the due importance and complete say in the matter. This neutral or non-
committal attitude of the statistician or decision-maker before the sample observations are
taken is the keynote of the null hypothesis.
Thus in the above example of light bulbs if 𝜇0 is the mean life (in hours) of the bulbs
manufactured by the new process then the null hypothesis which is usually denoted by H0 , can
be stated as follows: 𝐻0 : 𝜇 = 𝜇0 .
As another example let us suppose that two different concerns manufacture drugs for inducing
sleep, drug A manufactured by the first concern and drug B manufactured by the second concern. Each
company claims that its drug is superior to that of the other and it is desired to test which is a
superior drug 𝐴 or 𝐵 ? To formulate the statistical hypothesis let 𝑋 be a random variable which
denotes the additional hours of sleep gained by an individual when drug 𝐴 is given and let the
random variable 𝑌 denote
the additional hours of sleep gained when drug 𝐵 is used, Let us suppose that 𝑋 and 𝑌 follow
the probability distributions with means 𝜇𝑋 and 𝜇𝑌 respectively. Here our null hypothesis
would be that there is no difference between the effects of two drugs. Symbolically, 𝐻0 : 𝜇𝑋 =
𝜇𝑌
Alternative Hypothesis.
It is desirable to state what is called an alternative hypothesis in respect of every statistical
hypothesis being tested because the acceptance or rejection of null hypothesis is meaningful
only when it is being tested against a rival hypothesis which should rather be explicitly
mentioned. Alternative hypothesis is usually denoted by 𝐻1 . For example, in the example of
light bulbs, alternative hypothesis could be 𝐻1 : 𝜇 > 𝜇0 or 𝜇 < 𝜇0 or 𝜇 ≠ 𝜇0 . In the example of
drugs, the alternative hypothesis could be 𝐻1 : 𝜇𝑋 > 𝜇𝑌 or 𝜇𝑋 < 𝜇𝑌 or 𝜇𝑋 ≠ 𝜇𝑌 .
In both the cases, the first two of the alternative hypotheses give rise to what are called 'one
tailed' test and the third alternative hypothesis results in 'two tailed' tests.
Important Remarks
1. In the formulation of a testing problem and devising a 'test of hypothesis' the roles of 𝐻0
and 𝐻1 are not at all symmetric. In order to decide which one of the two hypotheses should
be taken as null hypothesis 𝐻0 and which one as alternative hypothesis 𝐻1 , the intrinsic
difference between the roles and the implications of these two terms should be clearly
understood.
2. If a particular problem cannot be stated as a test between two simple hypotheses, i.e.,
simple null hypothesis against a simple alternative hypothesis, then the next best alternative
is to formulate the problem as the test of a simple null hypothesis against a composite
alternative hypothesis. In other words, one should try to structure the problem so that null
hypothesis is simple rather than composite.
3. Keeping in mind the potential losses due to wrong decisions (which may or may not be
measured in terms of money), the decision maker is somewhat conservative in holding the
null hypothesis as true unless there is a strong evidence from the experimental sample
observations that it is false. To him, the consequences of wrongly rejecting a null
hypothesis seem to be more severe than those of wrongly accepting it. In most of the cases,
the statistical hypothesis is in the form of a claim that a particular product or product
process is superior to some existing standard. The null hypothesis H0 in this case is that
there is no difference between the new product or production process and the existing
standard. In other words, null hypothesis nullifies this claim. The rejection of the null
hypothesis wrongly which amounts to the acceptance of claim wrongly involves huge
amount of pocket expenses towards a substantive overhaul of the existing set-up. The
resulting loss is comparatively regarded as more serious than the opportunity loss in
wrongly accepting H0 which amounts to wrongly rejecting the claim, i.e., in sticking to the
less efficient existing standard. In the light-bulbs problem discussed earlier, suppose the
research division of the concern, on the basis of the limited experimentation, claims that
its brand is more effective than that manufactured by standard process. If in fact, the brand
fails to be more effective the loss incurred by the concern due to an immediate obsolescence
of the product, decline of the concern's image, etc., will be quite serious. On the other hand,
the failure to bring out a superior brand in the market is an opportunity loss and is not a
consideration to be as serious as the other loss.
8.3.2 Critical Region
Let x₁, x₂, …, xₙ be the sample observations, denoted by O. The aggregate of all possible values of O constitutes a space called the sample space, which is denoted by S.
Since the sample values 𝑥1 , 𝑥2 , … , 𝑥𝑛 can be taken as a point in 𝑛-dimensional space, we
specify some region of the 𝑛-dimensional space and see whether this point lies within this
region or outside this region. We divide the whole sample space 𝑆 into two disjoint parts 𝑊
and 𝑆 − 𝑊 or 𝑊 ‾ or 𝑊 ′ : The null hypothesis H0 is rejected if the observed sample point falls
in 𝑊 and if it falls in 𝑊 ′ we reject 𝐻1 and accept H0 . The region of rejection of H0 when H0
is true is that region of the outcome set where H0 is rejected if the sample point falls in that
region and is called critical region. Evidently, the size of the critical region is 𝛼, the probability
of committing type 1 error (discussed below).
Suppose if the test is based on a sample of size 2, then the outcome set or the sample space is
the first quadrant in a two dimensional space and a test criterion will enable us to separate our
outcome set into two complementary subsets, W and 𝑊 ‾ . If the sample point falls in the subset
𝑊, 𝐻0 is rejected, otherwise 𝐻0 is accepted. This is shown in the adjoining diagram :
From the above table it is obvious that in any testing problem we are liable to commit two
types of errors.
Errors of Type I and Type II. The error of rejecting 𝐻0 (accepting 𝐻1 ) when 𝐻0 is true is called
Type I error and the error of accepting 𝐻0 when 𝐻0 is false (𝐻1 is true) is called Type II error.
The probabilities of type I and type II errors are denoted by 𝛼 and 𝛽 respectively.
Thus
𝛼 = Probability of type I error = Probability of rejecting 𝐻0 when 𝐻0 is true.
An ideal test would be the one which properly keeps under control both the types of errors.
But since the commission of an error of either type is a random variable, equivalently an ideal
test should minimise the probability of both the types of errors, viz., 𝛼 and 𝛽. But
unfortunately, for a fixed sample size 𝑛, 𝛼 and 𝛽 are so related (like producer's and consumer's
risk in sampling inspection plans), that the reduction in one results in an increase in the other.
Consequently, the simultaneous minimising of both the errors is not possible. Since type I error
is deemed to be more serious than the type II error (c.f. Remark 3§18.2.3 ) the usual practice
is to control 𝛼 at a predetermined low level and subject to this constraint on the probabilities
of type I error, choose a test which minimises 𝛽 or maximises the power function 1 − 𝛽.
Generally, we choose 𝛼 = 0.05 or 0.01 .
STEPS IN SOLVING TESTING OF HYPOTHESIS PROBLEM
The major steps involved in the solution of a 'testing of hypothesis' problem may be outlined
as follows:
1) Explicit knowledge of the nature of the population distribution and the parameter(s) of
interest, i.e., the parameter(s) about which the hypotheses are set up.
2) Setting up of the null hypothesis 𝐻0 and the alternative hypothesis 𝐻1 in terms of the range
of the parameter values each one embodies.
3) The choice of a suitable statistic 𝑡 = 𝑡(𝑥1 , 𝑥2 , … . , 𝑥𝑛 ) called the test statistic, which will
best reflect upon the probability of 𝐻0 and 𝐻1 .
4) Partitioning the set of possible values of the test statistic 𝑡 into two disjoint sets 𝑊 (called
the rejection region or critical region) and 𝑊 ‾ (called the acceptance region) and framing
the following test :
(i) Reject 𝐻0 (i.e., accept 𝐻1 ) if the value of 𝑡 falls in 𝑊.
(ii) Accept 𝐻0 if the value of 𝑡 falls 𝑊 ‾.
5) After framing the above test, obtain experimental sample observations, compute the
appropriate test statistic and take action accordingly.
216 | P a g e
P(x ∈ W ∣ H₀) = ∫_W L₀ dx = α
and P(x ∈ W ∣ H₁) ≥ P(x ∈ W₁ ∣ H₁) for all θ ≠ θ₀,
whatever the region W₁ satisfying the size condition may be.
where L₀ and L₁ are the likelihood functions of the sample observations x = (x₁, x₂, …, xₙ) under H₀ and H₁ respectively. Then W is the most powerful critical region for testing the hypothesis H₀: θ = θ₀ against the alternative H₁: θ = θ₁.
P(x ∈ W ∣ H₀) = ∫_W L₀ dx = α
P(x ∈ W ∣ H₁) = ∫_W L₁ dx = 1 − β, (say).
In order to establish the lemma, we have to prove that there exists no other critical region, of size less than or equal to α, which is more powerful than W.
Let W₁ be another critical region of size α₁ ≤ α and power 1 − β₁, so that we have
P(x ∈ W₁ ∣ H₀) = ∫_{W₁} L₀ dx = α₁
and P(x ∈ W₁ ∣ H₁) = ∫_{W₁} L₁ dx = 1 − β₁
Write A = W ∩ W̄₁, B = W₁ ∩ W̄ and C = W ∩ W₁, so that W = A ∪ C and W₁ = B ∪ C.
If α₁ ≤ α, we have
∫_{W₁} L₀ dx ≤ ∫_W L₀ dx
⇒ ∫_{B∪C} L₀ dx ≤ ∫_{A∪C} L₀ dx
⇒ ∫_B L₀ dx ≤ ∫_A L₀ dx
⇒ ∫_A L₀ dx ≥ ∫_B L₀ dx
Since A ⊂ W,
(18.5) ⇒ ∫_A L₁ dx > k ∫_A L₀ dx ≥ k ∫_B L₀ dx
Also (18.5) implies
L₁/L₀ ≤ k ∀ x ∈ W̄
⇒ ∫_{W̄} L₁ dx ≤ k ∫_{W̄} L₀ dx
This result also holds for any subset of W̄, say W̄ ∩ W₁ = B. Hence
∫_B L₁ dx ≤ k ∫_B L₀ dx ≤ ∫_A L₁ dx
Adding ∫_C L₁ dx to both sides, we get
∫_{W₁} L₁ dx ≤ ∫_W L₁ dx ⇒ 1 − β ≥ 1 − β₁
Similarly,
β = P{x ∈ W̄ ∣ H₁} = P{x ≤ 0.5 ∣ θ = 2} = ∫ from 0 to 0.5 of [f(x, θ)]_{θ=2} dx = ∫ from 0 to 0.5 of (1/2) dx = 0.25
Thus the sizes of Type I and Type II errors are respectively α = 0.5 and β = 0.25,
and the power of the test = 1 − β = 0.75
The size α of the critical region is: α = P(x ∈ W ∣ H₀) = P(U ≥ 0 ∣ H₀) … (**)
Under H₀: μ = −1, U ∼ N(−55, 1540) [from (*)] ⇒ Z = (U − E(U))/σ_U = (U + 55)/√1540
∴ Under H₀, when U = 0, Z = 55/√1540 = 55/39.2428 = 1.4015
∴ α = P(Z ≥ 1.4015) ≈ 0.5 − 0.4192 = 0.0808 [from (**), using the Normal Probability Tables]
Alternatively, α = 1 − P(Z ≤ 1.4015) = 1 − Φ(1.4015),
where Φ(⋅) is the distribution function of the standard normal variate.
Power of the test is: 1 − β = P(x ∈ W ∣ H₁) = P(U ≥ 0 ∣ H₁)
Under H₁: μ = 1, U ∼ N(55, 1540)
⇒ Z = (U − E(U))/σ_U = (0 − 55)/√1540 = −1.40 (when U = 0)
∴ 1 − β = P(Z ≥ −1.40) = P(−1.4 ≤ Z ≤ 0) + 0.5
= P(0 ≤ Z ≤ 1.4) + 0.5 (by symmetry)
= 0.4192 + 0.5 = 0.9192
Alternatively, 1 − β = 1 − P(Z ≤ −1.40) = 1 − Φ(−1.40),
where Φ(⋅) is the distribution function of the standard normal variate.
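The size and the power computed above can be checked directly; a minimal sketch, assuming SciPy is available, is given below.

```python
# U ~ N(-55, 1540) under H0 and U ~ N(55, 1540) under H1, critical region U >= 0.
from scipy import stats
import math

sd = math.sqrt(1540)
alpha = 1 - stats.norm.cdf((0 - (-55)) / sd)   # P(U >= 0 | H0) ≈ 0.08
power = 1 - stats.norm.cdf((0 - 55) / sd)      # P(U >= 0 | H1) ≈ 0.92
print(round(alpha, 4), round(power, 4))
```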
Example 6. Use the Neyman-Pearson Lemma to obtain the region for testing 𝜃 = 𝜃0 against
𝜃 = 𝜃1 > 𝜃0 and 𝜃 = 𝜃1 < 𝜃0 , in the case of a normal population 𝑁(𝜃, 𝜎 2 ), where 𝜎 2 is
known. Hence find the power of the test.
Solution.
$$L = \prod_{i=1}^{n} f(x_i, \theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \theta)^2\right\}$$

Using the Neyman-Pearson Lemma, the best critical region (B.C.R.) is given by (for $k > 0$)

$$\frac{L_1}{L_0} = \frac{\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \theta_1)^2\right\}}{\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \theta_0)^2\right\}} \ge k$$

$$\Rightarrow \exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{n}(x_i - \theta_1)^2 - \sum_{i=1}^{n}(x_i - \theta_0)^2\right\}\right] \ge k$$

$$\Rightarrow \exp\left[-\frac{n}{2\sigma^2}(\theta_1^2 - \theta_0^2) + \frac{1}{\sigma^2}(\theta_1 - \theta_0)\sum_{i=1}^{n} x_i\right] \ge k$$

$$\Rightarrow -\frac{n}{2\sigma^2}(\theta_1^2 - \theta_0^2) + \frac{1}{\sigma^2}(\theta_1 - \theta_0)\sum_{i=1}^{n} x_i \ge \log k$$
This inequality reduces to a condition on $\bar{x}$: the B.C.R. is of the form $\bar{x} \ge \lambda_1$ when $\theta_1 > \theta_0$, and of the form $\bar{x} \le \lambda_2$ when $\theta_1 < \theta_0$. The distribution of $\bar{x}$ when $H_i$ is true is $N\!\left(\theta_i, \dfrac{\sigma^2}{n}\right)$, $(i = 0, 1)$. Therefore, the constants $\lambda_1$ and $\lambda_2$ are determined from the relations:

$$P[\bar{x} > \lambda_1 \mid H_0] = \alpha \quad \text{and} \quad P[\bar{x} < \lambda_2 \mid H_0] = \alpha$$

$$\therefore\ P(\bar{x} > \lambda_1 \mid H_0) = P\left[Z > \frac{\lambda_1 - \theta_0}{\sigma/\sqrt{n}}\right] = \alpha; \qquad Z \sim N(0, 1)$$

$$\Rightarrow \frac{\lambda_1 - \theta_0}{\sigma/\sqrt{n}} = z_\alpha \;\Rightarrow\; \lambda_1 = \theta_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$$

Similarly, $\lambda_2 = \theta_0 - z_\alpha \dfrac{\sigma}{\sqrt{n}} = \theta_0 + z_{1-\alpha}\dfrac{\sigma}{\sqrt{n}}$.
Power of the test. By definition, the power of the test in case (i) is:

$$1 - \beta = P[x \in W \mid H_1] = P[\bar{x} \ge \lambda_1 \mid H_1] = P\left(Z \ge \frac{\lambda_1 - \theta_1}{\sigma/\sqrt{n}}\right) \qquad \left[\because \text{ under } H_1,\ Z = \frac{\bar{x} - \theta_1}{\sigma/\sqrt{n}} \sim N(0, 1)\right]$$

$$= P\left(Z \ge \frac{\theta_0 + z_\alpha\,\sigma/\sqrt{n} - \theta_1}{\sigma/\sqrt{n}}\right) = P\left(Z \ge z_\alpha - \frac{\theta_1 - \theta_0}{\sigma/\sqrt{n}}\right) = 1 - P(Z \le \lambda_3) = 1 - \Phi(\lambda_3),$$

where $\lambda_3 = z_\alpha - \sqrt{n}(\theta_1 - \theta_0)/\sigma$ and $\Phi(\cdot)$ is the distribution function of the standard normal variate.
Similarly in case (ii), $(\theta_1 < \theta_0)$, the power of the test is

$$1 - \beta = P(\bar{x} < \lambda_2 \mid H_1) = P\left(Z < \frac{\lambda_2 - \theta_1}{\sigma/\sqrt{n}}\right) = P\left(Z < \frac{\theta_0 + z_{1-\alpha}\,\sigma/\sqrt{n} - \theta_1}{\sigma/\sqrt{n}}\right)$$

$$= P\left(Z < z_{1-\alpha} + \frac{\theta_0 - \theta_1}{\sigma/\sqrt{n}}\right) = \Phi(\lambda_4), \quad (\because\ \theta_0 > \theta_1) \qquad \ldots (18.13a)$$

where $\lambda_4 = z_{1-\alpha} + \dfrac{\sqrt{n}(\theta_0 - \theta_1)}{\sigma} = \dfrac{\sqrt{n}(\theta_0 - \theta_1)}{\sigma} - z_\alpha$.
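The expression $1 - \beta = 1 - \Phi\!\left(z_\alpha - \sqrt{n}(\theta_1 - \theta_0)/\sigma\right)$ for case (i) is easy to tabulate; the sketch below does so for hypothetical values of $\theta_0$, $\sigma$, $n$ and $\alpha$.

```python
# Sketch of the power function 1 - Phi(z_alpha - sqrt(n)*(theta1 - theta0)/sigma)
# for the one-sided test of case (i); theta0, sigma, n, alpha are hypothetical.
from math import sqrt
from scipy.stats import norm

theta0, sigma, n, alpha = 0.0, 2.0, 25, 0.05
z_alpha = norm.ppf(1 - alpha)

for theta1 in [0.0, 0.25, 0.5, 0.75, 1.0, 1.5]:
    power = 1 - norm.cdf(z_alpha - sqrt(n) * (theta1 - theta0) / sigma)
    print(f"theta1 = {theta1:4.2f}   power = {power:.3f}")
# At theta1 = theta0 the power equals alpha; it increases towards 1 as theta1 grows.
```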
A. $\left[\dfrac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{1-\alpha/2}(2n)},\ \dfrac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{\alpha/2}(2n)}\right]$

B. $\left[\dfrac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)},\ \dfrac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right]$

C. $\left[\dfrac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)},\ \dfrac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)}\right]$

D. $\left[\dfrac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{\alpha/2}(2n)},\ \dfrac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{1-\alpha/2}(2n)}\right]$
Question 5.
Suppose that 𝑋 has uniform distribution on the interval [0,100]. Let 𝑌 denote the greatest
integer smaller than or equal to X. Which of the following is true?
A. $P(Y \le 25) = \dfrac{1}{4}$
B. $P(Y \le 25) = \dfrac{26}{100}$
C. $E(Y) = 50$
D. $E(Y) = \dfrac{101}{2}$
Question 6.
Let $x_1 = 3,\ x_2 = 4,\ x_3 = 3,\ x_4 = 2.5$ be the observed values of a random sample from the probability density function
$$f(x \mid \theta) = \frac{1}{3}\left[\frac{1}{\theta} e^{-x/\theta} + \frac{1}{\theta^2} e^{-x/\theta^2} + e^{-x}\right], \quad x > 0,\ \theta \in \{1, 2, 3, 4\},$$
then the method of moment estimate (MME) of 𝜃 is
A. 1
B. 2
C. 3
D. 4
Question 7.
Let the random variable 𝑋 and 𝑌 have the joint probability mass function
$$P(X = x, Y = y) = e^{-2}\binom{x}{y}\left(\frac{3}{4}\right)^{y}\left(\frac{1}{4}\right)^{x-y}\frac{2^{x}}{x!}, \quad y = 0, 1, 2, \ldots, x;\ x = 0, 1, 2, \ldots$$
Then 𝑉(𝑌) is equal to
A. 1
B. 1/2
C. 2
D. 3/2
Question 8.
Let the discrete random variables 𝑋 and 𝑌 have the joint probability mass function
$$P(X = m, Y = n) = \begin{cases} \dfrac{e^{-1}}{(n - m)!\, m!\, 2^{n}}, & m = 0, 1, 2, \ldots, n;\ n = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
Which of the following statements is (are) TRUE?
A. The marginal distribution of $X$ is Poisson with mean 1/2
B. The random variables $X$ and $Y$ are independent
C. The conditional distribution of $X$ given $Y = 5$ is Bin$\left(6, \frac{1}{2}\right)$
D. $P(Y = n) = (n + 1)\,P(Y = n + 2)$ for $n = 0, 1, 2, \ldots$
Question 9.
Consider the trinomial distribution with the probability mass function
$$P(X = x, Y = y) = \frac{2!}{x!\, y!\, (2 - x - y)!}\left(\frac{1}{6}\right)^{x}\left(\frac{2}{6}\right)^{y}\left(\frac{3}{6}\right)^{2 - x - y}, \quad x \ge 0,\ y \ge 0,\ x + y \le 2.$$
Then Corr$(X, Y)$ is equal to … (correct up to two decimal places)
A) -0.31
B) 0.31
C) 0.35
D) 0.78
Question 10.
Let 𝑥1 = 1.1, 𝑥2 = 0.5, 𝑥3 = 1.4, 𝑥4 = 1.2 be the observed values of a random sample of size
four from a distribution with the probability density function
$$f(x \mid \theta) = \begin{cases} e^{\theta - x}, & \text{if } x \ge \theta,\ \theta \in (-\infty, \infty) \\ 0, & \text{otherwise} \end{cases}$$
Then the maximum likelihood estimate of $\theta^2 + \theta + 1$ is equal to … (up to two decimal places).
A) 1.75
B) 1.89
C) 1.74
D) 0.87
Question 11.
Let 𝑈 ∼ 𝐹5,8 and 𝑉 ∼ 𝐹8,5. If 𝑃[𝑈 > 3.69] = 0.05, then the value of C such that
𝑃[𝑉 > 𝑐] = 0.95 equals… (round off two decimal places)
A) 0.27
B) 1.27
C) 2.27
D) 2.29
229 | P a g e
Question 12.
Let P be a probability function that assigns the same weight to each of the points of the sample
space Ω = {1,2,3,4}. Consider the events E = {1,2}, F = {1,3} and G = {3,4}. Then which of
the following statement(s) is (are) TRUE?
1. E and F are independent
2. E and G are independent
3. E, F and G are independent
Select the correct answer using code given below:
A. 1 only
B. 2 only
C. 1 and 2 only
D. 1,2 and 3
Question 13.
Let $X_1, X_2, \ldots, X_4$ and $Y_1, Y_2, \ldots, Y_5$ be two random samples of sizes 4 and 5 respectively, from a standard normal population. Define the statistic
$$T = \left(\frac{5}{4}\right)\frac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2}$$
Question 15
Let 𝑥1 = 3, 𝑥2 = 4, 𝑥3 = 3, 𝑥4 = 2.5 be the observed values of a random sample from the
probability density function
$$f(x \mid \theta) = \frac{1}{3}\left[\frac{1}{\theta} e^{-x/\theta} + \frac{1}{\theta^2} e^{-x/\theta^2} + e^{-x}\right], \quad x > 0,\ \theta \in (0, \infty)$$
Then the method of moment estimate (MME) of 𝜃 is
A. 1.5
B. 2.5
C. 3.5
D. 4.5
Question 16.
Let 𝑋 be a random variable with cumulative distribution function
1 𝑛+2𝑘+1
𝑃(𝑋 = ℎ, 𝑌 = 𝑘) = ( ) ; 𝑛 = −𝑘, −𝑘 + 1, … , ; 𝑘 = 1,2, …
2
Then E(Y) equals
A. 1
B. 2
C. 3
D. 4
Question 17.
Let 𝑋 be a random variable with the cumulative distribution function
$$F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{1 + x^2}{10}, & 0 \le x < 1 \\ \dfrac{3 + x^2}{10}, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases}$$
Which of the following statements is (are) TRUE?
3
A. 𝑃(1 < 𝑋 < 2) =
10
31
B. 𝑃(1 < 𝑋 ≤ 2) = 5
11
C. 𝑃(1 ≤ 𝑋 < 2) = 2
231 | P a g e
41
D. 𝑃(1 ≤ 𝑋 ≤ 2) = 5
Question 18.
Let the random variables 𝑋1 and 𝑋2 have joint probability density function
$$f(x_1, x_2) = \begin{cases} \dfrac{x_1 e^{-x_1 x_2}}{2}, & 1 < x_1 < 3,\ x_2 > 0 \\ 0, & \text{otherwise} \end{cases}$$
What is the value of Var$(X_2 \mid X_1 = 2)$? (up to two decimal places)
A) 0.27
B) 0.28
C) 0.25
D) 1.90
Question 19
Let 𝑥1 = 1, 𝑥2 = 0, 𝑥3 = 0, 𝑥4 = 1, 𝑥5 = 0, 𝑥6 = 1 be the data on a random sample of size 6
from Bin (1, 𝜃) distribution, where 𝜃 ∈ (0,1). Then the uniformly minimum variance unbiased
estimate of 𝜃(1 + 𝜃) equal to
8.5 SUMMARY
The main points covered in this lesson are what an estimator is, what consistency, efficiency and sufficiency of an estimator mean, and how to obtain the best estimator.
8.6 GLOSSARY
• Motivation: These problems are very useful in real life, and they can be applied in data science and economics as well as in social science.
• Attention: Think about how the best estimators are useful in real-world problems.
8.7 ANSWER TO IN-TEXT QUESTIONS
Answer 1: A
Explanation :
$$f(x, \theta) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta \\ 0, & \text{otherwise} \end{cases}
\qquad\qquad
E(X_{(n)}) = \frac{n\theta}{n + 1}$$
Let $Y = \dfrac{1}{\theta}\left(X_{(n)} + \dfrac{\theta}{n + 1}\right)$, so that
$$E(Y) = E\left(\frac{X_{(n)}}{\theta} + \frac{1}{n + 1}\right) = 1
\qquad\text{and}\qquad
\lim_{n \to \infty} E\left[\frac{1}{\theta}\left(X_{(n)} + \frac{\theta}{n + 1}\right)\right] = 1.$$
Hence option A is correct.
Answer 2: B
Explanation:
Convexity (peakedness) is decided by kurtosis.
Answer 3: C
Explanation :
$$y_i = \alpha + \beta x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{400 - 10 \times 5 \times 4}{500 - 10 \times 5^2} = \frac{4}{5}; \qquad \hat{\alpha} = \bar{y} - \bar{x}\hat{\beta} = 4 - 5 \times \frac{4}{5} = 0$$
An unbiased estimate of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n - 2}\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta}x_i\right)^2 = \frac{1}{10 - 2}\sum_{i=1}^{n}\left(y_i - \frac{4}{5}x_i\right)^2$$
$$= \frac{1}{8}\left(\sum_{i=1}^{n} y_i^2 - 2 \times \frac{4}{5}\sum_{i=1}^{n} x_i y_i + \left(\frac{4}{5}\right)^2\sum_{i=1}^{n} x_i^2\right) = \frac{1}{8}\left(400 - \frac{8}{5} \times 400 + \frac{16}{25} \times 500\right) = 10$$
Hence option C is correct.
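The arithmetic above can be verified directly from the given summary statistics; a minimal sketch:

```python
# Verifying beta_hat, alpha_hat and the unbiased estimate of sigma^2
# from the summary statistics given in the question (n = 10).
n = 10
sum_x, sum_y = 50, 40
sum_x2, sum_y2, sum_xy = 500, 400, 400

xbar, ybar = sum_x / n, sum_y / n
beta_hat = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar ** 2)   # 4/5
alpha_hat = ybar - xbar * beta_hat                                  # 0
# residual sum of squares expanded in terms of the summary sums
# (alpha_hat = 0 here, so the cross terms involving it vanish)
rss = sum_y2 - 2 * beta_hat * sum_xy + beta_hat ** 2 * sum_x2
sigma2_hat = rss / (n - 2)
print(beta_hat, alpha_hat, sigma2_hat)   # 0.8 0.0 10.0
```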
Answer 4: B
Explanation :
We use the random variable $Q = \dfrac{2}{\theta}\sum_{i=1}^{n} X_i \sim \chi^2(2n)$.
$$1 - \alpha = P\left(\chi^2_{\alpha/2}(2n) \le Q \le \chi^2_{1-\alpha/2}(2n)\right) = P\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)} \le \theta \le \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right]$$
Thus, a $100(1-\alpha)\%$ confidence interval for $\theta$ is given by $\left[\dfrac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)},\ \dfrac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right]$.
Hence option B is correct.
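A minimal sketch of computing this interval (on a simulated exponential sample with an assumed true scale θ = 2):

```python
# Sketch: 95% CI for the exponential scale theta using 2*sum(X)/theta ~ chi-square(2n).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
theta_true, n, alpha = 2.0, 30, 0.05
x = rng.exponential(scale=theta_true, size=n)

s = 2 * x.sum()
lower = s / chi2.ppf(1 - alpha / 2, 2 * n)   # divide by the upper quantile
upper = s / chi2.ppf(alpha / 2, 2 * n)       # divide by the lower quantile
print(f"95% CI for theta: ({lower:.3f}, {upper:.3f})")
```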
Answer 5: B
Explanation :
Let 𝑌 = [X]; where 𝑌 denote the greatest integer smaller than or equal to 𝑋.
$$P(Y \le 25) = P([X] \le 25) = P(X \in (0, 26)) = \int_0^{26} \frac{1}{100}\, dx = \frac{26}{100}$$
Hence B is the correct option.
Answer 6: C
Explanation :
$$\bar{x} = \frac{1}{4}(3 + 4 + 3.5 + 2.5) = 3.25$$
$$E(X) = \frac{1}{3}\left[\theta + \theta^2 + 1\right]\Gamma(2) = \frac{1}{3}\left[\theta + \theta^2 + 1\right] = 3.25$$
$$\theta^2 + \theta - 8.75 = 0 \;\Rightarrow\; \theta = 2.5 \text{ or } -3.5$$
Since $\theta \in \{1, 2, 3, 4\}$, we take $\theta = 3$.
Hence option C is correct.
Answer 7: D
Explanation :
The marginal pmf of $Y$ is given by
$$P(Y = y) = \sum_{x=y}^{\infty} P(X = x, Y = y) = \sum_{x=y}^{\infty} e^{-2}\binom{x}{y}\left(\frac{3}{4}\right)^{y}\left(\frac{1}{4}\right)^{x-y}\frac{2^{x}}{x!}$$
$$= e^{-2}\left(\frac{3}{4}\right)^{y}\sum_{u=0}^{\infty}\binom{y+u}{y}\left(\frac{1}{4}\right)^{u}\frac{2^{y+u}}{(y+u)!} \qquad (\text{putting } u = x - y)$$
$$= e^{-2}\left(\frac{3}{4}\right)^{y}\sum_{u=0}^{\infty}\frac{(y+u)!}{y!\,u!}\left(\frac{1}{4}\right)^{u}\frac{2^{y+u}}{(y+u)!} = e^{-2}\left(\frac{3}{4}\right)^{y}\frac{2^{y}}{y!}\sum_{u=0}^{\infty}\frac{1}{u!}\left(\frac{1}{2}\right)^{u}$$
$$= e^{-2}\left(\frac{3}{4}\right)^{y}\frac{2^{y}}{y!}\,e^{1/2} = \frac{e^{-3/2}\left(\frac{3}{2}\right)^{y}}{y!}, \quad y = 0, 1, \ldots,$$
which is the pmf of a Poisson random variable with parameter $3/2$, so $E(Y) = 3/2$ and $V(Y) = 3/2$.
Answer 8: A
Explanation :
The marginal probability mass function of $X$ is given by
$$P(X = m) = \sum_{n=m}^{\infty} P(X = m, Y = n) = \frac{e^{-1/2}\left(\frac{1}{2}\right)^{m}}{m!}, \quad m = 0, 1, 2, \ldots$$
Thus the marginal distribution of $X$ is Poisson with mean 1/2.
The marginal probability mass function of $Y$ is given by
$$P(Y = n) = \sum_{m=0}^{n} P(X = m, Y = n) = \frac{e^{-1}}{n!}, \quad n = 0, 1, 2, \ldots$$
Thus the marginal distribution of $Y$ is Poisson with mean 1. Since
$$P(X = m, Y = n) \ne P(X = m)\,P(Y = n),$$
the random variables $X$ and $Y$ are not independent. Further,
$$P(X = m \mid Y = 5) = \frac{P(X = m, Y = 5)}{P(Y = 5)} = \frac{5!}{m!\,(5 - m)!}\left(\frac{1}{2}\right)^{5}, \quad m = 0, 1, 2, \ldots, 5,$$
so the conditional distribution of $X$ given $Y = 5$ is Bin$\left(5, \frac{1}{2}\right)$. Finally,
$$\frac{P(Y = n)}{P(Y = n + 1)} = (n + 1) \quad \text{for } n = 0, 1, 2, \ldots$$
Answer 9: 𝑨
Explanation :
The trinomial distribution of two r.v.'s $X$ and $Y$ is given by
$$f_{X,Y}(x, y) = \frac{n!}{x!\,y!\,(n - x - y)!}\,p^{x}q^{y}(1 - p - q)^{n - x - y}$$
for $x, y = 0, 1, 2, \ldots, n$ and $x + y \le n$, where $p + q \le 1$. Here $n = 2$, $p = 1/6$ and $q = 2/6$.
$$\operatorname{Var}(X) = np(1 - p) = 2 \times \frac{1}{6}\left(1 - \frac{1}{6}\right) = \frac{10}{36}; \qquad \operatorname{Var}(Y) = nq(1 - q) = 2 \times \frac{2}{6}\left(1 - \frac{2}{6}\right) = \frac{16}{36}$$
$$\operatorname{Cov}(X, Y) = -npq = -2 \times \frac{1}{6} \times \frac{2}{6} = -\frac{4}{36}$$
$$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}} = -\frac{4}{4\sqrt{10}} = -0.31$$
Hence −0.31 is the correct answer.
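The value can also be checked by simulation; a small sketch assuming the same trinomial with n = 2, p = 1/6, q = 2/6:

```python
# Monte-Carlo check of Corr(X, Y) for the trinomial with n = 2, p = 1/6, q = 2/6.
import numpy as np

rng = np.random.default_rng(1)
draws = rng.multinomial(2, [1/6, 2/6, 3/6], size=200_000)
x, y = draws[:, 0], draws[:, 1]
print(round(np.corrcoef(x, y)[0, 1], 3))   # close to -1/sqrt(10) = -0.316
```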
Answer 10: 𝐀
Explanation :
Let $x_1 = 1.1,\ x_2 = 0.5,\ x_3 = 1.4,\ x_4 = 1.2$ and
$$f(x \mid \theta) = \begin{cases} e^{\theta - x}, & \text{if } x \ge \theta,\ \theta \in (-\infty, \infty) \\ 0, & \text{otherwise} \end{cases}$$
The likelihood is positive only for $\theta \in (-\infty, X_{(1)}]$ and, since $\frac{d}{d\theta} f(x \mid \theta) > 0$ there, it is increasing in $\theta$ on this interval. Hence $\hat{\theta} = X_{(1)} = 0.5$, and the maximum likelihood estimate of $\theta^2 + \theta + 1$ is $0.25 + 0.5 + 1 = 1.75$.
Answer 16: B
Explanation:
$$P(Y = k) = \sum_{n=-k}^{\infty} P(X = n, Y = k) = \frac{1}{2}\left(\frac{1}{2}\right)^{k-1}, \quad k = 1, 2, \ldots \qquad (\text{putting } m = n + k),$$
which is the pmf of a geometric distribution with parameter 1/2. Hence
$$E(Y) = \sum_{k=1}^{\infty} k\,\frac{1}{2}\left(\frac{1}{2}\right)^{k-1} = 2.$$
Hence option B is correct.
Answer 17: A
Explanation :
$$P(1 < X < 2) = F(2) - F(1) - P(X = 2) = \frac{3}{10}$$
$$P(1 < X \le 2) = F(2) - F(1) = \frac{3}{5}$$
$$P(1 \le X < 2) = F(2) - F(1) - P(X = 2) + P(X = 1) = \frac{1}{2}$$
$$P(1 \le X \le 2) = F(2) - F(1) + P(X = 1) = \frac{4}{5}$$
Answer 18: C
Explanation :
The marginal pdf of $X_1$ is
$$g(x_1) = \int_0^{\infty} \frac{x_1 e^{-x_1 x_2}}{2}\, dx_2 = \frac{1}{2}, \quad 1 < x_1 < 3$$
$$h(x_2 \mid x_1) = \frac{f(x_1, x_2)}{g(x_1)} = x_1 e^{-x_1 x_2}, \quad x_2 > 0$$
Thus $X_2 \mid X_1 \sim$ Exp$(x_1)$ with mean $1/x_1$. Therefore Var$(X_2 \mid X_1 = 2) = \frac{1}{4} = 0.25$.
Hence 0.25 is correct answer.
Answer 19: 0.7
Explanation :
$$\frac{T}{n} + \frac{T(T - 1)}{n(n - 1)} \text{ is the UMVUE of } \theta(1 + \theta), \qquad E\left(\frac{T}{n} + \frac{T(T - 1)}{n(n - 1)}\right) = \theta(1 + \theta), \qquad \text{where } T = \sum_{i=1}^{n} X_i.$$
Therefore,
$$\frac{T}{n} + \frac{T(T - 1)}{n(n - 1)} = \frac{3}{6} + \frac{3(3 - 1)}{6(6 - 1)} = \frac{21}{30} = 0.70$$
8.8 REFERENCES
• Devore, J. (2012). Probability and statistics for engineers, 8th ed. Cengage Learning.
• John A. Rice (2007). Mathematical Statistics and Data Analysis, 3rd ed. Thomson
Brooks/Cole
• Larsen, R., Marx, M. (2011). An introduction to mathematical statistics and its
applications. Prentice Hall.
• Miller, I., Miller, M. (2017). J. Freund’s mathematical statistics with applications, 8th
ed. Pearson.
• Kantarelis, D. and Asadoorian, M. O. (2009). Essentials of Inferential Statistics, 5th ed. University Press of America.
• Hogg, R., Tanis, E., Zimmerman, D. (2021). Probability and Statistical Inference, 10th ed. Pearson.
8.9 SUGGESTED READINGS
• S. C Gupta , V.K Kapoor, Fundamentals of Mathematical Statistics,Sultan Chand
Publication, 11th Edition.
• B.L Agarwal, Programmed Statistics ,New Age International Publishers, 2nd Edition.
LESSON 9
ERROR IN HYPOTHESIS TESTING AND
POWER OF TEST
STRUCTURE
9.1 Learning Objectives
9.2 Introduction
9.3 Error in Hypothesis Testing and Power of Test
9.3.1 Type I and Type II Error
9.3.2 Unbiased Test and Unbiased Critical Region
9.3.3 UMP (Uniformly Most Powerful) Critical Region
9.3.4 Likelihood Ratio Test
9.4 In-Text Questions
9.5 Summary
9.6 Glossary
9.7 Answer to In-Text Questions
9.8 References
9.9 Suggested Readings
9.1 LEARNING OBJECTIVES
The main objective of this lesson is to discuss the testing of hypotheses and how it can be used in real data analysis.
9.2 INTRODUCTION
The main problems in statistical inference can be broadly classified into two areas:
(i) The area of estimation of population parameter(s) and setting up of confidence intervals for them, i.e., the area of point and interval estimation, and
(ii) Tests of statistical hypothesis.
In Neyman-Pearson theory, we use statistical methods to arrive at decisions in certain
situations where there is lack of certainty on the basis of a sample whose size is fixed in
advance while in Wald's sequential theory the sample size is not fixed but is regarded as a
random variable.
⇒ 𝑊 is unbiased CR.
𝑊 = {x: 𝑔𝜃1 (𝑡(x)) ⋅ ℎ(x) ≥ 𝑘 ⋅ 𝑔𝜃0 (𝑡(x)) ⋅ ℎ(x)}, ∀𝑘 > 0
= {x: 𝑔𝜃1 (𝑡(x)) ≥ 𝑘 ⋅ 𝑔𝜃0 (𝑡(x))}, ∀𝑘 > 0
Hence if 𝑇 = 𝑡(𝑥) is sufficient statistic for 𝜃 then the MPCR for the test may be defined in
terms of the marginal distribution of 𝑇 = 𝑡(x), rather than the joint distribution of
𝑥1 , 𝑥2 , … , 𝑥𝑛 .
9.3.3 UMP (Uniformly Most Powerful ) Critical Region
It provides the best critical region for testing $H_0: \theta = \theta_0$ against the alternative $\theta = \theta_1$ provided $\theta_1 > \theta_0$, while a different region provides the best critical region for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ provided $\theta_1 < \theta_0$. Thus, the best critical region for testing the simple hypothesis $H_0: \theta = \theta_0$ against the simple alternative $H_1: \theta = \theta_0 + c,\ c > 0$, will not serve as the best critical region for testing $H_0: \theta = \theta_0$ against the simple alternative $H_1: \theta = \theta_0 - c,\ c > 0$. Hence, in this problem, no uniformly most powerful test exists for testing the simple hypothesis $H_0: \theta = \theta_0$ against the composite alternative hypothesis $H_1: \theta \ne \theta_0$.
However, for each one-sided alternative hypothesis, $H_1: \theta = \theta_1 > \theta_0$ or $H_1: \theta = \theta_1 < \theta_0$, a UMP test exists.
Remark. In particular, if we take 𝑛 = 2, then the B.C.R. for testing 𝐻0 : 𝜃 = 𝜃0 , against
𝐻1 : 𝜃 = 𝜃1 (> 𝜃0 ) is given by :
𝑊 = {x: (𝑥1 + 𝑥2 )/2 ≥ 𝜃0 + 𝜎𝑧𝑎 /√2}
= {x: 𝑥1 + 𝑥2 ≥ 2𝜃0 + √2𝜎𝑧𝛼 }
= {x: 𝑥1 + 𝑥2 ≥ 𝐶}, (say)
where 𝐶 = 2𝜃0 + √2𝜎𝑧𝛼 = 2𝜃0 + √2𝜎 × 1.645, if 𝛼 = 0.05.
Similarly, the B.C.R. for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 (< 𝜃0 ) with 𝑛 = 2 and 𝛼 =
0.05 is given by
𝑊1 = {x: (𝑥1 + 𝑥2 )/2 ≤ 𝜃0 − 𝜎𝑧𝑎 /√2}
= {x: (𝑥1 + 𝑥2 ) ≤ 2𝜃0 − √2𝜎 × 1.645}
= {x: 𝑥1 + 𝑥2 ≤ C1 }, (say),
where 𝐶1 = 2𝜃0 − √2𝜎𝑧𝛼 = 2𝜃0 − √2𝜎 × 1.645, if 𝛼 = 0.05
The B.C.R. for testing 𝐻0 : 𝜃 = 𝜃0 against the two tailed alternative 𝐻1 : 𝜃 = 𝜃1 (≠ 𝜃0 ), is
given by : 𝑊2 = {x: (𝑥1 + 𝑥2 ≥ 𝐶) ∪ (𝑥1 + 𝑥2 ≤ 𝐶1 )}
The regions are given by the shaded portions in the following figures (i), (ii) and (iii)
respectively.
Example 1. Show that for the normal distribution with zero mean and variance 𝜎 2 , the best
critical region for 𝐻0 : 𝜎 = 𝜎0 against the alternative 𝐻1 : 𝜎 = 𝜎1 is of the form :
$$\sum_{i=1}^{n} x_i^2 \ge c \ \text{ if } \sigma_1 > \sigma_0, \qquad \text{or} \qquad \sum_{i=1}^{n} x_i^2 \le c \ \text{ if } \sigma_1 < \sigma_0.$$
$$\Rightarrow \left(\frac{\beta_1}{\beta_0}\right)^{n}\exp\left\{-\beta_1\sum_{i=1}^{n}(x_i - \gamma_1) + \beta_0\sum_{i=1}^{n}(x_i - \gamma_0)\right\} \ge k$$
$$\Rightarrow \left(\frac{\beta_1}{\beta_0}\right)^{n}\exp\left[-\beta_1 n(\bar{x} - \gamma_1) + \beta_0 n(\bar{x} - \gamma_0)\right] \ge k$$
$$\Rightarrow n\log(\beta_1/\beta_0) - n\bar{x}(\beta_1 - \beta_0) + n\beta_1\gamma_1 - n\beta_0\gamma_0 \ge \log k$$
(since $\log x$ is an increasing function of $x$)
$$\Rightarrow \bar{x}(\beta_1 - \beta_0) \le \gamma_1\beta_1 - \gamma_0\beta_0 - \frac{1}{n}\log k + \log\left(\frac{\beta_1}{\beta_0}\right)$$
$$\therefore\ \bar{x} \le \frac{1}{\beta_1 - \beta_0}\left\{\gamma_1\beta_1 - \gamma_0\beta_0 - \frac{1}{n}\log k + \log\left(\frac{\beta_1}{\beta_0}\right)\right\}, \quad \text{provided } \beta_1 > \beta_0.$$
Example 3. Examine whether a best critical region exists for testing the null hypothesis
𝐻0 : 𝜃 = 𝜃0 against the alternative hypothesis 𝐻1 : 𝜃 > 𝜃0 for the parameter 𝜃 of the
distribution:
$$f(x, \theta) = \frac{1 + \theta}{(x + \theta)^2}, \quad 1 \le x < \infty$$
Solution.
$$\prod_{i=1}^{n} f(x_i, \theta) = (1 + \theta)^{n}\prod_{i=1}^{n}\frac{1}{(x_i + \theta)^2}$$
Thus the test criterion is $\sum_{i=1}^{n}\log\left(\dfrac{x_i + \theta_0}{x_i + \theta_1}\right)$, which cannot be put in the form of a function of the sample observations not depending on the hypothesis. Hence no B.C.R. exists in this case.
9.3.4 Likelihood Ratio Test:
Neyman-Pearson Lemma based on the magnitude of the ratio of two probability density
functions provides best test for testing simple hypothesis against simple alternative hypothesis.
The best test in any given situation depends on the nature of the population distribution and
246 | P a g e
the form of the alternative hypothesis being considered. In this section we shall discuss a
general method of test construction called the Likelihood Ratio (L.R.) Test introduced by
Neyman and Pearson for testing a hypothesis, simple or composite, against a simple or
composite alternative hypothesis. This test is related to the maximum likelihood estimates.
Before defining the test, we give below some notations and terminology.
Parameter Space. Let us consider a random variable 𝑋 with p.d.f. 𝑓(𝑥, 𝜃). In most common
applications, though not always, the functional form of the population distribution is assumed
to be known except for the value of some unknown parameter(s) 𝜃 which may take any value
on a set Θ. This is expressed by writing the p.d.f. in the form 𝑓(𝑥, 𝜃), 𝜃 ∈ Θ. The set Θ, which
is the set of all possible values of 𝜃 is called the parameter space. Such a situation gives rise
not to one probability distribution but to a family of probability distributions, which we write as
$\{f(x, \theta): \theta \in \Theta\}$. For example, if $X \sim N(\mu, \sigma^2)$, then the parameter space is :
Θ = {(𝜇, 𝜎 2 ): −∞ < 𝜇 < ∞, 0 < 𝜎 < ∞}
In particular, for 𝜎 2 = 1, the family of probability distributions is given by
{𝑁(𝜇, 1); 𝜇 ∈ Θ}, where Θ = {𝜇: −∞ < 𝜇 < ∞}
In the following discussion we shall consider a general family of distributions:
{𝑓(𝑥: 𝜃1 , 𝜃2 , … , 𝜃𝑘 ): 𝜃𝑖 ∈ Θ, 𝑖 = 1,2, … , 𝑘}
The null hypothesis 𝐻0 will state that the parameters belong to some subspace Θ0 of the
parameter space Θ.
Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 be a random sample of size 𝑛 > 1 from a population with p.d.f. 𝑓(𝑥,
𝜃1 , 𝜃2 , … , 𝜃𝑘 ), where Θ, the parameter space is the totality of all points that (𝜃1 , 𝜃2 , …, 𝜃𝑘 ) can
assume. We want to test the null hypothesis :
𝐻0 : (𝜃1 , 𝜃2 , … , 𝜃𝑘 ) ∈ Θ0
against all alternative hypotheses of the type :
𝐻1 : (𝜃1 , 𝜃2 , … , 𝜃𝑘 ) ∈ Θ − Θ0
The likelihood function of the sample observations is given by
𝑛
𝐿 = ∏ 𝑓(𝑥𝑖 ; 𝜃1 , 𝜃2 , … , 𝜃𝑘 )
𝑖=1
According to the principle of maximum likelihood, the likelihood equation for estimating any
parameter 𝜃𝑖 is given by
∂𝐿
= 0, (𝑖 = 1,2, … , 𝑘)
∂𝜃𝑖
Using (18.17), we can obtain the maximum likelihood estimates for the parameters
(𝜃1 , 𝜃2 , … , 𝜃𝑘 ) as they are allowed to vary over the parameter space Θ and the subspace Θ0 .
Substituting these estimates in (18.16), we obtain the maximum values of the likelihood
function for variation of the parameters in Θ and Θ0 respectively. Then the criterion for the
likelihood ratio test is defined as the quotient of these two maxima and is given by
$$\lambda = \lambda(x_1, x_2, \ldots, x_n) = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \frac{\sup_{\theta \in \Theta_0} L(\mathbf{x}, \theta)}{\sup_{\theta \in \Theta} L(\mathbf{x}, \theta)}$$
where $L(\hat{\Theta}_0)$ and $L(\hat{\Theta})$ are the maxima of the likelihood function with respect to the parameters in the regions $\Theta_0$ and $\Theta$ respectively.
The quantity $\lambda$ is a function of the sample observations only and does not involve the parameters. Thus $\lambda$, being a function of the random variables, is also a random variable. Obviously $\lambda \ge 0$; further, $\Theta_0 \subset \Theta \Rightarrow L(\hat{\Theta}_0) \le L(\hat{\Theta}) \Rightarrow \lambda \le 1$. Hence we get $0 \le \lambda \le 1$.
The critical region for testing 𝐻0 (against 𝐻1 ) is an interval
0 < 𝜆 < 𝜆0
where 𝜆0 is some number (< 1) determined by the distribution of 𝜆 and the desired probability
of type 1 error, i.e., 𝜆0 is given by the equation :
𝑃(𝜆 < 𝜆0 ∣ 𝐻0 ) = 𝛼
For example, if $g(\cdot)$ is the p.d.f. of $\lambda$ under $H_0$, then $\lambda_0$ is determined from the equation:
$$\int_0^{\lambda_0} g(\lambda \mid H_0)\, d\lambda = \alpha$$
A test having the critical region defined above is called a likelihood ratio test for testing $H_0$.
Remark. To define the critical region for testing the hypothesis $H_0$ by the likelihood ratio test, we need the distribution of $\lambda$. Suppose that the distribution of $\lambda$ is not known but the distribution of some function of $\lambda$ is known; then this knowledge can be utilised as given in the following theorem.
Theorem . If 𝜆 is the likelihood ratio for testing a simple hypothesis 𝐻0 and if 𝑈 = 𝜙(𝜆) is a
monotonic increasing (decreasing) function of 𝜆 then the test based on 𝑈 is equivalent to the
likelihood ratio test. The critical region for the test based on 𝑈 is :
$$\phi(0) < U < \phi(\lambda_0) \qquad \left[\text{or } \phi(\lambda_0) < U < \phi(0)\right]$$
Proof. The critical region for the likelihood ratio test is given by $0 < \lambda < \lambda_0$, where $\lambda_0$ is determined by
$$\int_0^{\lambda_0} g(\lambda \mid H_0)\, d\lambda = \alpha.$$
If $U = \phi(\lambda)$ is a monotonic increasing function of $\lambda$, this is equivalent to
$$\int_{\phi(0)}^{\phi(\lambda_0)} h(u \mid H_0)\, du = \alpha,$$
where $h(u \mid H_0)$ is the p.d.f. of $U$ when $H_0$ is true. Here the critical region $0 < \lambda < \lambda_0$ transforms to $\phi(0) < U < \phi(\lambda_0)$. However, if $U = \phi(\lambda)$ is a monotonic decreasing function of $\lambda$, then the inequalities are reversed and we get the critical region as $\phi(\lambda_0) < U < \phi(0)$.
2. If we are testing a simple null hypothesis 𝐻0 then there is a unique distribution determined
for 𝜆. But if 𝐻0 is composite, then the distribution of 𝜆 may or may not be unique. In such a
case the distribution of 𝜆 may possibly be different for different parameter points in Θ0 and
then 𝜆0 is to be chosen such that
$$\int_0^{\lambda_0} g(\lambda \mid H_0)\, d\lambda \le \alpha$$
Properties of Likelihood Ratio Test. Likelihood ratio (L.R.) test principle is an intuitive one.
If we are testing a simple hypothesis H0 against a simple alternative hypothesis 𝐻1 then the 𝐿𝑅
principle leads to the same test as given by the Neyman-Pearson lemma. This suggests that 𝐿𝑅
test has some desirable properties, specially large sample properties.
In 𝐿𝑅 test, the probability of type 𝐼 error is controlled by suitably choosing the cut off point
𝜆0 . LR test is generally UMP if an UMP test at all exists. We state below, the two asymptotic
properties of 𝐿𝑅 tests.
1. Under certain conditions, $-2\log_e \lambda$ has an asymptotic chi-square distribution.
2. Under certain assumptions, the LR test is consistent.
Now we shall illustrate how the likelihood ratio criterion can be used to obtain various
standard tests of significance in Statistics.
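As an illustration of asymptotic property 1, the sketch below computes $-2\log\lambda$ for $H_0: \mu = \mu_0$ in a normal population with $\sigma^2$ unknown (here $\lambda = (s^2/s_0^2)^{n/2}$ with $s_0^2 = s^2 + (\bar{x} - \mu_0)^2$, as derived in the next lesson) and compares it with the $\chi^2(1)$ critical point; the sample is hypothetical.

```python
# Sketch: likelihood ratio statistic -2*log(lambda) for H0: mu = mu0 in N(mu, sigma^2),
# sigma^2 unknown, where lambda = (s^2 / s0^2)^(n/2) and s0^2 = s^2 + (xbar - mu0)^2.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
mu0, alpha = 5.0, 0.05
x = rng.normal(loc=5.5, scale=2.0, size=40)      # hypothetical sample

n, xbar = len(x), x.mean()
s2 = np.mean((x - xbar) ** 2)                    # MLE of sigma^2 over the full space
s0_2 = s2 + (xbar - mu0) ** 2                    # MLE of sigma^2 under H0
lam = (s2 / s0_2) ** (n / 2)
stat = -2 * np.log(lam)
crit = chi2.ppf(1 - alpha, df=1)
print(f"-2 log lambda = {stat:.3f}, chi2(1) critical value = {crit:.3f}")
print("Reject H0" if stat > crit else "Accept H0")
```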
Question 3.
Consider the simple linear regression model $y_i = \alpha + \beta x_i + \epsilon_i$, $i = 1, 2, \ldots, n$, where the $\epsilon_i$'s are i.i.d. random variables with mean 0 and variance $\sigma^2 \in (0, \infty)$. Suppose that we have a data set $(x_1, y_1), \ldots, (x_n, y_n)$ with $n = 10$, $\sum_{i=1}^{n} x_i = 50$, $\sum_{i=1}^{n} y_i = 40$, $\sum_{i=1}^{n} x_i^2 = 500$, $\sum_{i=1}^{n} y_i^2 = 400$ and $\sum_{i=1}^{n} x_i y_i = 400$. An unbiased estimate of $\sigma^2$ is :
A. 5
B. 1/5
C. 10
D. 1/10
Question 4.
If 𝑋1 , 𝑋2 , … , 𝑋𝑛 is a random sample from a population with density
1 −𝑥
𝑓(𝑥, 𝜃) = {𝜃 𝑒
𝜃 if 0<𝑥 < ∞
0, otherwise
Where 𝜃 > 0 is an unknown parameter, what is a 100(1 − 𝛼)% confidence interval for 𝜃?
2∑𝑛 ln 𝑋 2∑𝑛
𝑖=1 ln 𝑋𝑖
A. [ 𝜒2𝑖=1(2𝑛)𝑖 , 2 (2𝑛) ]
𝛼 𝜒𝛼
1−
2 2
2∑𝑛
𝑖=1 𝑋𝑖 2∑𝑛
𝑖=1 𝑋𝑖
B. [𝜒2 , 2 (2𝑛) ]
𝛼 (2𝑛) 𝜒𝑎
1−
2 2
2∑𝑛 𝑋𝑖 2∑𝑛 𝑋
C. [ 𝜒2𝑖=1 , 𝑖=1 𝑖 ]
(2𝑛) 𝜒2 (2𝑛)
𝛼 𝛼
1−
2 2
2∑𝑛
𝑖=1 ln 𝑋𝑖 2∑𝑛𝑖=1 ln 𝑋𝑖
D. [ 2 (2𝑛) , ]
𝜒𝛼 𝜒2 𝛼 (2𝑛)
1−
2 2
Question 5.
Suppose that 𝑋 has uniform distribution on the interval [0,100]. Let 𝑌 denote the greatest
integer smaller than or equal to X. Which of the following is true?
1
A. 𝑃(𝑌 ≤ 25) = 4
251 | P a g e
26
B. 𝑃(𝑌 ≤ 25) = 100
C. 𝐸(𝑌) = 50
101
D. 𝐸(𝑌) = 2
Question 6.
Let 𝑥1 = 3, 𝑥2 = 4, 𝑥3 = 3, 𝑥4 = 2.5 be the observed values of a random sample from the
𝑥 𝑥
1 1 1
−
probaability density function 𝑓( 𝑥 ∣ 𝜃 ) = 3 [𝜃 𝑒 − 𝜃 + 𝜃2 𝑒 𝜃2 + 𝑒 −𝑥 ] , 𝑥 >
0, 𝜃 𝑏𝑒𝑙𝑜𝑛𝑔𝑠 𝑡𝑜 {1,2,3,4}. Then the method of moment estimate (MME) of 𝜃 is
A. 1
B. 2
C. 3
D. 4
Question 7.
Let the random variable 𝑋 and 𝑌 have the joint probability mass function
−2
𝑥 3 𝑦 1 𝑥−𝑦 2𝑥
𝑃(𝑋 = 𝑥, 𝑌 = 𝑦) = 𝑒 (𝑦) ( ) ( ) , 𝑦 = 0,1,2, … , 𝑥; 𝑥 = 0,1,2, …
4 4 𝑥!
Then 𝑉(𝑌) is equal to
A. 1
B. 1/2
C. 2
D. 3/2
Question 8.
Let the discrete random variables 𝑋 and 𝑌 have the joint probability mass function
𝑒 −1
; 𝑚 = 0,1,2, … , 𝑛; 𝑛 = 0,1,2, …
𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) = {(𝑛 − 𝑚)! 𝑚! 2𝑛
0, otherwise
Which of the following statements is(are) TRUE?
A. The marginal distribution of 𝑋 is Poisson with mean 1/2
B. The random variable 𝑋 and 𝑌 are independent
1
C. The conditional distribution of X given Y = 5 is Bin (6, 2)
252 | P a g e
253 | P a g e
Question 12.
Let P be a probability function that assigns the same weight to each of the points of the
sample space Ω = {1,2,3,4}. Consider the events E = {1,2}, F = {1,3} and G = {3,4}. Then
which of the following statement(s) is (are) TRUE?
1. E and F are independent
2. E and G are independent
3. E, F and G are independent
Select the correct answer using code given below:
A. 1 only
B. 2 only
C. 1 and 2 only
D. 1,2 and 3
Question 13.
Let 𝑋1 , 𝑋2 , … , 𝑋4 and 𝑌1 , 𝑌2 , … , 𝑌5 be two random samples of size 4 and 5 respectively, from a
51 𝑋 2 +𝑋 2 +𝑋 2 +𝑋 2
2 3 4
standard normal population. Define the statistic T = (4) 𝑌 2+𝑌 2 +𝑌 2 +𝑌 2 +𝑌 2 , then which of the
1 2 3 4 5
following is TRUE?
A. Expectation of 𝑇 is 0.6
B. Variance of T is 8.97
C. T has F-distribution with degree of freedom 5 and 4
D. T has F-distribution with degree of freedom 4 and 5
Question 14.
Let 𝑋, 𝑌 and 𝑍 be independent random variables with respective moment generating function
1 2
𝑀𝑋 (𝑡) = 1−𝑡 , 𝑡 < 1; 𝑀𝑌 (𝑡) = 𝑒 𝑡 /2 = 𝑀𝑍 (𝑡) 𝑡 ∈ ℝ. Let 𝑊 = 2𝑋 + 𝑌 2 + 𝑍 2 then P(W > 2)
is equals to
A. 2𝑒 −1
B. 2𝑒 −2
C. 𝑒 −1
D. 𝑒 −2
254 | P a g e
Question 15.
Let 𝑥1 = 3, 𝑥2 = 4, 𝑥3 = 3, 𝑥4 = 2.5 be the observed values of a random sample from the
probability density function
1 1 𝑥 1 −𝑥
𝑓(𝑥 ∣ 𝜃) = [ 𝑒 − 𝜃 + 2 𝑒 𝜃2 + 𝑒 −𝑥 ] , 𝑥 > 0, 𝜃 ∈ (0, ∞)
3 𝜃 𝜃
Then the method of moment estimate (MME) of 𝜃 is
A. 1.5
B. 2.5
C. 3.5
D. 4.5
Question 16.
Let 𝑋 be a random variable with cumulative distribution function
1 𝑛+2𝑘+1
𝑃(𝑋 = ℎ, 𝑌 = 𝑘) = ( ) ; 𝑛 = −𝑘, −𝑘 + 1, … , ; 𝑘 = 1,2, …
2
Then E(Y) equals
A. 1
B. 2
C. 3
D. 4
Question 17.
Let 𝑋 be a random variable with the cumulative distribution function
0, 𝑥<0
1 + 𝑥2
, 0≤𝑥<1
𝐹(𝑥) = 10
3 + 𝑥2
, 1≤𝑥<2
10
{ 1, 𝑥≥2
Which of the following statements is (are) TRUE?
3
A. 𝑃(1 < 𝑋 < 2) = 10
31
B. 𝑃(1 < 𝑋 ≤ 2) = 5
255 | P a g e
11
C. 𝑃(1 ≤ 𝑋 < 2) = 2
41
D. 𝑃(1 ≤ 𝑋 ≤ 2) = 5
Question 18.
Let the random variables 𝑋1 and 𝑋2 have joint probability density function
𝑥1 𝑒 −𝑥1𝑥2
𝑓(𝑥1 , 𝑥2 ) = { , 1 < 𝑥1 < 3, 𝑥2 > 0
2
0, otherwise.
What is the value Var (𝑋2 ∣ 𝑋1 = 2) …(up to two decimal place)?
A) 0.27
B) 0.28
C) 0.25
D) 1.90
Question 19.
Let 𝑥1 = 1, 𝑥2 = 0, 𝑥3 = 0, 𝑥4 = 1, 𝑥5 = 0, 𝑥6 = 1 be the data on a random sample of size 6
from Bin (1, 𝜃) distribution, where 𝜃 ∈ (0,1). Then the uniformly minimum variance unbiased
estimate of 𝜃(1 + 𝜃) equal to
Question: 20
Let 𝑋1 , 𝑋2 , … , 𝑋𝑁 be identically distributed random variable with mean 2 and variance 1. Let
N be a random variable follows Poisson distribution with mean 2 and independent of 𝑋i′ S. Let
𝑆𝑁 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑁 , then Var (SN ) is equals:
A. 4
B. 10
C. 2
D. 1
Question: 21
Let 𝐴 and 𝐵 be independent Random Variables each having the uniform distribution on
[0,1]. Let 𝑈 = min{𝐴, 𝐵} and 𝑉 = max{𝐴, 𝐵}, then Cov (𝑈, 𝑉) is equals
A. -1/36
B. 1/36
C. 1
256 | P a g e
D. 0
Question 22.
Let 𝑋1 , 𝑋2 , 𝑋3 be random sample from uniform (0, 𝜃 2 ), 𝜃 > 1, then maximum likelihood
estimation (mle) of 𝜃
2
A. 𝑋(1)
B. √X(3)
C. √X(1)
D. 𝛼𝑋(1) + (1 − 𝛼)𝑋(3) ; 0 < 𝛼 < 1
Question 23.
For the discrete variate with density:
1 6 1
𝑓(𝑥) = 𝐼(−1) (𝑥) + 𝐼(0) (𝑥) + 𝐼(1) (𝑥).
8 8 8
Which of the following is TRUE?
1
A. 𝐸(𝑋) = 2
1
B. 𝑉(𝑋) = 2
1
C. 𝑃{|𝑋 − 𝜇𝑥 | ≥ 2𝜎𝑥 } ≤ 4
1
D. 𝑃{|𝑋 − 𝜇𝑥 | ≥ 2𝜎𝑥 } ≥ 4
Question: 24
Lęt 𝑋𝑖 , 𝑌𝑖 ; (𝑖 = 1,2)
be a i.i.d random sample of size 2 from a standard normal distribution. What is the
distribution W is given by
√2(𝑋1 + 𝑋2 )
𝑊=
√(𝑋2 − 𝑋1 )2 + (𝑌2 − 𝑌1 )2
A. t-distribution with 1 d.f
B. t-distribution with 2 d.f
C. Chi-square distribution with 2 d.f
D. Does not determined
257 | P a g e
Question: 25
The moment generating function of a random variable X is given by
1 1 1 1
𝑀𝑋 (𝑡) = 6 + 3 𝑒 𝑡 + 3 𝑒 2𝑡 + 6 𝑒 3𝑡 , −∞ < 𝑡 < ∞, then P(X ≤ 2) equals
1
A. 3
1
B. 6
1
C. 2
5
D. 6
Question: 26
1 1
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from 𝐔 (𝜃 − 2 , 𝜃 + 2) distribution, where 𝜃 ∈ ℝ. If
𝑋(1) = min{𝑋1 , 𝑋2 , … , 𝑋𝑛 } and 𝑋(𝑛) = max{𝑋1 , 𝑋2 , … , 𝑋𝑛 }.
1 1 1
Define 𝑇1 = 2 (𝑋(1) + 𝑋(𝑛) ), 𝑇2 = 4 (3𝑋(1) + 𝑋(𝑛) + 1) and 𝑇3 = 2 (3𝑋(𝑛) − 𝑋(1) − 2) an
estimator for 𝜃, then which of the following is/are TRUE?
A. 𝑇1 and 𝑇2 are MLE for 𝜃 but 𝑇3 is not MLE for 𝜃
B. 𝑇1 is MLE for 𝜃 but 𝑇2 and 𝑇3 are not MLE for 𝜃
C. 𝑇1 , 𝑇2 and 𝑇3 are MLE for 𝜃
D. 𝑇1 , 𝑇2 and 𝑇3 are not MLE for 𝜃
Question: 27
Let 𝑋 and 𝑌 be random variable having joint probability density function
𝑘
𝑓(𝑥, 𝑦) = ; −∞ < (𝑥, 𝑦) < ∞
(1 + 𝑥 2 )(1 + 𝑦2)
Where k is constant, then which of the following is/are TRUE?
1
A. k = 𝜋2
1 1
B. 𝑓(𝑥) = 𝜋 1+𝑥 2 ; −∞ < 𝑥 < ∞
C. P(X = Y) = 0
D. All of the above
258 | P a g e
Question: 28
Lę 𝑋1 , 𝑋2 , … , 𝑋𝑛 be sequence of independently and identically distributed random variables
with the probability density function
1 2 −𝑥
𝑓(𝑥) = {2 𝑥 𝑒 , if 𝑥 > 0 and let
0, otherwise
𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 , then which of the following statement is/are TRUE?
𝑆𝑛 −3𝑛
A. ∼ 𝑁(0,1) for all 𝑛 ≥ 1
√3𝑛
𝑆
B. For all 𝜀 > 0, 𝑃 (| 𝑛𝑛 − 3| > 𝜀) → 0 as n → ∞
𝑆𝑛
C. → 1 with probability 1
𝑛
D. Both A and B
Question: 29
Let 𝑋, 𝑌 are i.i.d Binomial (𝑛, 𝑝) random variables. Which of the following are true?
A. 𝑋 + 𝑌 ∼ Bin (2𝑛, 𝑝)
259 | P a g e
Question: 31
Let X and 𝑌 are random variable with 𝐸[𝑋] = 𝐸[𝑌], then which of the following is NOT
TRUE?
A. E{E[X ∣ Y]} = E[Y]
B. V(𝑋 − 𝑌) = 𝐸(𝑋 − 𝑌)2
C. 𝐸[𝑉(𝑋 ∣ 𝑌)] + 𝑉[𝐸(𝑋 ∣ 𝑌)] = 𝑉(𝑋)
D. X and Y have same distribution
Question: 32
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from Exp (𝜃 ) distribution, where 𝜃 ∈ (0, ∞).
1
If 𝑋‾ = 𝑛 ∑𝑛𝑖=1 𝑋𝑖 , then a 95% confidence interval for 𝜃 is
2
𝜒2𝑛,0.95
A. (0, ]
𝑛𝑋‾
2
𝜒2𝑛,0.95
B. [ , ∞)
𝑛𝑋‾
2
𝜒2𝑛,0.95
C. (0, ]
2𝑛𝑋‾
2
𝜒2𝑛,0.95
D. [ , ∞)
2𝑛𝑋‾
Question: 33
𝑋𝑖 , 𝑖 = 1,2, … be independent random variables all distributed according to the PDF 𝑓𝑥 (𝑥) =
1,0 ≤ 𝑥 ≤ 1. Define 𝑌𝑛 = 𝑋1 𝑋2 𝑋3 … 𝑋𝑛 , for some integer n. Then Var (𝑌𝑛 ) is equal to
𝑛
A. 12
1 1
B. − 22𝑛
3𝑛
1
C. 12𝑛
1
D. 12
Question: 34
Let 𝑋1 , 𝑋2 , … , 𝑋4 be i.i.d random variables having continuous distribution.
Then 𝑃(𝑋3 < 𝑋2 < max(𝑋1 , 𝑋4 )) equal
A. 1/2
B. 1/3
C. 1/4
D. 1/6
260 | P a g e
Question: 35
1 1
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from U (𝜃 − 0 , 𝜃 + 2) distribution, where 𝜃 ∈ ℝ. If
𝑋(1) = min{𝑋1 , 𝑋2 , … , 𝑋𝑛 } and 𝑋(𝑛) = max{𝑋1 , 𝑋2 , … , 𝑋𝑛 }
Consider the following statement on above:
1
1. 𝑇1 = 2 (𝑋(1) + 𝑋(𝑛) ) is consistent for 𝜃
1
2. 𝑇2 = (3𝑋(1) + 𝑋(𝑛) + 1)
4
is unbiased consistent for 𝜃
Select the correct answer using code given below:
A. 1 only
B. 2 only
C. Both 1 and 2
D. Neither 1 nor 2
9.5 SUMMARY
The main points which we have covered in this lesson are Type I and Type II errors, the power of a test, unbiased and uniformly most powerful critical regions, and the likelihood ratio test.
9.6 GLOSSARY
• Motivation: These problems are very useful in real life, and they can be applied in data science and economics as well as in social science.
• Attention: Think about how these tests are useful in real-world problems.
9.7 ANSWER TO IN-TEXT QUESTIONS
Answer 1: A
Explanation :
1
𝑓(𝑥, 𝜃) = {𝜃 ; 0 < 𝑥 < 𝜃
0, otherwisse
𝑛𝜃
E(𝑋(𝑛) ) =
𝑛+1
1 𝜃
Let Y = 𝜃 (𝑋(𝑛) + 𝑛+1)
𝑋(𝑛) 1
so E(Y) = E( +𝑛+1) = 1
𝜃
261 | P a g e
1 1
lim𝑛→∞ 𝐸 [ (𝑋(𝑛) + )] = 1;
𝜃 𝑛+1
Hence option A is correct.
Answer 2: B
Explanation:
Convexity (peakedness) is decided by kurtosis.
Answer 3: C
Explanation :
𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜖𝑖 , 𝑖 = 1,2, … , 𝑛
∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 − 𝑛𝑥‾𝑦‾ 400 − 10 × 5 × 4 4 4
𝛽ˆ = 𝑛 2 2
= 2
= ; 𝛼ˆ = 𝑦‾ − 𝑥‾𝛽ˆ = 4 − 5 × = 0
∑𝑖=1 𝑥𝑖 − 𝑛𝑥‾ 500 − 10 × 5 5 5
An unbiased estimate of 𝜎 2 is
1 2 1 4 2
𝜎ˆ 2 = 𝑛−2 ∑𝑛𝑖=1 (𝑦𝑖 − 𝛼ˆ − 𝛽ˆ 𝑥𝑖 ) = 10−2 ∑𝑛𝑖=1 (𝑦𝑖 − 5 𝑥𝑖 )
1 4 4 2 1 8 16
= 8 (∑𝑛𝑖=1 𝑦𝑖2 − 2 × 5 ∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 + (5) ∑𝑛𝑖=1 𝑥𝑖2 ) = 8 (400 − 5 × 400 + 25 × 500)
= 10
Hence option C is correct.
Answer 4: B
Explanation :
2
We use the random variable 𝑄 = 𝜃 ∑𝑛𝑖=1 𝑋𝑖 ∼ 𝜒(2𝑛)
2
2∑𝑛
𝑖=1 𝑋𝑖 2∑𝑛
𝑖=1 𝑋𝑖
1 − 𝛼 = 𝑃 (𝜒α2 (2𝑛) ≤ 𝑄 ≤ 𝜒1−
2
α (2𝑛)) = 𝑃 [ 2 ≤𝜃≤ 2 (2𝑛) ]
2 2 𝜒 α (2𝑛) 𝜒α
1−
2 2
2∑𝑛
𝑖=1 𝑋𝑖 2∑𝑛
𝑖=1 𝑋𝑖
Thus, 100(1 − 𝛼)% confidence interval for 𝜃 is given by [𝜒2 ≤𝜃≤ 2 (2𝑛) ]
𝛼 (2𝑛) 𝜒𝛼
1−
2 2
Hence option B is correct.
Answer 5: B
Explanation :
262 | P a g e
Let 𝑌 = [X]; where 𝑌 denote the greatest integer smaller than or equal to 𝑋.
26 1 26
𝑃(𝑌 ≤ 25) = 𝑃([𝑋] ≤ 25) = 𝑃(𝑋 ∈ (0,26)) = ∫0 𝑑𝑥 =
100 100
Hence B is the correct option.
Answer 6: C
Explanation :
1
𝑥‾ = (3 + 4 + 3.5 + 2.5) = 3.25
4
1 1
𝐸(𝑋) = [𝜃 + 𝜃 2 + 1]Γ2 = [𝜃 + 𝜃 2 + 1] = 3.25
3 3
2
𝜃 + 𝜃 − 8.75 = 0 then 𝜃 = 2.5 or −3.5
Since 𝜃 ∈ {1,2,3,4} then 𝜃 = 3
Hence option C is correct.
Answer 7: D
Explanation :
The marginal pmf of 𝑌 is given by
∞ ∞
𝑥 3 𝑦 1 𝑥−𝑦 2𝑥
𝑃(𝑌 = 𝑦) = ∑ 𝑃(𝑋 = 𝑥), 𝑌 = 𝑦) = ∑ 𝑒 −2 (𝑦) ( ) ( )
4 4 𝑥!
𝑥=𝑦 𝑥=𝑦
𝑦 ∞ 𝑢
3 𝑦+𝑢 1 2𝑦+𝑢
= 𝑒 −2 ( ) ∑ ( 𝑦 ) ( ) (Assume 𝑢 = 𝑥 − 𝑦)
4 4 (𝑦 + 𝑢)!
𝑢=0
𝑦 ∞ ∞
−2
3 (𝑦 + 𝑢)! 1 𝑢 2𝑦+𝑢 3 𝑦 2𝑦 1! 1 𝑢
=𝑒 ( ) ∑ ( ) == 𝑒 −2 ( ) ∑ ( )
4 𝑦! 𝑢! 4 (𝑦 + 𝑢)! 4 𝑦! 𝑢! 2
𝑢 0
3 3 𝑦
− ( )
−2
3 𝑦 2𝑦 1/2 𝑒 22
=𝑒 ( ) 𝑒 = , 𝑦 = 0,1, …
4 𝑦! 𝑦!
Which is the pmf of Poisson random variable with parameter 3/2, so 𝐸(𝑋) = 3/2 and
𝑉(𝑋) = 3/2.
Answer 8: A
Explanation :
The marginal probability mass function of X is given by
263 | P a g e
𝑃(𝑋 = 𝑚) = ∑∞
𝑛=𝑚 𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) ( for 𝑚 = 0,1,2, … )
1 1 𝑚
𝑒 − 2(2)
= , 𝑚 = 0,1,2, …
𝑚!
Thus the marginal distribution of X is Poisson with mean 1/2.
The marginal probability mass function of 𝑌 is given by
𝑃(𝑌 = 𝑛) = ∑∞ 𝑚=0 𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) ( for 𝑛 = 0,1,2, … )
𝑒 −1
= , 𝑛 = 0,1,2, …
𝑛!
Thus the marginal distribution of 𝑌 is Poisson with mean 1 .
𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) ≠ 𝑃(𝑋 = 𝑚)𝑃(𝑌 = 𝑛)
Therefore 𝑋 and 𝑌 are not independent.
𝑃(𝑋 = 𝑚, 𝑌 = 5) 5! 1 5
𝑃(𝑋 = 𝑚 ∣ 𝑌 = 5) = = ( ) , 𝑚 = 0,1,2, … ,5
𝑃(𝑌 = 5) 𝑚! (5 − 𝑚)! 2
1
Thus the conditional distribution of 𝑋 given 𝑌 = 5 is B in (5, 2)
𝑃(𝑌=𝑛)
Since 𝑃(𝑌=𝑛+1) = (𝑛 + 1) for 𝑛 = 0,1,2, …
Answer 9: 𝑨
Explanation :
The trinomial distribution of two r.v.'s 𝑋 and 𝑌 is given by
𝑛!
𝑓𝑋,𝑌 (𝑥, 𝑦) = 𝑝 𝑥 𝑞 𝑦 (1 − 𝑝 − 𝑞)(𝑛−𝑥−𝑦)
𝑥! 𝑦! (𝑛 − 𝑥 − 𝑦)!
for 𝑥, 𝑦 = 0,1,2, … , 𝑛 and 𝑥 + 𝑦 ≤ 𝑛, where p + q ≤ 1.
n = 2, p = 1/6 and q = 2/6
1 1 10
Var (X) = 𝑛𝑝1 (1 − 𝑝1 ) = 2 × (1 − ) = ; Var (Y) = 𝑛𝑝2 (1 − 𝑝2 )
6 6 36
2 2
= 2 × 6̅ (1 − 6) = 16/36
1 2 4
Cov (𝑋, 𝑌) = −𝑛𝑝1 𝑝2 = −2 × × =−
6 6 36
Cov (𝑋, 𝑌) 4
Corr (𝑋, 𝑌) = =− = −0.31
√Var (𝑋)√Var (𝑌) 4√10
Hence −0.31 is the correct answer.
264 | P a g e
Answer 10: 𝐀
Explanation :
Let 𝑥1 = 1.1, 𝑥2 = 0.5, 𝑥3 = 1.4, 𝑥4 = 1.2
𝑒 𝜃−𝑥 , if 𝑥 ≥ 𝜃, 𝜃 ∈ (−∞, ∞)
𝑓(𝑥 ∣ 𝜃) = {
0, otherwise
𝜃 ∈ (∞, 𝑋(1) ]
𝑑
Since 𝑑𝜃 𝑓(𝑥 ∣ 𝜃) > 0 ∀𝜃 ∈ (∞, 𝑋(1) ], then
265 | P a g e
1 1 𝑘−1
= ( ) {𝑘 = 1,2, …
2 2
which is the pmf of geometric distribution with parameter 1/2}
1 1 𝑘−1
𝐸(𝑌) = ∑∞
𝑘=0 𝑘 ( ) =2
2 2
266 | P a g e
𝑇 = ∑𝑛𝑖=1 𝑋𝑖
𝑇 𝑇(𝑇−1) 3 3(3−1) 21
Therefore, 𝑛 + 𝑛(𝑛−1) = 6 + 6(6−1) = 30 = 0.70
Answer 20: B
Explanation:
Let $X_1, X_2, \ldots$ be identically distributed random variables and let $N$ be a random variable independent of the $X_i$'s. Define $S_N = X_1 + X_2 + \cdots + X_N$. Then
$$E(S_N) = E(X_i)\,E(N) = 4 \qquad \text{and} \qquad \operatorname{Var}(S_N) = E(N)\operatorname{Var}(X_1) + \operatorname{Var}(N)\,[E(X_1)]^2 = 2 \times 1 + 2 \times 4 = 10.$$
Answer 22: B
Explanation:
$$X_{(3)} \le \theta^2 \;\Rightarrow\; \hat{\theta} \in [\sqrt{X_{(3)}},\ \infty), \qquad L(X, \theta) = \prod_{i=1}^{3} f(x_i, \theta) = \frac{1}{\theta^6} \;\Rightarrow\; \frac{\partial L}{\partial \theta} < 0,$$
therefore the likelihood is decreasing in $\theta$, and hence $\hat{\theta} = \sqrt{X_{(3)}}$.
Answer 23: C
Explanation:
$X$ takes the values $-1, 0, 1$ with probabilities $\frac{1}{8}, \frac{6}{8}, \frac{1}{8}$ respectively, so $E(X) = 0$ and
$$V(X) = E(X^2) - \{E(X)\}^2 = \frac{1}{4} \;\Rightarrow\; \sigma_X = \frac{1}{2}$$
$$P\{|X - \mu_X| \ge 2\sigma_X\} = P\{|X| \ge 1\} = 1 - P(|X| < 1) = 1 - P(X = 0) = \frac{1}{4}$$
$$P\{|X - \mu_X| \ge 2\sigma_X\} \le \frac{1}{4} \quad [\text{consistent with Chebyshev's inequality}]$$
Answer 24: B
Explanation:
Let 𝑋𝑖 , 𝑌𝑖 ; (𝑖 = 1,2) be a i.i.d random sample of size 2 from a standard normal distribution.
√2(X1 +X2 )
Then W = ∼ 𝑡(2)
√(X2 −X1 )2 +(Y2 −Y1 )2
Hence option (b) is correct.
Answer 25: D
Explanation:
Let 𝑋 be Random Variable with 𝑀𝑋 (𝑡) = 𝐸(𝑒 𝑡𝑋 ) = ∑etx P(X = x)
1
; 𝑥=0
6
1
; 𝑥=1
3
Then 𝑃(𝑋 = 𝑥) = 1
; 𝑥=2
3
1
{6 ; 𝑥 = 3
𝑃(𝑋 ≤ 2) = 𝑃(𝑋 = 0) + 𝑃(𝑋 = 1) + 𝑃(𝑋 = 2)
1 1 1 5
= + + =
6 3 3 6
Answer 26: A
Explanation:
1 1
𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from U (𝜃 − 2 , 𝜃 + 2)
1 1
𝑓(𝑥) = 1; 𝜃 − < 𝑥𝑖 < 𝜃 +
2 2
1 1
𝜃ˆ ∈ [𝑋(𝑛) − , 𝑋(1) + ]
2 2
distribution of 𝑋 free from parameter,
1 1
then 𝜃ˆ = 𝜆 (𝑋(𝑛) − 2) + (1 − 𝜆) (𝑋(1) + 2) ; 0 < 𝜆 < 1
1 1 3
Take 𝜆 = 2 , 4 and 4 then we obtained mle of 𝜃 are
269 | P a g e
1 1 1
(𝑋(1) + 𝑋(𝑛) ); 4 (3𝑋(1) + 𝑋(𝑛) + 1); 4 (3𝑋(1) + 𝑋(𝑛) + 1) respectively.
2
Answer 28: D
Explanation:
Clearly, 𝑋1 , 𝑋2 , … , 𝑋𝑛
are i.i.d 𝐺(3,1) random variables. Then, 𝐸(𝑋𝑖 ) = 3 and Var (𝑋𝑖 ) = 3, 𝑖 = 1,2, …
Let 𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 , then E(𝑆𝑛 ) = 3𝑛 and Var (𝑆𝑛 ) = 3𝑛
270 | P a g e
𝑠𝑛 −𝐸(𝑠𝑛 ) 3(𝑛−√𝑛)−𝐸(𝑆𝑛 )
lim𝑛→∞ 𝑃 ( ≥ ) = 𝑃(𝑍 ≥ −√3) = 1 −
√Var (𝑆2 ) √Var (𝑆𝑤 )
𝑃(𝑍 ≤ −√3)
1
= 1 − Φ(−√3) ≥
2
Answer 29: D
Explanation:
(A) Sum of independent binomial variate is also a binomial variate if corresponding
probability will be same then 𝑋 + 𝑌 ∼ Bin(2𝑛, 𝑝)
(B) When there are more than two variables include, the observation lead to multinomial
distribution.
(𝑋, 𝑌) not follows Multinomial (2𝑛; 𝑝, 𝑝)
(C) Var (X − Y) = E(X − Y)2 − {E(X − Y)}2 = E(X − 𝑌)2
(D) Cov (𝑋 + 𝑌, 𝑋 − 𝑌) = 𝑉(𝑋) − Cov (𝑋, 𝑌) + Cov (𝑌, 𝑋) − 𝑉(𝑌) = 0
{∴ X and Y are independent Cov (X1 Y) = Cov (Y, X) = 0}
Hence option D is correct.
Answer 30: D
Explanation:
The joint pdf of 𝑋 and 𝑌 is
1 2 +𝑦 2 ) 1 2 1 2
1 1 1
𝑓(𝑥, 𝑦) = 2𝜋 𝑒 −2(𝑥 = 𝑒 −2(𝑥 ) × 𝑒 −2(𝑦 ) ; (𝑥, 𝑦) ∈ ℝ2
√2𝜋 √2𝜋
It is easy to see that 𝑋 and 𝑌 are i.i.d 𝑁(0,1) random variables, and therefore,
1
𝑃(𝑋 > 0) = 2
1 1 1
𝑃(𝑋 > 0)𝑃(𝑌 < 0) = × =
2 2 4
Answer 35: A
Explanation:
1 1
𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from U (𝜃 − 2 , 𝜃 + 2)
1 1
𝑓(𝑥) = 1; 𝜃 − < 𝑥𝑖 < 𝜃 +
2 2
1 1
ˆ
𝜃 ∈ [𝑋(𝑛) − 2 , 𝑋(1) + 2] ; distribution of 𝑋
1 1
free from parameter, then 𝜃ˆ = 𝜆 (𝑋(𝑛) − ) + (1 − 𝜆) (𝑋(1) + ) ; 0 < 𝜆 < 1
2 2
1 1
Take 𝜆 = 2 , 4 we get
1
𝑇1 = 2 (𝑋(1) + 𝑋(𝑛) ) is MLE as well as consistent for 𝜃
1
𝑇2 = (3𝑋(1) + 𝑋(𝑛) + 1)
4
is MLE as well as consistent for 𝜃 but not unbiased…
Hence option (A) is correct.
9.8 REFERENCES
• Devore, J. (2012). Probability and statistics for engineers, 8th ed. Cengage Learning.
• John A. Rice (2007). Mathematical Statistics and Data Analysis, 3rd ed. Thomson
Brooks/Cole
• Larsen, R., Marx, M. (2011). An introduction to mathematical statistics and its
applications. Prentice Hall.
• Miller, I., Miller, M. (2017). J. Freund’s mathematical statistics with applications, 8th
ed. Pearson.
• Kantarelis, D. and Asadoorian, M. O. (2009). Essentials of Inferential Statistics, 5th ed. University Press of America.
• Hogg, R., Tanis, E., Zimmerman, D. (2021). Probability and Statistical Inference, 10th ed. Pearson.
9.9 SUGGESTED READINGS
• S. C Gupta, V.K Kapoor, Fundamentals of Mathematical Statistics,Sultan Chand
Publication, 11th Edition.
• B.L Agarwal, Programmed Statistics ,New Age International Publishers, 2nd Edition.
LESSON 10
TESTING OF EQUALITY OF MEAN AND VARIANCE
STRUCTURE
10.1 Learning Objectives
10.2 Introduction
10.3 Testing of Equality of Mean and Variance
10.3.1 Test for the Mean of a Normal Population
10.3.2 Test for the Mean of Several Normal Population
10.3.3 Test for the Variance of Normal Population
10.3.4 Test for the Variance of Several Normal Population
10.4 In-Text Questions
10.5 Summary
10.6 Glossary
10.7 Answer To In-Text Questions
10.8 References
10.9 Suggested Readings
10.1 LEARNING OBJECTIVES
In this chapter our main aim to understand how to test equality of mean and variance of two
normal as well as several normal population.
10.2 INTRODUCTION
The main problems in statistical inference can be broadly classified into two areas:
(i) The area of estimation of population parameter(s) and setting up of confidence intervals
for them, i.e., the area of point and interval estimation, and
(ii) Tests of statistical hypothesis.
In Neyman-Pearson theory, we use statistical methods to arrive at decisions in certain
situations where there is lack of certainty on the basis of a sample whose size is fixed in
advance while in Wald's sequential theory the sample size is not fixed but is regarded as a
random variable.
274 | P a g e
$$\hat{\sigma}^2 = \frac{1}{n}\sum(x_i - \mu_0)^2 = s_0^2 \ \text{(say)} = \frac{1}{n}\sum(x_i - \bar{x} + \bar{x} - \mu_0)^2 = \frac{1}{n}\sum(x_i - \bar{x})^2 + (\bar{x} - \mu_0)^2,$$
the product term vanishing, since $\sum(x_i - \bar{x})(\bar{x} - \mu_0) = (\bar{x} - \mu_0)\sum(x_i - \bar{x}) = 0$. Thus
$$\hat{\sigma}^2 = s^2 + (\bar{x} - \mu_0)^2 = s_0^2 \ \text{(say)}.$$
Hence, substituting,
$$L(\hat{\Theta}_0) = \left(\frac{1}{2\pi s_0^2}\right)^{n/2}\exp(-n/2)$$
The ratio gives the likelihood ratio criterion
$$\lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \left(\frac{s^2}{s_0^2}\right)^{n/2} = \left\{\frac{s^2}{s^2 + (\bar{x} - \mu_0)^2}\right\}^{n/2} = \left\{\frac{1}{1 + [(\bar{x} - \mu_0)^2/s^2]}\right\}^{n/2}$$
In terms of $t = \dfrac{\sqrt{n}(\bar{x} - \mu_0)}{S}$, where $S^2 = \dfrac{1}{n-1}\sum_i (x_i - \bar{x})^2$, we have $(\bar{x} - \mu_0)^2/s^2 = t^2/(n-1)$, so that $\lambda = \left(1 + \dfrac{t^2}{n-1}\right)^{-n/2}$ and the critical region $0 < \lambda \le \lambda_0$ becomes
$$\left(1 + \frac{t^2}{n-1}\right)^{-n/2} \le \lambda_0 \;\Rightarrow\; \left(1 + \frac{t^2}{n-1}\right)^{n/2} \ge \lambda_0^{-1}$$
$$\Rightarrow \frac{t^2}{n-1} \ge \lambda_0^{-2/n} - 1 \;\Rightarrow\; t^2 \ge (n-1)\left[\lambda_0^{-2/n} - 1\right] = A^2 \ \text{(say)},$$
where $t$ follows Student's $t$-distribution with $(n-1)$ d.f. under $H_0$, and $A$ is determined from this distribution for the desired level $\alpha$. The critical region is shown in the following diagram.
Important Remarks 1. Let us now consider the problem of testing the hypothesis :
𝐻0 : 𝜇 = 𝜇0 , 0 < 𝜎 2 < ∞.
against the alternative hypothesis
𝐻1 : 𝜇 > 𝜇0 , 0 < 𝜎 2 < ∞
Here
$$\Theta = \{(\mu, \sigma^2): \mu \ge \mu_0,\ 0 < \sigma^2 < \infty\} \qquad\text{and}\qquad \Theta_0 = \{(\mu, \sigma^2): \mu = \mu_0,\ 0 < \sigma^2 < \infty\}$$
The maximum likelihood estimates of $\mu$ and $\sigma^2$ belonging to $\Theta$ are given by
$$\hat{\mu} = \begin{cases} \bar{x}, & \text{if } \bar{x} \ge \mu_0 \\ \mu_0, & \text{if } \bar{x} < \mu_0 \end{cases}
\qquad\qquad
\hat{\sigma}^2 = \begin{cases} s^2, & \text{if } \bar{x} \ge \mu_0 \\ s_0^2, & \text{if } \bar{x} < \mu_0 \end{cases}
\qquad\qquad
s_0^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_0)^2$$
Thus
$$L(\hat{\Theta}) = \begin{cases} \left(\dfrac{1}{2\pi s^2}\right)^{n/2}\exp(-n/2), & \text{if } \bar{x} \ge \mu_0 \\ \left(\dfrac{1}{2\pi s_0^2}\right)^{n/2}\exp(-n/2), & \text{if } \bar{x} < \mu_0 \end{cases}$$
In $\Theta_0$, the only unknown parameter is $\sigma^2$, whose MLE is given by $\hat{\sigma}^2 = s_0^2$. Thus
$$L(\hat{\Theta}_0) = \left(\frac{1}{2\pi s_0^2}\right)^{n/2}\exp(-n/2)$$
$$\therefore\ \lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \begin{cases} \left(s^2/s_0^2\right)^{n/2}, & \text{if } \bar{x} \ge \mu_0 \\ 1, & \text{if } \bar{x} < \mu_0 \end{cases}$$
Thus the sample observations $(x_1, x_2, \ldots, x_n)$ for which $\bar{x} < \mu_0$ are to be included in the acceptance region. Hence for the sample observations for which $\bar{x} \ge \mu_0$, the likelihood ratio criterion becomes
$$\lambda = \left(s^2/s_0^2\right)^{n/2}, \quad \bar{x} \ge \mu_0,$$
which is the same as the expression obtained in (18.29). Proceeding as in the above problem, the critical region of the form $0 < \lambda < \lambda_0$ is equivalently given by
$$t^2 = \frac{n(\bar{x} - \mu_0)^2}{S^2} \ge A^2 \qquad\text{or}\qquad t = \frac{\sqrt{n}(\bar{x} - \mu_0)}{S} \ge A,$$
where $t$ follows Student's $t$-distribution with $(n-1)$ d.f. The constant $A$ is determined so that $P(t > A) = \alpha \Rightarrow A = t_{n-1}(\alpha)$.
Hence for testing $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$, we have the right-tailed $t$-test defined as follows:
1. Reject $H_0$ if $t = \dfrac{\sqrt{n}(\bar{x} - \mu_0)}{S} > t_{n-1}(\alpha)$; if $t < t_{n-1}(\alpha)$, $H_0$ may be accepted.
3. We summarise below in a tabular form the test criterion, along with the confidence
interval for the parameter for testing the hypothesis 𝐻0 : 𝜇 = 𝜇0 against various
alternatives for the normal population when 𝜎 2 is not known.
[Here 𝑡𝑛 (𝛼) is upper 𝛼-point of the 𝑡-distrbution with 𝑛 d.f. as defined in (18.33a)].
NORMAL POPULATION 𝑵(𝝁, 𝝈𝟐 ); 𝝈𝟐 UNKNOWN
The maximum likelihood estimates for $\mu_1, \mu_2, \sigma_1^2$ and $\sigma_2^2$ are given by the equations:
$$\frac{\partial}{\partial \mu_1}\log L = 0 \;\Rightarrow\; \hat{\mu}_1 = \frac{1}{m}\sum_{i=1}^{m} x_{1i} = \bar{x}_1
\qquad\qquad
\frac{\partial}{\partial \mu_2}\log L = 0 \;\Rightarrow\; \hat{\mu}_2 = \frac{1}{n}\sum_{j=1}^{n} x_{2j} = \bar{x}_2$$
$$\frac{\partial}{\partial \sigma_1^2}\log L = 0 \;\Rightarrow\; \hat{\sigma}_1^2 = \frac{1}{m}\sum_{i=1}^{m}(x_{1i} - \bar{x}_1)^2 = s_1^2 \ \text{(say)}
\qquad\quad
\frac{\partial}{\partial \sigma_2^2}\log L = 0 \;\Rightarrow\; \hat{\sigma}_2^2 = \frac{1}{n}\sum_{j=1}^{n}(x_{2j} - \bar{x}_2)^2 = s_2^2 \ \text{(say)}$$
$$L(\Theta_0) = \left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left\{-\frac{1}{2\sigma_1^2}\sum_{i=1}^{m}(x_{1i} - \mu)^2\right\} \times \left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left\{-\frac{1}{2\sigma_2^2}\sum_{j=1}^{n}(x_{2j} - \mu)^2\right\}$$
To obtain the maximum value of $L(\Theta_0)$ for variations in $\mu, \sigma_1^2$ and $\sigma_2^2$, it will be seen that the estimate of $\mu$ is obtained as the root of a cubic equation
$$\frac{m^2(\bar{x}_1 - \mu)}{\sum_{i=1}^{m}(x_{1i} - \hat{\mu})^2} + \frac{n^2(\bar{x}_2 - \mu)}{\sum_{j=1}^{n}(x_{2j} - \hat{\mu})^2} = 0$$
and is thus a complicated function of the sample observations. Consequently the likelihood
ratio criterion 𝜆 will be a complex function of the observations and its distribution is quite
tedious since it involves the ratio of two variances. Consequently, it is impossible to obtain the
critical region 0 < 𝜆 < 𝜆0 for given 𝛼 since the distribution of the population variances is
ordinarily unknown. However, in any given instance the cubic equation (16.43) can be solved
for 𝜇 by numerical analysis technique and thus 𝜆 can be computed. Finally, as an approximate
test, −2log 𝑒 𝜆 can be regarded as a 𝜒 2 -variate with 1 d.f. (c.f. Theorem 18.2).
Case 2. Population Variances are equal, i.e., 𝜎12 = 𝜎22 = 𝜎 2 , (say). In this case
Θ = {(𝜇1 , 𝜇2 , 𝜎 2 ): −∞ < 𝜇𝑖 < ∞, 𝜎 2 > 0, (𝑖 = 1,2)}
Θ0 = {(𝜇, 𝜎 2 ): −∞ < 𝜇 < ∞, 𝜎 2 > 0}
The likelihood function is then given by
$$L = \left(\frac{1}{2\pi\sigma^2}\right)^{(m+n)/2}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{m}(x_{1i} - \mu_1)^2 + \sum_{j=1}^{n}(x_{2j} - \mu_2)^2\right\}\right]$$
For $\mu_1, \mu_2, \sigma^2 \in \Theta$, the maximum likelihood estimates are
$$\frac{\partial}{\partial \mu_1}\log L = 0 \Rightarrow \hat{\mu}_1 = \bar{x}_1, \qquad \frac{\partial}{\partial \mu_2}\log L = 0 \Rightarrow \hat{\mu}_2 = \bar{x}_2$$
$$\frac{\partial}{\partial \sigma^2}\log L = 0 \Rightarrow \hat{\sigma}^2 = \frac{1}{m+n}\left\{\Sigma(x_{1i} - \hat{\mu}_1)^2 + \Sigma(x_{2j} - \hat{\mu}_2)^2\right\}$$
$$\Rightarrow \hat{\sigma}^2 = \frac{1}{m+n}\left\{\Sigma(x_{1i} - \bar{x}_1)^2 + \Sigma(x_{2j} - \bar{x}_2)^2\right\} = \frac{1}{m+n}\left(m s_1^2 + n s_2^2\right) \qquad \ldots (18.45a)$$
Substituting the values from (18.45) and (18.45a) in (18.44), we get
$$L(\hat{\Theta}) = \left\{\frac{m+n}{2\pi(m s_1^2 + n s_2^2)}\right\}^{(m+n)/2}\exp\left\{-\frac{1}{2}(m+n)\right\}$$
In $\Theta_0$, $\mu_1 = \mu_2 = \mu$ (say) and we get
$$L(\Theta_0) = \left(\frac{1}{2\pi\sigma^2}\right)^{(m+n)/2}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{m}(x_{1i} - \mu)^2 + \sum_{j=1}^{n}(x_{2j} - \mu)^2\right\}\right]$$
$$\Rightarrow \log L(\Theta_0) = C - \frac{m+n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left\{\sum_i(x_{1i} - \mu)^2 + \sum_j(x_{2j} - \mu)^2\right\}.$$
Maximising with respect to $\mu$ and $\sigma^2$ in $\Theta_0$ gives $\hat{\mu} = \dfrac{m\bar{x}_1 + n\bar{x}_2}{m+n}$, so that
$$\sum_{i=1}^{m}(x_{1i} - \hat{\mu})^2 = m s_1^2 + \frac{m n^2(\bar{x}_1 - \bar{x}_2)^2}{(m+n)^2}
\qquad\text{and}\qquad
\sum_{j=1}^{n}(x_{2j} - \hat{\mu})^2 = n s_2^2 + \frac{n m^2(\bar{x}_2 - \bar{x}_1)^2}{(m+n)^2}$$
$$\therefore\ \lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \left\{\frac{m s_1^2 + n s_2^2}{m s_1^2 + n s_2^2 + \dfrac{mn}{m+n}(\bar{x}_1 - \bar{x}_2)^2}\right\}^{(m+n)/2} = \left[\frac{1}{1 + \dfrac{mn(\bar{x}_1 - \bar{x}_2)^2}{(m+n)(m s_1^2 + n s_2^2)}}\right]^{(m+n)/2}$$
We know that (c.f. §16.3.3), under the null hypothesis $H_0: \mu_1 = \mu_2$, the statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}, \qquad \text{where } S^2 = \frac{1}{m+n-2}\left(m s_1^2 + n s_2^2\right),$$
follows Student's $t$-distribution with $(m+n-2)$ d.f. Thus in terms of $t$, we get
$$\lambda = \left(1 + \frac{t^2}{m+n-2}\right)^{-(m+n)/2},$$
so the test can as well be carried out with $t$ rather than with $\lambda$. The critical region $0 < \lambda < \lambda_0$ transforms to a critical region of the type
$$t^2 > (m+n-2)\left[\frac{1}{\lambda_0^{2/(m+n)}} - 1\right] = A^2 \ \text{(say)},$$
i.e., $|t| > A$, where $A$ is determined so that $P[|t| > A \mid H_0] = \alpha$.
Since under $H_0$ the statistic $t$ follows Student's $t$-distribution with $(m+n-2)$ d.f., we get $A = t_{m+n-2}\!\left(\frac{\alpha}{2}\right)$, where $t_n(\alpha)$ is the right $100\alpha\%$ point of the $t$-distribution with $n$ d.f.
Thus for testing the null hypothesis against the alternative:
$$H_0: \mu_1 = \mu_2;\ \sigma_1^2 = \sigma_2^2 = \sigma^2 > 0 \qquad\text{versus}\qquad H_1: \mu_1 \ne \mu_2;\ \sigma_1^2 = \sigma_2^2 = \sigma^2 > 0$$
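A minimal sketch (hypothetical samples) of the resulting two-sample pooled t-test, with $S^2$ computed exactly as above:

```python
# Sketch of the two-sample t-test for H0: mu1 = mu2 (equal unknown variances),
# using the pooled estimate S^2 = (m*s1^2 + n*s2^2)/(m + n - 2) as in the text.
import numpy as np
from scipy.stats import t

x1 = np.array([14.1, 13.7, 15.2, 14.8, 13.9, 14.5])        # hypothetical sample 1
x2 = np.array([13.2, 12.8, 13.9, 13.5, 12.6, 13.1, 13.8])  # hypothetical sample 2
alpha = 0.05

m, n = len(x1), len(x2)
s1_2 = np.mean((x1 - x1.mean()) ** 2)      # variances with divisors m and n
s2_2 = np.mean((x2 - x2.mean()) ** 2)
S2 = (m * s1_2 + n * s2_2) / (m + n - 2)
t_stat = (x1.mean() - x2.mean()) / np.sqrt(S2 * (1 / m + 1 / n))
t_crit = t.ppf(1 - alpha / 2, df=m + n - 2)
print(f"|t| = {abs(t_stat):.3f}, critical value = {t_crit:.3f}")
print("Reject H0" if abs(t_stat) > t_crit else "Accept H0")
```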
For testing the equality of the means of $k$ normal populations (with a common variance $\sigma^2$), maximisation over $\Theta_0$ gives
$$\hat{\sigma}^2 = \frac{1}{n}\sum_i\sum_j(x_{ij} - \bar{x})^2 = \frac{S_T}{n} \ \text{(say)},$$
where, in ANOVA terminology, $S_T$ is called the total sum of squares (T.S.S.). After substituting, we get respectively
$$L(\hat{\Theta}) = \left(\frac{n}{2\pi S_W}\right)^{n/2}\exp\left(-\frac{n}{2}\right), \qquad L(\hat{\Theta}_0) = \left(\frac{n}{2\pi S_T}\right)^{n/2}\exp\left(-\frac{n}{2}\right), \qquad \lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \left(\frac{S_W}{S_T}\right)^{n/2}$$
We have
$$S_T = \sum_i\sum_j(x_{ij} - \bar{x})^2 = \sum_i\sum_j(x_{ij} - \bar{x}_i + \bar{x}_i - \bar{x})^2 = \sum_i\sum_j(x_{ij} - \bar{x}_i)^2 + \sum_i n_i(\bar{x}_i - \bar{x})^2 + 2\sum_i\left[(\bar{x}_i - \bar{x})\sum_j(x_{ij} - \bar{x}_i)\right]$$
But $\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i) = 0$, being the algebraic sum of the deviations of the observations of the $i$-th sample from its mean.
$$\therefore\ S_T = \sum_i\sum_j(x_{ij} - \bar{x}_i)^2 + \sum_i n_i(\bar{x}_i - \bar{x})^2 = S_W + S_B \ \text{(say)},$$
where $S_B = \sum_i n_i(\bar{x}_i - \bar{x})^2$, in ANOVA terminology, is called the between-samples sum of squares (B.S.S.), and $S_W$ the within-samples sum of squares. After substituting, we get
$$\lambda = \left(\frac{S_W}{S_W + S_B}\right)^{n/2} = \frac{1}{\left(1 + \dfrac{S_B}{S_W}\right)^{n/2}}$$
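The decomposition $S_T = S_W + S_B$ can be checked directly; the sketch below (hypothetical groups) also forms the F-ratio $F = \dfrac{S_B/(k-1)}{S_W/(n-k)}$ ordinarily used in the analysis of variance, which is a monotonic function of $S_B/S_W$ and hence of $\lambda$.

```python
# Sketch: total, within and between sums of squares for k samples, and the
# F-ratio F = (S_B/(k-1)) / (S_W/(n-k)) ordinarily used in ANOVA.
import numpy as np
from scipy.stats import f

groups = [np.array([5.1, 4.8, 5.6, 5.3]),              # hypothetical samples
          np.array([6.0, 5.7, 6.4, 6.1, 5.9]),
          np.array([4.9, 5.2, 4.7, 5.0])]

all_x = np.concatenate(groups)
n, k = len(all_x), len(groups)
grand_mean = all_x.mean()

S_T = np.sum((all_x - grand_mean) ** 2)
S_W = sum(np.sum((g - g.mean()) ** 2) for g in groups)
S_B = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
assert np.isclose(S_T, S_W + S_B)          # the decomposition S_T = S_W + S_B

F = (S_B / (k - 1)) / (S_W / (n - k))
print(f"F = {F:.3f}, critical value = {f.ppf(0.95, k - 1, n - k):.3f}")
```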
Test For the Variance of a Normal Population. Let us now consider the problem of testing if
the variance of a normal population has a specified value 𝜎02 on the basis of a random sample
𝑥1 , 𝑥2 , … , 𝑥𝑛 of size 𝑛 from normal population 𝑁(𝜇, 𝜎 2 )
We want to test the hypothesis: 𝐻0 : 𝜎 2 = 𝜎02 , (specified),
against the alternative hypothesis: 𝐻1 : 𝜎 2 ≠ 𝜎02
Here we have
$$\Theta = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ \sigma^2 > 0\} \qquad\text{and}\qquad \Theta_0 = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ \sigma^2 = \sigma_0^2\}$$
The likelihood function of the sample observations is given by
$$L = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right\}$$
Maximising over $\Theta$, we shall get
$$L(\hat{\Theta}) = \left(\frac{1}{2\pi s^2}\right)^{n/2}\exp\left(-\frac{n}{2}\right)$$
In $\Theta_0$ we have only one variable parameter, viz. $\mu$, and
$$L(\Theta_0) = \left(\frac{1}{2\pi\sigma_0^2}\right)^{n/2}\exp\left\{-\frac{1}{2\sigma_0^2}\sum_{i=1}^{n}(x_i - \mu)^2\right\}$$
The MLE for $\mu$ is given by: $\dfrac{\partial}{\partial \mu}\log L = 0 \Rightarrow \hat{\mu} = \bar{x}$
$$\therefore\ L(\hat{\Theta}_0) = \left(\frac{1}{2\pi\sigma_0^2}\right)^{n/2}\exp\left\{-\frac{1}{2\sigma_0^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right\} = \left(\frac{1}{2\pi\sigma_0^2}\right)^{n/2}\exp\left(-\frac{n s^2}{2\sigma_0^2}\right)$$
The likelihood ratio criterion is given by
$$\lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \left(\frac{s^2}{\sigma_0^2}\right)^{n/2}\exp\left\{-\frac{1}{2}\left(\frac{n s^2}{\sigma_0^2} - n\right)\right\}$$
We know that under $H_0$ the statistic $\chi^2 = \dfrac{n s^2}{\sigma_0^2}$ follows the chi-square distribution with $(n-1)$ d.f. In terms of $\chi^2$, we have
$$\lambda = \left[\frac{\chi^2}{n}\right]^{n/2}\exp\left[-\frac{1}{2}\left(\chi^2 - n\right)\right]$$
Since $\lambda$ is a monotonic function of $\chi^2$, the test may be carried out using $\chi^2$ as the criterion. The critical region $0 < \lambda < \lambda_0$ is now equivalent to
$$\left(\chi^2/n\right)^{n/2}\exp\left[-\frac{1}{2}\left(\chi^2 - n\right)\right] < \lambda_0
\qquad\text{or}\qquad
\exp\left(-\frac{1}{2}\chi^2\right)\left(\chi^2\right)^{n/2} < \lambda_0\left(n e^{-1}\right)^{n/2} = B \ \text{(say)}.$$
Since $\chi^2$ has the chi-square distribution with $(n-1)$ d.f., the critical region is determined by a pair of intervals $0 < \chi^2 < \chi_2^2$ and $\chi_1^2 < \chi^2 < \infty$, where $\chi_1^2$ and $\chi_2^2$ are to be determined such that the ordinates of (18.73) are equal, i.e.,
$$\left(\chi_1^2\right)^{n/2}\exp\left(-\frac{1}{2}\chi_1^2\right) = \left(\chi_2^2\right)^{n/2}\exp\left(-\frac{1}{2}\chi_2^2\right)$$
$$L(\Theta_0) = \left(\frac{1}{2\pi\sigma^2}\right)^{(m+n)/2}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_i(x_{1i} - \mu_1)^2 + \sum_j(x_{2j} - \mu_2)^2\right\}\right]$$
and the MLE's for $\mu_1, \mu_2$ and $\sigma^2$ are now given by $\hat{\mu}_1 = \bar{x}_1$, $\hat{\mu}_2 = \bar{x}_2$ and
$$\hat{\sigma}^2 = \frac{1}{m+n}\left\{\sum_i(x_{1i} - \hat{\mu}_1)^2 + \sum_j(x_{2j} - \hat{\mu}_2)^2\right\} = \frac{1}{m+n}\left\{\sum_i(x_{1i} - \bar{x}_1)^2 + \sum_j(x_{2j} - \bar{x}_2)^2\right\} = \frac{m s_1^2 + n s_2^2}{m+n}$$
$$L(\hat{\Theta}_0) = \left\{\frac{m+n}{2\pi(m s_1^2 + n s_2^2)}\right\}^{(m+n)/2}\exp\left\{-\frac{1}{2}(m+n)\right\}$$
$$\therefore\ \lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = (m+n)^{(m+n)/2}\left\{\frac{\left(s_1^2\right)^{m/2}\left(s_2^2\right)^{n/2}}{\left[m s_1^2 + n s_2^2\right]^{(m+n)/2}}\right\} = \frac{(m+n)^{(m+n)/2}}{m^{m/2}\, n^{n/2}}\left\{\frac{\left(m s_1^2\right)^{m/2}\left(n s_2^2\right)^{n/2}}{\left[m s_1^2 + n s_2^2\right]^{(m+n)/2}}\right\}$$
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2}{S_2^2}\cdot\frac{1}{\delta_0^2} \quad (\text{under } H_0)$$
follows the $F$-distribution with $(m-1, n-1)$ d.f. The test statistic, the test criterion and the $(1-\alpha)$ confidence interval for the parameter, for the various alternative hypotheses, are given in the following table. If $\delta_0 = 1$, the above test reduces to testing the equality of the population variances.
NORMAL POPULATIONS; $H_0: \sigma_1^2/\sigma_2^2 = \delta_0^2$
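A minimal sketch (hypothetical samples, with $\delta_0 = 1$, i.e. testing equality of the two variances) of this F-test:

```python
# Sketch of the F-test for H0: sigma1^2/sigma2^2 = delta0^2 (here delta0 = 1),
# with F = S1^2/S2^2 referred to F(m-1, n-1) under H0.
import numpy as np
from scipy.stats import f

x1 = np.array([20.1, 19.4, 21.3, 20.8, 19.9, 20.5, 21.0])   # hypothetical sample 1
x2 = np.array([18.9, 19.8, 20.4, 19.2, 20.1, 19.5])         # hypothetical sample 2
alpha = 0.05

m, n = len(x1), len(x2)
S1_2 = x1.var(ddof=1)                  # unbiased sample variances
S2_2 = x2.var(ddof=1)
F_stat = S1_2 / S2_2
lo = f.ppf(alpha / 2, m - 1, n - 1)
hi = f.ppf(1 - alpha / 2, m - 1, n - 1)
print(f"F = {F_stat:.3f}, acceptance region = ({lo:.3f}, {hi:.3f})")
print("Reject H0" if (F_stat < lo or F_stat > hi) else "Accept H0")
```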
$$L = \prod_{i=1}^{k}\left[\left(\frac{1}{2\pi\sigma_i^2}\right)^{n_i/2}\exp\left\{-\frac{1}{2\sigma_i^2}\sum_{j=1}^{n_i}(x_{ij} - \mu_i)^2\right\}\right]$$
It can easily be seen that in $\Theta$ the MLE's of the $\mu_i$'s and $\sigma_i^2$'s are given by
$$\hat{\mu}_i = \bar{x}_i \qquad\text{and}\qquad \hat{\sigma}_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 = s_i^2$$
$$\therefore\ L(\hat{\Theta}) = \prod_{i=1}^{k}\left\{\left(\frac{1}{2\pi s_i^2}\right)^{n_i/2}\exp\left(-\frac{n_i}{2}\right)\right\} = \exp\left(-\frac{n}{2}\right)\prod_{i=1}^{k}\left\{\left(\frac{1}{2\pi s_i^2}\right)^{n_i/2}\right\}, \qquad \text{where } n = \sum_i n_i$$
$$L(\Theta_0) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_i\sum_j(x_{ij} - \mu_i)^2\right\}$$
𝜆 is thus a complicated function of sample observations and it is not easy to obtain its
distribution. However, if 𝑛𝑖′ 's are large (𝑖 = 1,2, … , 𝑘), it provides an approximate test defined
as follows :
For large $n_i$'s, the quantity $-2\log\lambda$ is approximately distributed as a chi-square variate with $2k - (k+1) = k - 1$ d.f.
The test can, however, be made even if the $n_i$'s are not large. It has been investigated and found that the distribution of $-2\log_e\lambda$ is approximately a $\chi^2$-distribution with $(k-1)$ d.f. even for small $n_i$'s. However, a better approximation is provided by Bartlett's test statistic:
$$\chi^2 = \frac{-2\log\lambda'}{1 + \dfrac{1}{3(k-1)}\left\{\sum_i\left(\dfrac{1}{n_i}\right) - \dfrac{1}{\sum_i n_i}\right\}}$$
where $\lambda'$ is obtained from $\lambda$ on replacing $n_i$ by $(n_i - 1)$; this statistic follows the $\chi^2$-distribution with $(k-1)$ d.f. Thus the test statistic, under $H_0$, is given by
$$\chi^2 = \frac{\displaystyle\sum_{i=1}^{k}\left\{(n_i - 1)\log_e\left(\frac{s^2}{s_i^2}\right)\right\}}{1 + \dfrac{1}{3(k-1)}\left\{\sum_i\left(\dfrac{1}{n_i}\right) - \dfrac{1}{\sum_i n_i}\right\}} \sim \chi^2_{k-1}$$
The critical region for the test is, of course, the right tail of the $\chi^2$-distribution, given by $\chi^2 > \chi^2_{k-1}(\alpha)$.
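scipy provides this test directly; a minimal sketch (hypothetical samples) comparing its output with the right-tailed $\chi^2_{k-1}$ critical value:

```python
# Sketch of Bartlett's test for equality of k population variances using
# scipy.stats.bartlett; the statistic is referred to chi-square with (k-1) d.f.
import numpy as np
from scipy.stats import bartlett, chi2

g1 = np.array([8.8, 9.4, 10.1, 9.7, 9.0, 9.9])          # hypothetical samples
g2 = np.array([10.2, 11.5, 9.1, 12.0, 10.8, 9.6, 11.1])
g3 = np.array([9.5, 9.8, 10.0, 9.6, 9.9])

stat, p_value = bartlett(g1, g2, g3)
k, alpha = 3, 0.05
print(f"Bartlett chi2 = {stat:.3f}, critical value = {chi2.ppf(1 - alpha, k - 1):.3f}")
print(f"p-value = {p_value:.3f}")
```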
Example 1. A sample of 400 students is found to have a mean height of 67.47 inches. Can it
be reasonably regarded as a sample from a large (or normal) population with mean heigh 67.39
inches and standard deviation 1.3 inches?
Solution. $H_0: \mu = 67.39$, i.e., the sample is taken from a population with mean height $\mu = 67.39$ inches. Test statistic: $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ under $H_0$. The value of the test statistic $Z$ is given by
$$|Z| = \frac{67.47 - 67.39}{(1.3)/\sqrt{400}} = 1.23$$
For 𝛼 = .05, the two-sided critical regions are given by |𝑍| > 1.96.
Conclusion:
Since the calculated value of |Z| under 𝐻0 is less than 1.96, the null hypothesis H0 is accepted.
Hence, we conclude that the sample could be regarded as drawn from a population with mean
67.39 inches and standard deviation 1.3 inches at 5% level of significance.
Alternative Method for Conclusion:
Since the difference $\bar{x} - \mu_0$ is less than 1.96 times the standard error of $\bar{x}$, $H_0$ is accepted.
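A minimal sketch of the same computation in Python:

```python
# Sketch of the two-sided z-test of Example 1: H0: mu = 67.39, sigma = 1.3, n = 400.
from math import sqrt
from scipy.stats import norm

xbar, mu0, sigma, n, alpha = 67.47, 67.39, 1.3, 400, 0.05
z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = norm.ppf(1 - alpha / 2)
print(f"|Z| = {abs(z):.2f}, critical value = {z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Accept H0")   # |Z| = 1.23 < 1.96: accept H0
```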
Example 2 A random sample of 900 members is found to have a mean of 3.4 cm. Could it
come from a large population with mean 𝜇 = 3.25cms and 𝜎 = 2.16cms ?
294 | P a g e
295 | P a g e
Conclusion : Since Zcal does not fall in the critical region, H0 is accepted.
It implies that the company should switch to the new brand.
296 | P a g e
Question: 4
If $X_1, X_2, \ldots, X_n$ is a random sample from a population with density
$$f(x, \theta) = \begin{cases} \theta x^{\theta - 1}, & \text{if } 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$
where $\theta > 0$ is an unknown parameter, what is a $100(1-\alpha)\%$ confidence interval for $\theta$?
A. $\left[\dfrac{\chi^2_{\alpha/2}(2n)}{2\sum_{i=1}^{n}\ln X_i},\ \dfrac{\chi^2_{1-\alpha/2}(2n)}{2\sum_{i=1}^{n}\ln X_i}\right]$
B. $\left[\dfrac{\chi^2_{\alpha/2}(n)}{-2\sum_{i=1}^{n}\ln X_i},\ \dfrac{\chi^2_{1-\alpha/2}(n)}{-2\sum_{i=1}^{n}\ln X_i}\right]$
C. $\left[\dfrac{\chi^2_{\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i},\ \dfrac{\chi^2_{1-\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i}\right]$
D. $\left[\dfrac{\chi^2_{\alpha/2}(n)}{2\sum_{i=1}^{n}\ln X_i},\ \dfrac{\chi^2_{1-\alpha/2}(n)}{2\sum_{i=1}^{n}\ln X_i}\right]$
Question: 5
Let 𝑋1 , … , 𝑋𝑛 be a random sample from a 𝑁(2𝜃, 𝜃 2 ) population, 𝜃 > 0. A consistent
estimator for 𝜃 is
A. $\dfrac{1}{n}\sum_{i=1}^{n} X_i$
B. $\left(\dfrac{5}{n}\sum_{i=1}^{n} X_i^2\right)^{1/2}$
C. $\dfrac{1}{5n}\sum_{i=1}^{n} X_i^2$
D. $\left(\dfrac{1}{5n}\sum_{i=1}^{n} X_i^2\right)^{1/2}$
Question: 6
What is the arithmetic mean of the data set: 4, 5, 0, 10, 8, and 3?
297 | P a g e
A. 4
B. 5
C. 6
D. 7
Question: 7
Which of the following cannot be the probability of an event?
A. 0.0
B. 0.3
C. 0.9
D. 1.2
Question: 8
If a random variable X has a normal distribution, then e^X has a _____ distribution.
A. lognormal
B. exponential
C. poisson
D. binomial
Question: 9
What is the geometric mean of: 1, 2, 8, and 16?
A. 4
B. 5
C. 6
D. 7
Question: 10
Which test is applied to Analysis of Variance (ANOVA)?
A. t test
B. z test
C. F test
D. χ2 test
Question: 11
The arithmetic mean of all possible outcomes is known as
A. expected value
B. critical value
C. variance
D. standard deviation
298 | P a g e
Question: 12
Which of the following cannot be the value of a correlation coefficient?
A. –1
B. –0.75
C. 0
D. 1.2
Question: 13
Var (X) = ?
A. E[X2]
B. E[X2] – E[X]
C. E[X2] + E[X]2
D. E[X2] – E[X]2
Question: 14
Var (X + Y) = ?
A. E[X/Y] + E[Y]
B. E[Y/X] + E[X]
C. Var(X) + Var(Y) + 2 Cov(X, Y)
D. Var(X) + Var(Y) – 2 Cov(X, Y)
Question: 15
What is variance of the data set: 2, 10, 1, 9, and 3?
A. 15.5
B. 17.5
C. 5.5
D. 7.5
Question: 16
In a module, quiz contributes 10%, assignment 30%, and final exam contributes 60%
towards the final result. A student obtained 80% marks in quiz, 65% in assignment, and 75%
in the final exam. What are average marks?
A. 64.5%
B. 68.5%
C. 72.5%
D. 76.5%
299 | P a g e
Question: 17
In a university, average height of students is 165 cm. Now, consider the following Table,
Height 160-162 162-164 164-166 166-168 168-170
Students 16 20 24 20 16
What type of distribution is this?
A. Normal
B. Uniform
C. Poisson
D. Binomial
Question: 18
What is the average of 3%, 7%, 10%, and 16% ?
A. 8%
B. 9%
C. 10%
D. 11%
Question: 19
The error of rejecting the null hypothesis when it is true is known as
A. Type-I error
B. Type-II error
C. Type-III error
D. Type-IV error
Question: 20
The mean and variance of a Poisson distribution with parameter lambda (λ) are both
A. 0
B. 1
C. λ
D. 1/λ
Question 21.
Let the discrete random variables 𝑋 and 𝑌 have the joint probability mass function
𝑒 −1
; 𝑚 = 0,1,2, … , 𝑛; 𝑛 = 0,1,2, …
𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) = {(𝑛 − 𝑚)! 𝑚! 2𝑛
0, otherwise
Which of the following statements is(are) TRUE?
300 | P a g e
C) 2.27
D) 2.29
Question 25.
Let P be a probability function that assigns the same weight to each of the points of the sample
space Ω = {1,2,3,4}. Consider the events E = {1,2}, F = {1,3} and G = {3,4}. Then which of
the following statement(s) is (are) TRUE?
1. E and F are independent
2. E and G are independent
3. E, F and G are independent
Select the correct answer using code given below:
A. 1 only
B. 2 only
C. 1 and 2 only
D. 1,2 and 3
Question 26
Let 𝑋1 , 𝑋2 , … , 𝑋4 and 𝑌1 , 𝑌2 , … , 𝑌5 be two random samples of size 4 and 5 respectively, from a
51 𝑋 2 +𝑋 2 +𝑋 2 +𝑋 2
2 3 4
standard normal population. Define the statistic T = (4) 𝑌 2+𝑌 2 +𝑌 2 +𝑌 2 +𝑌 2 , then which of the
1 2 3 4 5
following is TRUE?
A. Expectation of 𝑇 is 0.6
B. Variance of T is 8.97
C. T has F-distribution with degree of freedom 5 and 4
D. T has F-distribution with degree of freedom 4 and 5
Question 27.
Let 𝑋, 𝑌 and 𝑍 be independent random variables with respective moment generating function
1 2
𝑀𝑋 (𝑡) = 1−𝑡 , 𝑡 < 1; 𝑀𝑌 (𝑡) = 𝑒 𝑡 /2 = 𝑀𝑍 (𝑡) 𝑡 ∈ ℝ. Let 𝑊 = 2𝑋 + 𝑌 2 + 𝑍 2 then P(W > 2)
is equals to
A. 2𝑒 −1
B. 2𝑒 −2
C. 𝑒 −1
D. 𝑒 −2
302 | P a g e
Question 28.
Let 𝑥1 = 3, 𝑥2 = 4, 𝑥3 = 3, 𝑥4 = 2.5 be the observed values of a random sample from the
probability density function
1 1 𝑥 1 −𝑥
𝑓(𝑥 ∣ 𝜃) = [ 𝑒 − 𝜃 + 2 𝑒 𝜃2 + 𝑒 −𝑥 ] , 𝑥 > 0, 𝜃 ∈ (0, ∞)
3 𝜃 𝜃
Then the method of moment estimate (MME) of 𝜃 is
A. 1.5
B. 2.5
C. 3.5
D. 4.5
Question 29.
Let 𝑋 be a random variable with cumulative distribution function
1 𝑛+2𝑘+1
𝑃(𝑋 = ℎ, 𝑌 = 𝑘) = ( ) ; 𝑛 = −𝑘, −𝑘 + 1, … , ; 𝑘 = 1,2, …
2
Then E(Y) equals
A. 1
B. 2
C. 3
D. 4
Question 30.
Let 𝑋 be a random variable with the cumulative distribution function
0, 𝑥<0
1 + 𝑥2
, 0≤𝑥<1
𝐹(𝑥) = 10
3 + 𝑥2
, 1≤𝑥<2
10
{ 1, 𝑥≥2
Which of the following statements is (are) TRUE?
3
A. 𝑃(1 < 𝑋 < 2) = 10
31
B. 𝑃(1 < 𝑋 ≤ 2) = 5
303 | P a g e
11
C. 𝑃(1 ≤ 𝑋 < 2) = 2
41
D. 𝑃(1 ≤ 𝑋 ≤ 2) = 5
(v) The importance of the method of minimum variance over other methods is that it
gives also then variance of T is …… (Ans. Lambda) [Ans. Variance]
In each of the following questions, four alternative answers are given in which only one is
correct. Select the correct answer and write the letter (a), (b) (c) or (d):
(i) The method of moments for determining point estimators of the population
parameters was discovered by
(a) Karl Pearson
(b) R.A. Fisher
(c) Cramer-Rao
(d) Rao-Blackwell
Ans. (a)
(ii) Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 be a random sample from
$$f(x, \beta) = \frac{2}{\beta^2}(\beta - x), \quad 0 \le x \le \beta.$$
The estimate of 𝛽 obtained by the method of moments is
(a) 𝑋‾
(b) 2𝑋‾
(c) 3𝑋‾
(d) 4𝑋‾
Ans. (c)
1
(iii) If 𝑓(𝑥, 𝜃) = 2 𝑒 −(𝑥−𝜃) , then the m.l.e. of 𝜃 is :
(a) sample mean
(b) sample mode
(c) sample median
(d) none of these
Ans. (c)
1
(iv) Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from p.d.f., 𝑓(𝑥, 𝜃) = 𝜃 , 0 < 𝑥 < 𝜃. The m.le.
for 𝜃 is
(a) Min (𝑋𝑖 )
(b) Max (𝑋𝑝 )
1
(c) 𝑛 Σ𝑋𝑖
(d) Σ𝑋𝑖
Ans. (b)
(xiv) A minimum variance unbiased estimator is said to be unique if for any other
estimator 𝑇𝑛′
(a) Var (Tₙ) = Var (Tₙ′)
(b) Var (Tₙ) ≤ Var (Tₙ′)
(c) both (a) and (b)
(d) neither (a) nor (b)
Ans. (a)
10.5 SUMMARY
The main points covered in this lesson are what an estimator is; what consistency, efficiency and sufficiency of an estimator mean; and how to obtain the best estimator.
10.6 GLOSSARY
• Motivation: These problems are very useful in real life and can be applied in data science, economics as well as social science.
• Attention: Think about how the methods of estimation are useful in real-world problems.
10.7 ANSWER TO IN-TEXT QUESTIONS
Answer 1: A
Explanation:
Since X^{2} + Y^{2} \sim \chi^{2}_{(2)}, and a \chi^{2}_{(2)} random variable has the same distribution as an exponential random variable with mean 2, we have
P(0 < X^{2} + Y^{2} < 4) = \int_{0}^{4} \frac{1}{2} e^{-t/2}\, dt = 1 - e^{-2}
Hence option A is the correct choice.
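For readers who want to verify this numerically, here is a minimal Python sketch (assuming SciPy is available); it simply evaluates the same chi-square probability and the exponential expression above:

from math import exp
from scipy import stats

p = stats.chi2.cdf(4, df=2) - stats.chi2.cdf(0, df=2)   # P(0 < X^2 + Y^2 < 4) for chi-square with 2 df
print(p, 1 - exp(-2))                                   # both approximately 0.8647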
Answer 2: B
Explanation:
P(W \le 1.24) = P\left(\sum_{i=1}^{n} \frac{(X_i - \bar{X})^{2}}{\sigma^{2}} \le 1.24\right) = P(\chi^{2}_{n-1} \le 1.24), \quad \text{with } \sigma^{2} = 100.
Thus, from the given value P(\chi^{2}_{7} \le 1.24) = 0.01, we get n − 1 = 7 and hence the sample size n is 8.
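A quick numerical check of the degrees of freedom (a sketch, assuming SciPy is available):

from scipy import stats

print(stats.chi2.cdf(1.24, df=7))   # approximately 0.01, so n - 1 = 7 and n = 8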
Answer 3: B
Explanation:
𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜖𝑖 , 𝑖 = 1,2, … , 𝑛
\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^{2} - n\bar{x}^{2}} = \frac{400 - 20 \times 5 \times 2.5}{600 - 20 \times 5^{2}} = \frac{150}{100} = \frac{3}{2}
\hat{\alpha} = \bar{y} - \bar{x}\hat{\beta} = \frac{5}{2} - 5 \times \frac{3}{2} = -\frac{10}{2} = -5
Hence option B is correct.
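The same least-squares arithmetic can be reproduced in a few lines of Python (a sketch; the summary statistics n, x̄, ȳ, Σxᵢyᵢ and Σxᵢ² are taken from the worked solution above):

n, x_bar, y_bar = 20, 5, 2.5
sum_xy, sum_x2 = 400, 600

beta_hat = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)   # 150/100 = 1.5
alpha_hat = y_bar - x_bar * beta_hat                                  # 2.5 - 7.5 = -5.0
print(beta_hat, alpha_hat)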
Answer 4: C
Explanation:
We use the random variable Q = -2\theta \sum_{i=1}^{n} \ln X_i \sim \chi^{2}_{(2n)}
Answer 7: D
Explanation :
The probability of an event is always between 0 and 1 (including 0 and 1 ). So, 1.2 cannot be
the probability of an event.
Answer 8: A
Explanation :
A lognormal distribution is a probability distribution of a random variable whose logarithm is
normally distributed. So, if X is lognormal then Y = ln(X) is normal; conversely, if Y is normal then X = e^{Y} is lognormal.
Answer 9: A
Explanation :
There are 4 numbers in total. So, by using the formula for calculating geometric mean, we
have
G.M. = (1 × 2 × 8 × 16)^{1/4} = (256)^{1/4} = (4^{4})^{1/4} = 4, since 4^{4} = 256.
Answer 10: C
Explanation :
In ANOVA we use the F-test because the test statistic is the ratio of two independent chi-square statistics, each divided by its degrees of freedom.
Answer 11: A
Explanation :
Expectation (or expected value) is the mean of all possible values of a random variable, weighted by their probabilities.
Answer 12: D
Explanation :
The value of a correlation coefficient is always between −1 and 1, including −1 and 1.
Answer 13: D
Explanation :
By definition, Var(X) = E[X^{2}] − (E[X])^{2}
Answer 14: C
Explanation :
By definition, Var (𝑋 + 𝑌) = Var (𝑋) + Var (𝑌) + 2Cov (𝑋, 𝑌)
Answer 15: B
Explanation :
First calculate the mean:
Mean = (2 + 10 + 1 + 9 + 3)/5 = 25/5 = 5
Now calculate the sample variance (divisor n − 1):
Variance = \frac{(2-5)^{2} + (10-5)^{2} + (1-5)^{2} + (9-5)^{2} + (3-5)^{2}}{5 - 1} = \frac{70}{4} = 17.5
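The difference between the population divisor n and the sample divisor n − 1 can be checked with Python's standard library (a minimal sketch):

import statistics

data = [2, 10, 1, 9, 3]
print(statistics.mean(data))        # 5
print(statistics.pvariance(data))   # 14.0  (divisor n)
print(statistics.variance(data))    # 17.5  (divisor n - 1, the value quoted above)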
Answer 16: C
Explanation :
By using formula for calculating weighted average, we have
Weighted Average = 𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑤3 𝑥3
= 0.1 × 0.8 + 0.3 × 0.65 + 0.6 × 0.75
= 0.725 = 72.5%
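A one-line check of the weighted average (a sketch in Python):

weights = [0.1, 0.3, 0.6]
values = [0.80, 0.65, 0.75]
print(sum(w * v for w, v in zip(weights, values)))   # 0.725, i.e. 72.5%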
Answer 17: A
Explanation :
Answer 18: B
Explanation :
3%, 7%, 10%, 16%
Average = \frac{3 + 7 + 10 + 16}{4}\% = \frac{36}{4}\% = 9\%
Answer 19: A
Explanation :
Type I error = P(reject H0 | H0 is true)
Answer 20: C
Explanation :
We know that if a random variable X follows a Poisson distribution with parameter λ, then E(X) = V(X) = λ.
Answer 21: A
Explanation :
The marginal probability mass function of X is given by
P(X = m) = \sum_{n=m}^{\infty} P(X = m, Y = n) = \frac{e^{-1/2} (1/2)^{m}}{m!}, \quad m = 0, 1, 2, \ldots
Thus the marginal distribution of X is Poisson with mean 1/2.
The marginal probability mass function of 𝑌 is given by
P(Y = n) = \sum_{m=0}^{n} P(X = m, Y = n) = \frac{e^{-1}}{n!}, \quad n = 0, 1, 2, \ldots
Thus the marginal distribution of 𝑌 is Poisson with mean 1 .
𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) ≠ 𝑃(𝑋 = 𝑚)𝑃(𝑌 = 𝑛)
Therefore 𝑋 and 𝑌 are not independent.
P(X = m \mid Y = 5) = \frac{P(X = m, Y = 5)}{P(Y = 5)} = \frac{5!}{m!\,(5-m)!}\left(\frac{1}{2}\right)^{5}, \quad m = 0, 1, 2, \ldots, 5
Thus the conditional distribution of X given Y = 5 is Bin(5, 1/2).
Also, \frac{P(Y = n)}{P(Y = n+1)} = n + 1 for n = 0, 1, 2, \ldots
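Readers who prefer a numerical confirmation of the two marginals can truncate the infinite sums; the following Python sketch (assuming SciPy is available, with the series truncated at N = 60) compares the summed joint pmf with the Poisson(1/2) and Poisson(1) pmfs:

from math import exp, factorial
from scipy import stats

def joint(m, n):
    # joint pmf P(X = m, Y = n) = e^{-1} / ((n - m)! m! 2^n) for 0 <= m <= n
    return exp(-1) / (factorial(n - m) * factorial(m) * 2 ** n) if 0 <= m <= n else 0.0

N = 60  # truncation point for the infinite sums
for m in range(4):
    print(sum(joint(m, n) for n in range(m, N)), stats.poisson.pmf(m, 0.5))   # marginal of X vs Poisson(1/2)
for n in range(4):
    print(sum(joint(m, n) for m in range(n + 1)), stats.poisson.pmf(n, 1.0))  # marginal of Y vs Poisson(1)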
Answer 22: 𝑨
Explanation :
The trinomial distribution of two r.v.'s 𝑋 and 𝑌 is given by
f_{X,Y}(x, y) = \frac{n!}{x!\, y!\, (n-x-y)!}\, p^{x} q^{y} (1 - p - q)^{n-x-y}
for x, y = 0, 1, 2, \ldots, n and x + y \le n, where p + q \le 1.
n = 2, p = 1/6 and q = 2/6
Var(X) = n p (1 - p) = 2 \times \frac{1}{6}\left(1 - \frac{1}{6}\right) = \frac{10}{36}; \qquad Var(Y) = n q (1 - q) = 2 \times \frac{2}{6}\left(1 - \frac{2}{6}\right) = \frac{16}{36}
Cov(X, Y) = -n p q = -2 \times \frac{1}{6} \times \frac{2}{6} = -\frac{4}{36}
Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)}\,\sqrt{Var(Y)}} = -\frac{4}{4\sqrt{10}} = -0.31
Hence −0.31 is the correct answer.
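The trinomial moments above can be verified both from the formulas and by simulation (a sketch assuming NumPy; the 200,000 simulated draws are an arbitrary choice):

import numpy as np

n, p1, p2 = 2, 1 / 6, 2 / 6
var_x = n * p1 * (1 - p1)          # 10/36
var_y = n * p2 * (1 - p2)          # 16/36
cov_xy = -n * p1 * p2              # -4/36
print(cov_xy / np.sqrt(var_x * var_y))                 # about -0.316

rng = np.random.default_rng(0)
counts = rng.multinomial(n, [p1, p2, 1 - p1 - p2], size=200_000)
print(np.corrcoef(counts[:, 0], counts[:, 1])[0, 1])   # close to the exact value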
Answer 23: 𝐀
Explanation :
Let 𝑥1 = 1.1, 𝑥2 = 0.5, 𝑥3 = 1.4, 𝑥4 = 1.2
f(x \mid \theta) = \begin{cases} e^{\theta - x}, & x \ge \theta \\ 0, & \text{otherwise} \end{cases}, \qquad \theta \in (-\infty, \infty)
The likelihood is positive only for \theta \in (-\infty, X_{(1)}], and since \frac{d}{d\theta} L(\theta) > 0 for all \theta \in (-\infty, X_{(1)}], the likelihood is maximized at the right end-point, so the m.l.e. is \hat{\theta} = X_{(1)} = 0.5.
Answer 24: 𝐀
Explanation :
If X \sim F(m, n), then \frac{1}{X} \sim F(n, m).
P[U > 3.69] = 0.05 \Rightarrow 1 - P[U > 3.69] = 0.95 \Rightarrow P[U < 3.69] = 0.95
\Rightarrow P\left[\frac{1}{U} > \frac{1}{3.69}\right] = 0.95, \text{ where } V = \frac{1}{U}, \text{ and so } c = \frac{1}{3.69} = 0.27
Hence c = 0.27 is the correct answer.
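The reciprocal quantile relation used here can be checked numerically; the sketch below assumes SciPy and uses degrees of freedom (5, 10) purely for illustration, since the degrees of freedom of U are not shown on this page:

from scipy import stats

m, n = 5, 10                               # illustrative degrees of freedom only
u95 = stats.f.ppf(0.95, m, n)              # upper 5% point of F(m, n)
print(1 / u95, stats.f.ppf(0.05, n, m))    # equal: F_{0.05}(n, m) = 1 / F_{0.95}(m, n)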
Answer 25: C
Explanation :
Clearly, P({𝜔}) = 1/4 ∀𝜔 ∈ Ω = {1,2,3,4}. We have E = {1,2}, F = {1,3} and G = {3,4}
Then P(E) = P(F) = P(G) = 2/4 = 1/2.
Indeed, P(E ∩ F) = P({1}) = 1/4 = P(E)P(F) and P(F ∩ G) = P({3}) = 1/4 = P(F)P(G), so statements 1 and 2 hold; however P(E ∩ F ∩ G) = 0 ≠ 1/8, so E, F and G are not mutually independent.
Hence option C is correct.
Answer 26: D
Explanation :
T = \left(\frac{5}{4}\right) \frac{X_1^{2} + X_2^{2} + X_3^{2} + X_4^{2}}{Y_1^{2} + Y_2^{2} + Y_3^{2} + Y_4^{2} + Y_5^{2}} \sim F(4, 5); \qquad E(T) = \frac{n}{n-2} = \frac{5}{3}
Var(T) = \frac{2 n^{2} (m + n - 2)}{m (n - 2)^{2} (n - 4)} = \frac{2 (5)^{2} (7)}{4 (3)^{2} (1)} = \frac{350}{36} = 9.72, \quad \text{with } m = 4, \; n = 5
Hence option D is correct.
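A quick check of the F(4, 5) mean and variance (a sketch, assuming SciPy is available):

from scipy import stats

mean, var = stats.f.stats(dfn=4, dfd=5, moments='mv')
print(mean, var)   # 5/3 ≈ 1.667 and 350/36 ≈ 9.72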
Answer 27: A
Explanation :
Since W = 2X + Y^{2} + Z^{2} \sim \chi^{2}_{(4)},
f_W(w) = \begin{cases} \frac{1}{4} w e^{-w/2}, & w > 0 \\ 0, & \text{otherwise} \end{cases}
P(W > 2) = \int_{2}^{\infty} \frac{1}{4} w e^{-w/2}\, dw = 2e^{-1}
Hence option A is correct.
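The tail probability can be confirmed directly from the chi-square survival function (a sketch, assuming SciPy):

from math import exp
from scipy import stats

print(stats.chi2.sf(2, df=4), 2 * exp(-1))   # both approximately 0.7358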
Answer 28: C
Explanation :
\bar{x} = \frac{1}{4}(3 + 4 + 3.5 + 2.5) = 3.25
Equating the first population moment to the sample mean: E(X) = \frac{1}{3}\left[\theta + \theta^{2} + 1\right] = 3.25
\theta^{2} + \theta - 8.75 = 0, \text{ so } \theta = 2.5 \text{ or } -3.5
Since 𝜃 ∈ (0, ∞) then 𝜃 = 2.5
Hence option C is correct.
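The method-of-moments step reduces to solving a quadratic; a minimal Python sketch of the same arithmetic:

from math import sqrt

x_bar = (3 + 4 + 3.5 + 2.5) / 4       # 3.25
c = 3 * x_bar - 1                     # so the equation is theta^2 + theta - 8.75 = 0
theta = (-1 + sqrt(1 + 4 * c)) / 2    # positive root of the quadratic
print(theta)                          # 2.5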
Answer 29: B
Explanation :
P(Y = k) = \sum_{n=-k}^{\infty} P(X = n, Y = k) \quad (\text{putting } m = n + k)
= \frac{1}{2}\left(\frac{1}{2}\right)^{k-1}, \quad k = 1, 2, \ldots,
which is the pmf of a geometric distribution with parameter 1/2. Therefore
E(Y) = \sum_{k=1}^{\infty} k \cdot \frac{1}{2}\left(\frac{1}{2}\right)^{k-1} = 2
Hence option B is correct.
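As a check, the mean of a geometric distribution on {1, 2, …} with success probability 1/2 is 1/(1/2) = 2 (a sketch, assuming SciPy):

from scipy import stats

# geometric distribution supported on {1, 2, ...} with success probability 1/2
print(stats.geom.mean(0.5))   # 2.0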
Answer 30: A
Explanation :
P(1 < X < 2) = F(2) - F(1) - P(X = 2) = \frac{3}{10}
P(1 < X \le 2) = F(2) - F(1) = \frac{3}{5}
P(1 \le X < 2) = F(2) - F(1) - P(X = 2) + P(X = 1) = \frac{1}{2}
P(1 \le X \le 2) = F(2) - F(1) + P(X = 1) = \frac{4}{5}
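The four probabilities can be reproduced by coding the CDF and its jump sizes directly (a minimal Python sketch of the same arithmetic):

def F(x):
    # the cumulative distribution function from Question 30
    if x < 0:
        return 0.0
    if x < 1:
        return (1 + x ** 2) / 10
    if x < 2:
        return (3 + x ** 2) / 10
    return 1.0

p_at_1 = F(1) - (1 + 1 ** 2) / 10     # jump at x = 1: P(X = 1) = 0.2
p_at_2 = F(2) - (3 + 2 ** 2) / 10     # jump at x = 2: P(X = 2) = 0.3
print(F(2) - F(1) - p_at_2)           # P(1 <  X <  2) = 0.3
print(F(2) - F(1))                    # P(1 <  X <= 2) = 0.6
print(F(2) - F(1) - p_at_2 + p_at_1)  # P(1 <= X <  2) = 0.5
print(F(2) - F(1) + p_at_1)           # P(1 <= X <= 2) = 0.8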
10.8 REFERENCES
• Devore, J. (2012). Probability and statistics for engineers, 8th ed. Cengage Learning.
• John A. Rice (2007). Mathematical Statistics and Data Analysis, 3rd ed. Thomson
Brooks/Cole
• Larsen, R., Marx, M. (2011). An introduction to mathematical statistics and its
applications. Prentice Hall.
• Miller, I., Miller, M. (2017). J. Freund’s mathematical statistics with applications, 8th
ed. Pearson.
• Kantarelis, D. and Asadoorian, M. O. (2009). Essentials of Inferential Statistics, 5th ed. University Press of America.
• Hogg, R., Tanis, E., Zimmerman, D. (2021). Probability and Statistical Inference, 10th ed. Pearson.
10.9 SUGGESTED READINGS
• Gupta, S. C. and Kapoor, V. K. Fundamentals of Mathematical Statistics, 11th ed. Sultan Chand Publication.
• Agarwal, B. L. Programmed Statistics, 2nd ed. New Age International Publishers.