Review
Measures of central tendency:
The single value that represents a group of values is termed a
"measure of central tendency", a measure of location, or an average.
Types of average:
1. Arithmetic Mean
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean
Arithmetic Mean (A.M.): It is defined as the sum of the given observations divided
by the number of observations. The A.M. is measured in the same units as the
observations.
Let x1, x2, ..., xn be 'n' observations; then the A.M. is computed from the formula:
A.M. = x̄ = Σxi / n, where Σxi = sum of the given observations
and n = number of observations.
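The formula above can be checked directly; the observations below are illustrative values, not from the text.

```python
# Arithmetic mean: sum of the observations divided by their count.
observations = [46, 48, 50, 52, 54]  # illustrative sample values
n = len(observations)
mean = sum(observations) / n
print(mean)
```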
Median: The median is the middlemost item, which divides the distribution into two
equal parts when the items are arranged in ascending order of magnitude.
If the number of observations is odd, then median is the middle value after
the values have been arranged in ascending or descending order of magnitude. In
case of even number of observations, there are two middle terms and median is
obtained by taking the arithmetic mean of the middle terms.
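The odd/even rule just described can be sketched as a small function (the input lists are illustrative):

```python
def median(values):
    # Sort the items; take the middle item for odd n,
    # or the arithmetic mean of the two middle items for even n.
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))     # odd count: the middle value
print(median([7, 1, 5, 3]))  # even count: mean of the two middle values
```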
Mode: Mode is the value which occurs most frequently in a set of observations or
mode is the value of the variable which is predominant in the series.
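The mode, being the most frequent value, can be found with a frequency count; the sample data here is illustrative.

```python
from collections import Counter

def mode(values):
    # Return the value that occurs most frequently.
    counts = Counter(values)
    return counts.most_common(1)[0][0]

print(mode([5, 6, 7, 7, 9, 4, 5, 7]))  # 7 occurs most often
```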
Measures of Dispersion:
Dispersion means scattering of the observations among themselves or from a
central value (Mean/ Median/ Mode) of data. We study the dispersion to have an
idea about the variation.
Suppose that we have the distribution of the yields (kg per plot) of two Ground nut
varieties from 5 plots each. The distribution may be as follows:
Variety 1: 46 48 50 52 54
Variety 2: 30 40 50 60 70
It can be seen that the mean yield of both varieties is 50 kg, but we cannot say
that the performances of the two varieties are the same. There is greater uniformity of
yields in the first variety, whereas there is more variability in the yields of the
second variety. The first variety may be preferred since it is more consistent in yield
performance.
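The two-variety example can be verified numerically: both means are 50, but the spreads (here measured by the range) differ sharply.

```python
v1 = [46, 48, 50, 52, 54]  # yields (kg per plot), variety 1
v2 = [30, 40, 50, 60, 70]  # yields (kg per plot), variety 2

mean1 = sum(v1) / len(v1)
mean2 = sum(v2) / len(v2)
range1 = max(v1) - min(v1)  # spread of variety 1
range2 = max(v2) - min(v2)  # spread of variety 2

print(mean1, mean2)    # equal means
print(range1, range2)  # very different spreads
```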
Types of dispersion:
1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation and Variance
5. Coefficient of Variation
6. Standard Error
Range: It is the difference between the maximum value and the minimum value, i.e. Range = Maximum - Minimum.
Standard Deviation (σ): It is defined as the positive square root of the
arithmetic mean of the squares of the deviations of the given values from their
arithmetic mean. The square of the standard deviation is called the variance.
Let x1, x2, ..., xn be n observations; then the standard deviation is given by the
formula
S.D. (σ) = √[ Σ(xi - x̄)² / n ], where x̄ = Σxi / n
and n = number of observations.
Simplifying the above formula, we have
S.D. (σ) = √[ (Σxi²) / n - x̄² ]
Example:
Calculate the S.D. for the values 5, 6, 7, 7, 9, 4, 5.
Here n = 7 and x̄ = 43/7 ≈ 6.14.
S.D. = √[ Σ(xi - x̄)² / n ] = √(16.857 / 7)
= 1.55
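The worked example can be reproduced in a few lines, using the population form of the S.D. (denominator n) as defined above.

```python
import math

values = [5, 6, 7, 7, 9, 4, 5]
n = len(values)
mean = sum(values) / n
# Population S.D.: square root of the mean squared deviation from the mean.
sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
print(round(sd, 2))  # 1.55
```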
Coefficient of Variation (C.V.):
The coefficient of variation is the ratio of the standard deviation to the
arithmetic mean, expressed as a percentage. The formula for C.V. is
C.V. = (σ / x̄) × 100
The coefficient of variation will be small if the variation is small. Of two
groups, the one with the lower C.V. is said to be more consistent.
Note: 1. Standard deviation is an absolute measure of dispersion.
2. Coefficient of variation is a relative measure of dispersion.
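Applying the C.V. to the two ground-nut varieties from earlier makes the "more consistent" comparison concrete: both have mean 50, but variety 1 has a much lower C.V.

```python
import math

def cv(values):
    # Coefficient of variation: (S.D. / mean) * 100, using the population S.D.
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100

print(round(cv([46, 48, 50, 52, 54]), 2))  # variety 1: lower C.V., more consistent
print(round(cv([30, 40, 50, 60, 70]), 2))  # variety 2: higher C.V.
```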
NORMAL DISTRIBUTION
The Normal Distribution (N.D.) was first discovered by De Moivre in 1733 as the
limiting form of the binomial model; it was later worked on independently by Laplace
and Gauss.
The Normal distribution is ‘probably’ the most important distribution in statistics. It
is a probability distribution of a continuous random variable and is often used to
model the distribution of discrete random variable as well as the distribution of
other continuous random variables. The basic form of the normal distribution is that of
a bell; it has a single mode and is symmetric about its central value.
Definition: A random variable X is said to follow a Normal Distribution with
parameters μ and σ² if its density function is given by the probability law
f(x) = (1 / (σ√(2π))) e^(-(x - μ)² / (2σ²)),  -∞ < x < ∞; -∞ < μ < ∞; σ > 0
where π = a mathematical constant, approximately 22/7
e = Naperian base, approximately 2.7183
μ = population mean
σ = population standard deviation
x = a given value of the random variable in the range -∞ < x < ∞
Characteristics of Normal distribution and normal curve:
i. The curve is bell shaped and symmetrical, about the mean
ii. The height of the normal curve is at its maximum at the mean; hence the
mean and mode of the normal distribution coincide. Also, the number of
observations below the mean in a normal distribution is equal to the
number of observations above the mean; hence the mean and median of the
N.D. coincide. Thus, the N.D. has Mean = Median = Mode.
iii. As |x - μ| increases, f(x) decreases rapidly; the maximum
probability occurs at the point x = μ and is given by
[f(x)]max = 1 / (σ√(2π))
The area under the normal curve is distributed as follows:
i) μ - σ < x < μ + σ covers 68.26% of the total area (or 0.6826)
ii) μ - 2σ < x < μ + 2σ covers 95.44% of the total area (or 0.9544)
iii) μ - 3σ < x < μ + 3σ covers 99.73% of the total area (or 0.9973)
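These areas can be checked with the error function, since P(μ - kσ < X < μ + kσ) = erf(k/√2); small differences from the figures above are rounding.

```python
import math

# Area under the normal curve within k standard deviations of the mean.
for k in (1, 2, 3):
    area = math.erf(k / math.sqrt(2))
    print(k, round(area, 4))
```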
Standard Normal Distribution: If X is a normal random variable with mean μ and
standard deviation σ, then Z = (X - μ) / σ is a standard normal variate with zero mean
and standard deviation 1.
The probability density function of the standard normal variate Z is
f(z) = (1 / √(2π)) e^(-z²/2), with total area ∫ f(z) dz = 1 over -∞ < z < ∞.
A graph representing the density function of the Normal probability distribution
is also known as a Normal Curve or a Bell Curve. To draw
such a curve, one needs to specify two parameters: the mean and the standard
deviation. A Normal distribution
with a mean of zero and a standard deviation of 1 (μ = 0, σ = 1) is also known as the
Standard Normal Distribution.
[Figure: Standard Normal Distribution curve, μ = 0, σ = 1]
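The standard normal density defined above can be evaluated directly; its maximum height, at z = 0, is 1/√(2π).

```python
import math

def std_normal_pdf(z):
    # f(z) = (1 / sqrt(2*pi)) * exp(-z**2 / 2)
    return math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)

print(round(std_normal_pdf(0), 4))  # maximum height at z = 0
```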
Testing of Hypothesis
Introduction: Estimates based on sample values do not equal the true value
in the population, due to inherent variation in the population. Different samples
drawn will give different estimates of the true value. It has to be verified
whether the difference between the sample estimate and the population value is
due to sampling fluctuation or is a real difference. If the difference is due to sampling
fluctuation only, it can safely be said that the sample belongs to the population
under question; if the difference is real, we have every reason to believe that the
sample may not belong to the population under question. The following are a few
technical terms in this context.
Hypothesis: An assumption made about any unknown characteristic is called a
hypothesis. It may or may not be true.
Ex: 1. μ = 2.3, where μ is the population mean
2. σ = 2.1, where σ is the population standard deviation
The population follows a Normal Distribution. There are two types of hypothesis, namely
the null hypothesis and the alternative hypothesis.
Null Hypothesis: The null hypothesis is a statement about the parameters. Such a
hypothesis, which is usually a hypothesis of no difference, is called the null hypothesis;
equivalently, any statistical hypothesis under test is called the null
hypothesis. It is denoted by H0.
Ex: 1. H0: μ = μ0
2. H0: μ1 = μ2
Alternative Hypothesis: Any hypothesis which is complementary to the null
hypothesis is called an alternative hypothesis, usually denoted by H1.
Ex: 1. H1: μ ≠ μ0
2. H1: μ1 ≠ μ2
Population: In a statistical investigation the interest usually lies in the assessment
of the general magnitude and the study of variation with respect to one or more
characteristics relating to objects belonging to a group. This group of objects
under study is called the population or universe, i.e., the totality of all the objects under
study is called the population.
Sample: A finite subset of statistical objects in a population is called a sample and
the number of objects in a sample is called the sample size.
Parameter: A characteristic of the population values is known as a parameter. For
example, the population mean (μ) and the population variance (σ²).
In practice, the parameter values are not known, and estimates based on the
sample values are generally used.
Statistic: A characteristic of the sample values is called a statistic. For example, the
sample mean (x̄) and the sample variance (s²), where x̄ = Σxi / n
and s² = Σ(xi - x̄)² / (n - 1).
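The two sample statistics can be computed directly; note the n - 1 divisor for the sample variance, in contrast to the n divisor used for the population S.D. earlier. The data values are illustrative.

```python
values = [46, 48, 50, 52, 54]  # illustrative sample
n = len(values)
x_bar = sum(values) / n
# Sample variance uses n - 1 in the denominator.
s2 = sum((x - x_bar) ** 2 for x in values) / (n - 1)
print(x_bar, s2)
```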
Sampling distribution: The distribution of a statistic computed from all possible
samples is known as sampling distribution of that statistic.
Standard error: The standard deviation of the sampling distribution of a statistic is
known as its standard error, abbreviated as S.E.
S.E.(x̄) = σ / √n, where σ = population standard deviation and n = sample size
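A one-line check of the S.E. formula, with hypothetical values σ = 10 and n = 25:

```python
import math

def standard_error(sigma, n):
    # S.E. of the sample mean: population S.D. divided by sqrt(sample size).
    return sigma / math.sqrt(n)

print(standard_error(10, 25))  # 10 / 5
```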
Random sampling: If the sampling units in a population are drawn independently,
with an equal chance of being included in the sample, then the sampling is called
random sampling.
Simple Hypothesis: A hypothesis is said to be simple if it
completely specifies the distribution of the population. For instance, in the case of a
normal population with mean μ and standard deviation σ, a simple null hypothesis
is of the form H0: μ = μ0, σ known; knowledge of μ would then be enough to
determine the entire distribution.
Composite Hypothesis: If the hypothesis does not specify the distribution of the
population completely, it is said to be a composite hypothesis. Following are some
examples:
H0: μ > μ0, σ known
H0: μ < μ0, σ known
Types of Errors:
In testing of statistical hypothesis, there are four possible types of decisions
1. Rejecting H 0 when H 0 is true
2. Rejecting H 0 when H 0 is false
3. Accepting H0 when H 0 is true
4. Accepting H0 when H 0 is false
The 1st and 4th possibilities lead to erroneous decisions. Statisticians give specific names
to these errors, namely Type-I error and Type-II error respectively.
The above decisions can be arranged in the following table:

                 H0 is true            H0 is false
Rejecting H0     Type-I error          Correct decision
                 (wrong decision)
Accepting H0     Correct decision      Type-II error
                                       (wrong decision)
Type- I error: Rejecting H 0 when H 0 is true
Type- II error: Accepting H 0 when H 0 is false
The probabilities of Type-I and Type-II errors are denoted by α and β respectively.
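The meaning of α can be illustrated by simulation: if H0 is true and we test at the 5% level, we should wrongly reject H0 in roughly 5% of repeated samples. All the numbers below (μ0, σ, n, trials) are hypothetical.

```python
import random
import statistics

random.seed(0)
mu0, sigma, n, trials = 50.0, 10.0, 25, 2000
rejections = 0
for _ in range(trials):
    # Draw a sample from a population where H0 (mu = mu0) is actually true.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) >= 1.96:  # two-sided critical value at the 5% level
        rejections += 1  # a Type-I error
print(rejections / trials)  # should be close to alpha = 0.05
```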
Degrees of freedom: It is defined as the difference between the total number of
items and the total number of constraints.
If ‘n’ is the total number of items and ‘k’ the total number of constraints then the
degrees of freedom (d.f.) is given by d.f. = n- k
Level of significance (LOS): The maximum probability at which we would be willing
to risk a Type-I error is known as the level of significance; i.e., the size of the Type-I
error is the level of significance. The levels of significance usually employed in testing of
hypothesis are 5% and 1%. The level of significance is always fixed in advance,
before collecting the sample information. An LOS of 5% means the results obtained will
be true in 95 out of 100 cases and the results may be wrong in 5 out of 100
cases.
Critical value: While testing for the difference between the means of two
populations, our concern is whether the observed difference is too large to believe
that it has occurred just by chance. But then the question is how much difference
should be treated as too large? Based on sampling distribution of the means, it is
possible to define a cut- off or threshold value such that if the difference exceeds
this value, we say that it is not an occurrence by chance and hence there is
sufficient evidence to claim that the means are different. Such a value is called the
critical value and it is based on the level of significance.
Steps involved in test of hypothesis:
1. The null and alternative hypotheses will be formulated
2. Test statistic will be constructed
3. Level of significance will be fixed
4. The table (critical) values will be found out from the tables for a given level
of significance
5. The null hypothesis will be rejected at the given level of significance if the
value of test statistic is greater than or equal to the critical value.
Otherwise null hypothesis will be accepted.
6. In the case of rejection, the variation in the estimates will be called
'significant' variation. In the case of acceptance, the variation in the
estimates will be called 'not significant'.
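The six steps above can be sketched for a one-sample z-test on a mean. All the numbers (n = 25, x̄ = 52, μ0 = 50, σ = 10) are hypothetical, chosen only to walk through the procedure.

```python
import math

# Step 1: H0: mu = 50 against H1: mu != 50 (two-sided).
mu0, sigma, n, x_bar = 50.0, 10.0, 25, 52.0

# Step 2: construct the test statistic Z = (x_bar - mu0) / (sigma / sqrt(n)).
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Steps 3-4: fix the level of significance at 5% and look up the
# two-sided critical value of the standard normal distribution.
critical = 1.96

# Steps 5-6: reject H0 if |Z| >= critical value, otherwise accept it.
decision = "reject H0" if abs(z) >= critical else "accept H0"
print(round(z, 2), decision)
```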
*****