Statistics Notes Final

The document provides an overview of statistical tests, including parametric tests like z-tests, t-tests, and ANOVA, which rely on population parameters, and non-parametric tests such as Chi-Square and Mann-Whitney tests that do not require such assumptions. It also discusses hypothesis types, errors in hypothesis testing, correlation and regression analysis, normal probability curves, and probability sampling methods. Each section outlines key concepts, applications, and examples relevant to statistical analysis.

Uploaded by

rituthube15
Copyright: © All Rights Reserved

1)Parametric Tests:

Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn.

Parametric tests are useful because, when their assumptions hold, they are more powerful for testing the significance of a computed sample statistic.

If the information about the population is completely known by means of its parameters, then the statistical test is called a parametric test.

The null hypothesis is made on the parameters of the population distribution.

Parametric tests are applicable only to variables (quantitative data).

Assumptions for parametric tests:

1. Normality – Data in each group should be normally distributed.


2. Equal Variance – Data in each group should have approximately
equal variance.
3. Independence – Data in each group should be randomly and
independently sampled from the population.
4. No Outliers – There should be no extreme outliers.

Parametric Tests are:

1. z test
2. t- test
3. ANOVA

1. Z test:

If the sample size is more than 30, the test is called the large-sample z test (normal test).

$$Z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} = \frac{x - \bar{x}}{\sigma}$$

If the distance of the observed value from the mean, expressed in units of S.E. (the Z score), falls within mean ± 1.96 S.E., then H0 is accepted at the 5% level of significance.

The greater the absolute value of Z, the smaller the P value.

Z test has two applications:

To test the significance of the difference between a sample mean and a known value of the population mean.

To test the significance of the difference between two sample means.
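
The first application can be sketched in Python as follows. This is only an illustration and not part of the original notes: the function name `one_sample_z`, the sample figures, and the use of σ/√n as the standard error of the mean are assumptions made for the example.

```python
import math

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p) for a sample mean against a known population mean."""
    se = pop_sd / math.sqrt(n)            # standard error of the mean
    z = (sample_mean - pop_mean) / se     # distance from the mean in S.E. units
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of the normal curve
    return z, p

# Hypothetical figures: n = 100, sample mean 52, population mean 50, sigma 10
z, p = one_sample_z(52, 50, 10, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here |z| is larger than 1.96, so under these assumed figures H0 would be rejected at the 5% level.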

2. t test:

If the sample size is less than 30, the t test is used.

Types of t test:

1. Single sample t – we have only 1 group and want to test its mean against a hypothetical mean.
2. Independent samples t – we have 2 means from 2 groups with no relation between the groups, e.g., each person is randomly assigned to only one group.

The two-sample t-test is a method used to test whether the unknown population means of two groups are equal or not.
Our null hypothesis is that the underlying population means are the same.

H0: μ1 = μ2
3. Dependent (paired) t – we have two means. Either the same people are in both groups, or the people are related, e.g., husband-wife, left hand-right hand, hospital patient and visitor.

Applications of t test:

1. To compare the effect of two drugs given to the same patients in a sample on two different occasions, e.g. the number of hours of sleep induced by two hypnotics.
2. To study the comparative accuracy of two different instruments.
3. To compare the results of two different laboratory techniques.
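
As a rough sketch of the paired t test for the first application (two drugs given to the same patients), the t statistic can be computed from the paired differences. The function name `paired_t` and the hours of sleep below are hypothetical values chosen for illustration; the resulting t would still have to be compared with a t table at the returned degrees of freedom.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, degrees of freedom) for paired observations on the same subjects."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)        # S.D. of the differences (divides by n - 1)
    t = mean_d / (sd_d / math.sqrt(n))    # mean difference in units of its S.E.
    return t, n - 1

# Hypothetical hours of sleep for 5 patients on drug A and on drug B
drug_a = [5, 6, 4, 7, 5]
drug_b = [6, 8, 5, 8, 7]
t, df = paired_t(drug_a, drug_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```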

3. ANOVA:

Analysis of Variance (ANOVA) is a method for testing the hypothesis that there is no difference between two or more population means.

When there are more than two means, it is possible to compare each mean with every other mean using the t test.

However, conducting multiple t tests can lead to severe inflation of the type I error.

ANOVA can be used to test differences among several means for significance without increasing the type I error.

This test involves a distribution called the F-distribution.

e.g. A group of psychiatric patients is trying three different therapies: counseling, medication and biofeedback. You want to see if one therapy is better than the others.
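
A minimal sketch of the one-way ANOVA calculation for the therapy example is given below. The improvement scores and the function name `one_way_anova_f` are invented for illustration; the F value obtained would be compared with an F table at the two degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical improvement scores under the three therapies
counseling = [1, 2, 3]
medication = [2, 3, 4]
biofeedback = [3, 4, 5]
f, df_b, df_w = one_way_anova_f([counseling, medication, biofeedback])
print(f, df_b, df_w)
```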

****************************************************

2)Non parametric Tests:


If we do not have any knowledge about the population or its parameters and still want to test a hypothesis, we use a non-parametric test.
These tests are mainly based on differences in medians.
There may be situations where we cannot meet the assumptions and conditions of parametric statistical procedures and thus cannot use them.
In such situations we have to apply non-parametric statistics.
They can deal with small sample sizes.
Non-parametric tests are distribution free: they do not assume that the data follow a particular distribution such as the normal.
Non-parametric tests are user friendly compared with parametric tests and economical in time.

Non parametric Tests are:

1. Chi Square test

2. Mann Whitney test

3. Wilcoxon signed rank test

4. Kruskal wallis test

5. Fisher’s exact test

6. McNemar’s test

1. Chi square test:

The chi square test plays an important role in problems where information is obtained by counting instead of measuring.

Applications of the chi square test are:

a) Proportion:

It is a very useful test for finding the significance of the difference between two or more proportions in the same type of data.
Ex. Incidence of diabetes in 20 non-obese patients compared with the incidence of diabetes in 20 obese patients.

b) Goodness of fit:

This test is applied to determine whether the actual (observed) numbers are similar to the expected or theoretical numbers.
The goodness of fit test determines whether the data fit a particular distribution or not.

c) Association:

The Chi-Square Test for Association is used to determine if there is any association between two discrete attributes.
Two variables can often be studied for their association, such as:
Smoking and cancer
Treatment and outcome of disease
Vaccination and immunity
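
The chi square statistic for a table of counts can be sketched as below, using the usual sum of (observed - expected)²/expected. The counts for obese and non-obese patients are hypothetical, the function name `chi_square` is an assumption, and no continuity correction is applied.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were not associated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = obese, non-obese; columns = diabetic, not diabetic
chi2, df = chi_square([[12, 8], [5, 15]])
print(f"chi-square = {chi2:.3f}, df = {df}")
```

For 1 degree of freedom the table value at the 5% level is 3.84, so a larger calculated value indicates a significant association.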
2. Mann Whitney test:

The Mann-Whitney U test is a non-parametric test for assessing whether two samples of observations come from the same distribution.

It requires the two samples to be independent and the observations to be ordinal or continuous measurements.
The null hypothesis is that the two samples come from the same distribution, so that neither group tends to have larger values than the other.

An advantage of this test is that the two samples under consideration need not have the same number of observations.
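
One simple way to obtain the U statistic is to count, over all pairs, how often a value from one sample exceeds a value from the other. The sketch below assumes this pair-counting form; the data and the name `mann_whitney_u` are hypothetical, and the smaller U would still need to be compared with a table of critical values.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y); a tie between an x value and a y value counts as one half."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x   # the two U values always add up to n1 * n2
    return u_x, u_y

# Hypothetical samples of unequal size
group_1 = [3, 5, 8]
group_2 = [1, 2, 4, 6]
u1, u2 = mann_whitney_u(group_1, group_2)
print(u1, u2)
```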

3. Wilcoxon signed rank test:

The Wilcoxon signed rank test uses ranked or ordinal data; thus, it is a common alternative to the dependent samples t-test (paired t test) when its assumptions are not met.

It determines whether the before and after data obtained on the same patients in a sample come from the same distribution or not.
Because this test uses both the signs and the ranks (rank sums) of the paired differences, it is said to be more efficient than a test based on signs alone.

4. Kruskal Wallis test:

Kruskal-Wallis one way analysis of variance by ranks is a non-parametric method for testing whether samples originate from the same distribution.
It is used to compare more than two samples that are independent.

If this test gives a significant result, then at least one of the samples is different from the other samples, but it does not identify where the difference occurs.

5. Fisher’s exact test:

This is a statistical significance test used in the analysis of contingency tables.

It is employed in small samples.

It is used to examine the significance of the association between the two kinds of classification.

6. McNemar’s test:

McNemar’s test is used on paired nominal data.

It is applied to 2x2 contingency tables.

This test is applied when one has to test the difference between paired proportions.

e.g. studies where patients serve as their own control, or studies with a before and after design.

**********************************************
3)Types of Hypothesis:
A hypothesis is an assumption about a population parameter which is to be tested.

For that we collect sample data, calculate the sample statistic, and then use this information to decide whether the hypothesized value of the population parameter is correct or not.

There are two types of hypothesis:

1. Null hypothesis (H0):

We start by presuming that there is no difference between the true values and the sample values.

The null hypothesis states that there is no significant difference between before and after treatment or between two groups.

e.g. Pushkarmool is not effective in Shwas.

So at the end, if the hypothesis is rejected, it indicates that there is a significant difference between the data of the two groups; if it is accepted, it means there is not sufficient evidence to reject the hypothesis and the difference is non-significant.

It is set up for possible rejection.

2. Alternative Hypothesis (H1):

It is opposite to the null hypothesis.

It never contains the equal to sign.
The alternative statement must be true if the null hypothesis is false.

It is also called the research hypothesis.

It states that there is a significant difference between before and after treatment or between two groups.

e.g. Pushkarmool has a significant effect in Shwas.

****************************************************

4)Type I and Type II error:


Even in the best research project, there is always a possibility that the researcher will make a mistake regarding the relationship between the two variables.
There are two possible mistakes or errors:
Type I error
Type II error

1. Type I error (α):

When a true null hypothesis H0 is rejected, it causes a type I error.

It is not affected by sample size, because α is fixed in advance by the researcher.
A Type I error happens when you get false positive results: you conclude that the drug intervention improved symptoms when it actually didn’t.
α is usually set at 0.05.

2. Type II error (β):

When a false null hypothesis is accepted, it causes a type II error.
It gets smaller as the sample size gets larger.
A Type II error happens when you get false negative results: you conclude that the drug intervention didn’t improve symptoms when it actually did.
Conventionally accepted values of β are 0.1 to 0.2.

Example of Type I and Type II error:

You decide to get tested for COVID-19 based on mild symptoms. There are two errors that could potentially occur:
Type I error (false positive): the test result says you have coronavirus, but you actually don’t.
Type II error (false negative): the test result says you don’t have coronavirus, but you actually do.

Generally, reducing the possibility of committing a Type I error increases the possibility of committing a Type II error and vice versa.

In medical statistics, a Type I error is considered more serious.

Researchers generally try to minimize the Type I error.


******************************************************

5)Correlation:
• Correlation is a statistical technique used to determine the degree to which two variables are related.
• Correlation analysis is used to estimate the strength of a linear relationship between two variables.
• We may be interested in studying the relationship between age and blood pressure, or between height and weight.
• The correlation coefficient r lies between -1 and +1.

Types of correlation:

1. Positive correlation

2. Negative correlation

3. Zero/ No correlation

1. Positive correlation:

• A positive correlation is a relationship between two variables in which both variables move in the same direction.
• That is, one variable increases as the other increases, or one variable decreases as the other decreases.
• An example of positive correlation would be height and weight. Taller people tend to be heavier.
• Example: Speed and distance covered in a fixed time.

2. Negative correlation:

• A negative correlation is a relationship between two variables in which the variables move in opposite directions, i.e. an increase in one variable is associated with a decrease in the other.
• An example of negative correlation would be height above sea level and temperature. As you climb the mountain (increase in height) it gets colder (decrease in temperature).
• Example: age and bone density.

3. Zero/No correlation:

• A zero correlation exists when there is no relationship between two variables.
• For example, there is no relationship between the amount of tea drunk and the level of intelligence.
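
The correlation coefficient r can be computed with the standard Pearson product-moment formula, sketched below. The heights and weights are hypothetical values and `pearson_r` is an illustrative name, not something defined in the notes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg) of five people
height = [150, 160, 170, 180, 190]
weight = [52, 58, 69, 74, 82]
r = pearson_r(height, weight)
print(f"r = {r:.3f}")   # positive: taller people tend to be heavier
```

A value of r close to +1 or -1 indicates a strong linear relationship, and a value close to 0 indicates little or no linear relationship.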

Interpretation of Correlation:
*****************************************************

6)Regression:

Sometimes it is necessary to estimate or predict the value of one character from the knowledge of another character. This is possible when the two variables are linearly correlated.

Correlation gives the degree and direction of the relationship between the two variables, whereas regression analysis enables us to predict the value of one variable on the basis of the other variable.

The regression technique is concerned with predicting some variable by knowing others.

Regression is a mathematical measure of the average relationship between two or more variables.
The regression line has the general formula:
y = a + bx
where “a” and “b” are two constants denoting the intercept of the line on the Y-axis (y-intercept) and the slope of the line, respectively.

Example: Medical researchers often use linear regression to understand the relationship between drug dosage and blood pressure of patients.

Uses of Regression analysis:

1. Regression analysis is used for prediction and forecasting.
2. Medicine: to forecast the different combinations of medicines used to prepare generic medicines for diseases.
3. Linear regression is used to study the linear relationship between a dependent variable Y (blood pressure) and one or more independent variables X (age, weight, sex).
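
The constants a and b of the line y = a + bx are usually estimated by the method of least squares. The sketch below assumes that method; the ages and blood pressures are hypothetical and `fit_line` is an illustrative name.

```python
def fit_line(x, y):
    """Least-squares estimates of a (intercept) and b (slope) in y = a + bx."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x   # the fitted line passes through (mean_x, mean_y)
    return a, b

# Hypothetical data: age in years (X) and systolic blood pressure in mmHg (Y)
age = [30, 40, 50, 60]
bp = [120, 126, 132, 138]
a, b = fit_line(age, bp)
predicted = a + b * 45        # predicted blood pressure at age 45
print(a, b, predicted)
```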

*********************************************************

7)Normal Probability Curve:

1. It is a smooth, bell shaped curve.
2. It has two tails and it is symmetrical.
3. It does not touch the base line.
4. Mean, median and mode coincide; for the standard normal curve they are zero,
i.e. Mean = median = mode = 0.
5. For the standard normal curve, the standard deviation is 1.
6. The curvature of the curve changes at the points of inflection, which lie at mean ± 1 S.D.
7. About 68% of observations are included in the range of mean ± 1 S.D.
8. About 95% of observations are included in the range of mean ± 2 S.D.
9. About 99.7% of observations are included in the range of mean ± 3 S.D.
10. No portion of the curve lies below the X-axis.
Example of Normal distribution:

Birth Weight: The normal birth weight of a newborn ranges from 2.5 to 3.5 kg. The majority of newborns have a normal birth weight, whereas only a small percentage of newborns have a weight higher or lower than normal. Hence, birth weight also follows the normal distribution curve.
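
The areas under the normal curve within 1, 2 and 3 S.D. of the mean can be checked numerically with the error function from the Python standard library. The helper name `area_within` is an assumption made for this sketch.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within mean ± k S.D."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} S.D.: {area_within(k) * 100:.2f}% of observations")
```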
********************************************************

8)Probability Sampling:
In this type, each individual unit in the population has an equal chance of being selected. For example, in a population of 1000 members, every member will have a 1/1000 chance of being selected to be a part of a sample. Probability sampling minimizes selection bias and gives all members a fair chance to be included in the sample.

There are six main types of probability sampling:

1. Simple random sampling


2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
5. Multi phase sampling
6. Multi Stage sampling

1. Simple random sampling:

This method is used in experimental medicine / clinical trials to test the efficacy of drugs. Here the population should be small and homogeneous.
The lottery method or the random number table method is applied to draw a sample.
The sample is highly representative if all selected subjects participate. However, this technique is not possible unless every member of the population is available for selection, which can be uneconomical to achieve.

Among all the probability sampling procedures, simple random sampling is the most basic and least complicated.

Example: A survey is conducted in a company of 100 employees to determine their satisfaction level. 20 of them are selected at random.

2. Systematic sampling:
This method is popularly used in those cases where a complete list of the population from which the sample has to be drawn is available.
Example: If we want to select 100 items from a universe of 1000, calculate k.
k = N/n = 1000/100 = 10
Here the starting number is selected at random. If we get 6 as the starting number, then every 10th item from the 6th onwards has to be taken. Example: 6, 16, 26, 36, 46, ..., 986, 996.
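
The selection rule above can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and it is assumed here that the random start is drawn from the first k items when none is supplied.

```python
import random

def systematic_sample(N, n, start=None):
    """Return the serial numbers chosen by systematic sampling with interval k = N // n."""
    k = N // n
    if start is None:
        start = random.randint(1, k)   # random start among the first k items
    return [start + i * k for i in range(n)]

# Reproducing the example: N = 1000, n = 100, starting number 6
sample = systematic_sample(1000, 100, start=6)
print(sample[:5], "...", sample[-2:])
```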

3. Stratified sampling:
If the population is not homogeneous, then the stratified random sampling method is employed.
When the population is divided into different strata or groups and samples are then selected from each stratum by the simple random sampling procedure, it is called stratified random sampling. This method is typically used when a population has distinct differences, such as demographics, level of education, or age, and can easily be broken into subgroups.
For example,
a. one might divide a sample of adults into subgroups by age, like 18–29, 30–39, 40–49, 50–59, and 60 and above.
b. Sociological: religion wise, e.g. Hindu, Muslim, Christian, Sikh, Buddhist etc.

4. Cluster sampling:
The whole population is divided into small clusters, for example according to location. Then whole clusters are selected for the sample.
In cluster sampling, researchers divide a population into smaller groups known as clusters. They then randomly select among these clusters to form a sample. Cluster sampling is a method of probability sampling that is often used to study large populations, particularly those that are widely geographically dispersed.
Example: Villages, wards, slums of towns, factories, school children etc.

Example 2: To know what students think about the school’s administration, the researcher chooses specific classes to provide feedback. All the students in the selected classes have the opportunity to share their views on the school’s administrative process.

5. Multi phase sampling:


Here part of the information is collected from the whole sample and part from a sub-sample.
The advantage of this method is that it reduces the workload of the investigator and there is no need for a sampling frame showing all individuals in the population.

Example: Tuberculosis survey.
Simple and cheap tests like the Mantoux test are done on all cases in the sample - First phase.
Those who are positive on the Mantoux test are screened by chest X-ray (or MMR), which is more expensive than the first test - Second phase.
Those who are positive on chest X-ray and have clinical symptoms undergo sputum examination - Third phase.

6. Multi Stage sampling:


Under this method, random selection is made of primary, intermediate and final units from a given population. Thus, the area of investigation is scientifically restricted to a small number of ultimate units, which are representative of the whole.
Example: If we want to study the nutritional status of India, we proceed in the following way:
First, we select 4 states randomly from the regions of India (primary stage).

Next, we select 4 districts randomly from each state (intermediate stage).

Then, we select 4 villages randomly from each of these districts (final stage).

So, finally we select 4 x 4 x 4 = 64 villages as representative of the entire country.


************************************************************

9)Laws of Probability:
Definition of Probability:

Probability may be defined as the relative frequency or probable chance of occurrence with which an event is expected to occur on an average.

• Usually expressed by the symbol ‘p’.
• ‘p’ ranges from 0 to 1.
• p = 0 means ‘no chance of an event happening’.
• p = 1 means ‘100% chance of an event happening’.
• If the probability of an event happening is ‘p’ and the probability of it not happening is ‘q’, then
p + q = 1
The laws of probability are:

i) Addition law of probability


ii) Multiplication law of probability
iii) Binomial law of probability

i) Addition law of probability:

The total probability of two mutually exclusive events follows the addition law of probability: the total probability is equal to the sum of the individual probabilities.

If events A and B are mutually exclusive, then

P(A or B) = P(A ∪ B) = P(A) + P(B)

e.g. When a single die is thrown, then

P(2) = 1/6

P(5) = 1/6

P(2 or 5) = P(2) + P(5)

= 1/6 + 1/6

= 2/6 = 1/3

ii) Multiplication law of probability:


If two or more independent events occur together, then the probability of their joint occurrence is calculated by the multiplication law of probability.

P(A and B) = P(A ∩ B) = P(A) x P(B)

Example: The probability that a patient has blood group ‘O’ is 1/10 and the probability that a person is HIV positive is 1/10. What is the probability that a person has blood group ‘O’ and is HIV positive?

Let A be the event that a person is HIV positive,
P(A) = 1/10
Let B be the event that a person has blood group ‘O’,
P(B) = 1/10

Let A ∩ B be the event that a patient is HIV positive AND has blood group ‘O’.

Note that events A and B are independent.

Therefore P(A ∩ B) = P(A) x P(B)
= 1/10 x 1/10
= 1/100

iii) Binomial law of probability:


In any trial, if there are only two possibilities, either ‘success’ or ‘failure’, the resulting distribution is called a binomial distribution.
Ex. 1. Male or female
2. Passed or failed
3. Present or absent

The binomial law of probability is used when two events occur one after the other.

e.g. When two children are born one after the other, the possible sequences will be any of the following four:
1. 1st Male & 2nd Male = ½ x ½ = ¼

2. 1st Male & 2nd Female = ½ x ½ = ¼

3. 1st Female & 2nd Male = ½ x ½ = ¼

4. 1st Female & 2nd Female = ½ x ½ = ¼

Therefore,

Chance of getting two males = ¼ = 25%

Chance of getting two females = ¼ = 25%

Chance of getting one child of each sex = ¼ + ¼ = ½ = 50%

The binomial probability distribution is formed by the terms of the expansion of the binomial expression

$$(p + q)^n$$
where, n = no. of events
p = probability of success
q = probability of failure
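
Each term of the expansion of (p + q)^n gives the probability of a particular number of successes. The sketch below uses the standard binomial formula with `math.comb`; the function name `binomial_pmf` is an assumption, and a male birth is treated as a "success" with p = ½ as in the example of two children.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

# Two births: probability of 0, 1 and 2 males
probs = [binomial_pmf(r, 2, 0.5) for r in range(3)]
print(probs)
```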

******************************************************

10)Standard Deviation(S.D.):
Definition:

It is the square root of the sum of the squared deviations of a given set of observations from the arithmetic mean divided by the total number of observations.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

It is the most useful and most commonly used measure of deviation.
A large S.D. shows that the measurements of the frequency distribution are widely spread out from the mean.
A small S.D. means the observations are closely spread in the neighbourhood of the mean.

Computation of S.D.:
• Calculate the mean.

• Find the difference of each observation from the mean.

• Square each difference.

• Add the squared values to get the sum of squares of the deviations.

• Divide the sum by the number of observations n to get the mean squared deviation, called the variance. (When the S.D. of a population is estimated from a sample, the sum is divided by n - 1 instead.)

• Find the square root of this variance to get the standard deviation.

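Example: The numbers of students in 5 classrooms are 30, 40, 50, 60, 70. Calculate the standard deviation.

Solution:
Step I – Calculate the mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{30 + 40 + 50 + 60 + 70}{5} = \frac{250}{5} = 50$$

Step II – Calculate x - x̄:
30 - 50 = -20
40 - 50 = -10
50 - 50 = 0
60 - 50 = 10
70 - 50 = 20

Step III – Calculate (x - x̄)²:

(-20) x (-20) = 400
(-10) x (-10) = 100
0 x 0 = 0
10 x 10 = 100
20 x 20 = 400

∑(x - x̄)² = 1000

Step IV – Calculate the standard deviation:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{1000}{5}} = \sqrt{200} = 14.1421$$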
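
The worked example can be checked with the Python standard library. `statistics.pstdev` divides by n, which matches the value obtained in the example, while `statistics.stdev` divides by n - 1 and is shown only for comparison.

```python
import statistics

students = [30, 40, 50, 60, 70]

population_sd = statistics.pstdev(students)  # divides the sum of squares by n
sample_sd = statistics.stdev(students)       # divides the sum of squares by n - 1

print(f"S.D. dividing by n:     {population_sd:.4f}")
print(f"S.D. dividing by n - 1: {sample_sd:.4f}")
```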
Uses of Standard Deviation:

2. It summarises the deviation of a large distribution from the mean in one indicative figure.
3. It indicates whether the variation of observations from the mean is by chance or real.
4. It is used to find the standard error, which determines whether the differences between the means of two similar samples are by chance or real.
5. It helps in determining the suitable size of sample for valid calculations.

*****************************************************

11)Standard Error (S.E) :


A sampling procedure invariably generates a difference between the sample statistic and the population parameter, because of chance / biological variability.

Such a difference between the sample and population values is measured by a statistic known as the sampling error / S.E.

Standard error is thus a measure of chance variation and it does not mean error / mistake.

$$\text{S.E. of mean} = \frac{\text{S.D.}}{\sqrt{n}}$$

Thus the standard error of the mean is the standard deviation of the sample divided by the square root of the number of observations in the sample.
S.E. becomes smaller when the S.D. is smaller and when the sample is larger; in practice it is reduced by taking a large sample.
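
A small sketch of the standard error of the mean is given below. It assumes the sample S.D. is computed with n - 1 in the denominator (`statistics.stdev`); the function name `standard_error` and the sample values are used only for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample S.D. divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [30, 40, 50, 60, 70]
se = standard_error(sample)
print(f"S.E. of mean = {se:.4f}")
```

Because n appears under the square root, the sample size has to be made four times as large to halve the S.E. for the same S.D.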
Uses of Standard error:

1. To calculate sample size.

2. To determine whether or not a sample is drawn from a known population when the population mean is known.

***********************************************

12)Measures of Central Tendency:


A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
These are also called measures of central location.
They describe the position of a distribution.
There are three measures of central tendency:
1. Arithmetic Mean
2. Median
3. Mode

1. Arithmetic Mean / Mean (x̄):

Arithmetic mean is the most commonly used measure of central tendency.

The arithmetic mean is the sum of the values of all observations in a dataset divided by the number of observations.

$$\text{Arithmetic Mean} = \frac{\text{sum of observations}}{\text{number of observations}}$$

It is denoted by x̄.

Example: The birth weights of 6 babies are 2, 2.4, 2.6, 3.1, 3.4 and 2.5 kg.

$$\bar{x} = \frac{\sum x_i}{n} = \frac{2 + 2.4 + 2.6 + 3.1 + 3.4 + 2.5}{6} = \frac{16}{6} = 2.6667 \text{ kg}$$

Advantages of mean:

• It is simple to understand and easy to calculate.
• It depends on each observation in the data.
• It is capable of further mathematical treatment.

Disadvantages (limitations) of A.M.:

• It is not applicable to qualitative data.
• It cannot be determined graphically.
• It cannot be calculated exactly if even a single data value is missing.
• It cannot be calculated for a grouped frequency distribution if the class intervals are open ended.

2. Median:

• When all the observations of a variable are arranged in either increasing or decreasing order, the middle observation is called the median.
• It divides the observations into two equal parts.
• It is used in medicine to fix the average dosage of a drug.

Computation of median:
Arrange the data in ascending or descending order of magnitude.

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation, if } n \text{ is odd}$$

$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}}{2}, \text{ if } n \text{ is even}$$

Example:

The points scored by a basketball team in a series of matches are as follows:
16, 1, 6, 26, 14, 4, 13, 8, 9, 23, 47, 9, 7, 8, 17, 28.

Arrange the data in ascending order:
1, 4, 6, 7, 8, 8, 9, 9, 13, 14, 16, 17, 23, 26, 28, 47

Here n = 16 is even, so the median is the mean of the 8th and 9th observations:

$$\text{Median} = \frac{9 + 13}{2} = 11$$

Advantages of median:

• It is simple to understand and easy to calculate.
• It is not affected by extreme observations.
• It can be easily represented graphically.
• It can be used for quantitative as well as qualitative data.

Disadvantages (limitations) of median:

• While calculating the median, all the data must be arranged in ascending or descending order. With a large number of items this becomes time consuming.
• It is not based on all observations.
• In the case of an even number of observations, the median cannot be determined exactly. We estimate it by taking the mean of the two middle terms.

3. Mode:

• The observation that occurs most frequently in the data is called the mode of the data.
• That is, it is the observation which is repeated the maximum number of times.
• Example: The readymade garment and shoe industries make use of this measure of central tendency. Based on the mode of the demand data, these industries decide which size of the product should be produced in large numbers to meet the market demand.
• Example:
In a survey of 10 households, the number of children was found to be 4, 1, 5, 4, 3, 7, 2, 3, 4, 1. Find the mode.
Solution:
Mode = most repeated observation
= 4
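
The three worked examples of this section (birth weights, basketball points and number of children) can be verified with Python's `statistics` module, as sketched below. The variable names are chosen only for this illustration.

```python
import statistics

birth_weights = [2, 2.4, 2.6, 3.1, 3.4, 2.5]                            # kg
points = [16, 1, 6, 26, 14, 4, 13, 8, 9, 23, 47, 9, 7, 8, 17, 28]
children = [4, 1, 5, 4, 3, 7, 2, 3, 4, 1]

mean_weight = statistics.mean(birth_weights)     # sum of observations / number of observations
median_points = statistics.median(points)        # sorts the data, averages the two middle values
mode_children = statistics.mode(children)        # most frequently repeated observation

print(round(mean_weight, 4), median_points, mode_children)
```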

***************************************************

13)Types of Series:

i) Simple ii) Discrete iii) Continuous

i) Simple Series:
In this type of series the data are arranged as collected, and there are no fractions in the data.

ii) Discrete Series:

These data contain whole numbers only.

e.g. No. of patients dying from cancer. Such data are never in fractions.

In medical studies, such data are mostly collected in pharmacology to find the action of a drug and in clinical practice to test / compare the efficacy of a drug.

Statistical measures applied to these data are the Chi-square test, standard error etc.

iii) Continuous series:

Here there is a possibility of getting fractions like 1.2, 3.8 etc., depending upon our requirement. For example, weight can be expressed in decimals, i.e. it takes all possible values in a certain range.

The statistical methods employed in the analysis of such data are mean, range, standard deviation and correlation coefficient.

*****************************************************

14)Presentation of data:
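Data are presented by tabulation or by drawings; drawings are divided into diagrams (qualitative) and graphs (quantitative).

Tabulation:
1. Simple table
2. Complex table
3. Frequency dist table
4. [Link] table

Diagrams (qualitative):
1. Bar diagram
2. Pie diagram
3. Picture diagram
4. Map or spot diagram

Graphs (quantitative):
1. Histogram
2. Freq. polygon
3. Freq. curve
4. [Link] chart
5. [Link] freq
6. [Link]/ dot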

I. Tabulation:
It is the process of arranging data in table format.
It is the first step in presentation and analysis of the data.

Types:
i) Simple table
ii) Complex table
iii) Frequency distribution table

i) Simple table:
The characteristics under observation are fixed.
The number of events (frequency) is small.
Ex. Location wise distribution of admitted persons according to heart disease.

            Heart Disease
Location    Yes    No     Total
Rural       40     80     120
Urban       50     30     80
Total       90     110    200

ii) Complex table:

If more than two attributes are to be presented, the table becomes manifold; such a table is called a complex table.
Ex. In the previous example, a third attribute “Sex” is added.

            Heart Disease: Yes    Heart Disease: No
Location    Male    Female        Male    Female       Total
Rural       10      30            40      40           120
Urban       30      20            10      20           80
Total       40      50            50      60           200

iii) Frequency distribution table:

Large, ungrouped data are condensed into a small, manageable number of classes.

Types:
a) Discrete frequency distribution (grouped or ungrouped)
Ex. Age on last birthday:
21,22,20,24,20,20,22,26,24,26,21,21,25,25,20,25,25,21,25

Ungrouped
Age    Tally mark    Freq.
20     IIII          4
21     IIII          4
22     II            2
23     --            0
24     II            2
25     IIII I        6
26     II            2

Grouped
Age      Tally mark    Freq.
20-21    IIII III      8
22-23    II            2
24-25    IIII III      8
26-27    II            2
Total                  N = 20

b) Continuous frequency distribution (grouped)

Ex. Weight of patients (kg):
72, 62.5, 48, 80, 48.2, 61.2, 58.5, 58, 59.5, 55.3, 45.8, 63.3, 52, 58.8, 65.8, 81.2, 86, 69.2

Wt. in Kg    Tally mark    Frequency
40-50        III           3
50-60        IIII I        6
60-70        IIII          5
70-80        I             1
80-90        III           3
Total                      18
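
The grouping of the patient weights into classes can be sketched as below. The helper name `frequency_table` is hypothetical, and it is assumed that each class includes its lower limit and excludes its upper limit (so a weight of 80 falls in 80-90) and that all values lie inside the classes.

```python
weights = [72, 62.5, 48, 80, 48.2, 61.2, 58.5, 58, 59.5, 55.3,
           45.8, 63.3, 52, 58.8, 65.8, 81.2, 86, 69.2]

def frequency_table(values, start, width, n_classes):
    """Count values in classes [start, start + width), [start + width, ...), and so on."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)   # which class the value falls into
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

table = frequency_table(weights, start=40, width=10, n_classes=5)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```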

2. Drawings:

Pictures are easier to understand than numbers; therefore charts and graphs prove to be very useful presentation tools.
Diagrams, on the other hand, are useful for the visual presentation of qualitative data.

I. Diagrams
Sometimes our goal is to present just a summary of the data. In such situations diagrams are very useful. They are generally used with qualitative data.
i) Bar Diagrams:
It is useful for the presentation of qualitative data.
It is easy to prepare and is used for comparing the categories of mutually exclusive discrete data.
There are 3 types of bar diagrams –
(a) Simple bar diagram
(b) Multiple bar diagram
(c) Proportional bar diagram

a) Simple bar diagram:

Bar graphs are also used to compare data and show relationships between two or more variables (or groups or items).
Each independent variable is discrete, such as race or gender (which only has two categories: male and female).

Ex. Represent the following data in a simple bar diagram.

Blood Group    No. of patients
A              30
B              40
AB             25
O              15

[Figure: simple bar diagram; x-axis: blood group (A, B, AB, O); y-axis: no. of patients]

b) Multiple bar diagram:

It is used when two or more related sets of data are to be explained.

Ex. Represent the following data using a multiple bar diagram.

Blood Group   Male   Female
A             15     15
B             25     15
AB            10     15
O             5      10

[Figure: multiple bar diagram; x-axis: Blood group (A, B, AB, O), y-axis: No. of patients, bars: Male and Female]
c) Proportional bar diagram:

Here too there are two or more sets of data representing a single
attribute.
The main difference is that in a proportional (percentage) bar diagram
the height of every bar is kept the same, that is 100%, and each value
is expressed as a percentage of the total of its bar.
Ex. Represent the following data in a percentage bar diagram.

Blood Group   Male   Female   Total
A             15     15       30
B             25     15       40
AB            10     15       25
O             5      10       15

[Figure: percentage bar diagram; x-axis: Blood group (A, B, AB, O), y-axis: % of the patients, segments: Male and Female]
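
The percentages needed for a percentage bar diagram can be worked out as in the sketch below. The counts come from the blood group example; the helper name `percentage_split` is an assumption made for illustration.

```python
# (male, female) counts for each blood group, from the example table.
counts = {"A": (15, 15), "B": (25, 15), "AB": (10, 15), "O": (5, 10)}


def percentage_split(male, female):
    """Express both counts as percentages of their combined total."""
    total = male + female
    return (100 * male / total, 100 * female / total)


for group, (male, female) in counts.items():
    male_pct, female_pct = percentage_split(male, female)
    print(f"{group}: male {male_pct:.1f}%, female {female_pct:.1f}%")
```

Each pair of percentages adds up to 100, which is why every bar in the diagram has the same height.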

ii) Pie Diagrams:

This is one way of presenting discrete data; it represents proportions.
The areas of the sectors of a circle represent the different
proportions, and the angle of each sector (in degrees) denotes the
frequency.

Angle of a sector = (Value of the sector / Total value) x 360

Ex. Following are the data on different streams of health science with
the number of students.

Stream   No. of students
MBBS     500
BDS      420
BAMS     350
BHMS     300
BUMS     200
Total    1770

[Figure: pie diagram "Streams"; sectors: MBBS 500, BDS 420, BAMS 350, BHMS 300, BUMS 200]
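
The sector angles for the streams example follow directly from the formula: angle = value of the sector / total value x 360. A minimal sketch, in which the function name `sector_angles` is illustrative only:

```python
# Number of students in each stream, from the example table.
students = {"MBBS": 500, "BDS": 420, "BAMS": 350, "BHMS": 300, "BUMS": 200}


def sector_angles(values):
    """Angle of each sector in degrees: value / total * 360."""
    total = sum(values.values())
    return {name: value / total * 360 for name, value in values.items()}


angles = sector_angles(students)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles of all sectors together must add up to 360 degrees.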

iii) Picture diagram:

In this diagram the frequencies of the observations are shown in
picture format.

iv) Map or spot diagram:

It shows the geographical distribution of the frequencies of a
characteristic.

II. Graphs
1. Histogram:
A histogram is a very popular method of presenting frequency
distributions.
The classes are marked along the horizontal axis, i.e. the x-axis.
Taking the class interval as the base, rectangles are erected with
heights proportional to the frequencies of the respective classes.

Size        10-20   20-30   30-40   40-50   50-60
Frequency   5       8       30      45      10

[Figure: histogram; x-axis: Size, y-axis: Frequency]
2. Frequency Polygon:

It is a curve representing a frequency distribution.
It is used when two or more sets of data are to be shown on the same
diagram, such as birth rates, death rates etc.

[Figure: frequency polygon]

3. Frequency Curve:

When the number of observations is very large and the groups are many,
the frequency polygon tends to lose its angulations and forms a smooth
frequency curve.

[Figure: frequency curve; x-axis: Age, y-axis: No. of families]
4. Line Chart:

This is a frequency polygon presenting variation by a line.
It shows the trend of an event over a period of time (rising, falling
or fluctuating), such as birth rate, death rate, cancer deaths etc.

5. Cumulative Frequency Curve:

It is the graph of a cumulative relative frequency distribution.
An ordinary frequency distribution table of quantitative data first has
to be converted into a cumulative relative frequency distribution table.
These cumulative frequencies are then plotted against the corresponding
group limits of the characteristic.

[Figure: cumulative frequency curve; x-axis: Height (cm), y-axis: Frequency]
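
Converting an ordinary frequency table into cumulative and cumulative relative frequencies can be sketched as below. For illustration the frequencies of the histogram example (sizes 10-20 to 50-60) are reused; the variable names are assumptions.

```python
from itertools import accumulate

# Upper class limits and frequencies from the histogram example.
upper_limits = [20, 30, 40, 50, 60]
frequencies = [5, 8, 30, 45, 10]

# Running total of the frequencies up to each upper class limit.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: running total divided by grand total.
total = cumulative[-1]
relative = [value / total for value in cumulative]

for limit, cum, rel in zip(upper_limits, cumulative, relative):
    print(f"below {limit}: cumulative {cum}, relative {rel:.3f}")
```

The points (upper limit, cumulative frequency) are the ones joined to draw the curve; the last cumulative relative frequency is always 1.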

6. Scatter or Dot diagram:

It is a graphical presentation made to show the nature of the
correlation between two variables X and Y in the same person or group;
hence it is also called a "correlation diagram".

[Figure: scatter diagram; x-axis: Height of parents, y-axis: Height of children]

***************************************************

15)Odds Ratio:

The odds of an event are defined as the probability that the event will
occur divided by the probability that the event will not occur. If the
probability of an event occurring is Y, then the probability of the
event not occurring is 1-Y.

Odds of event = Y / (1-Y)

For example, if the probability of the event occurring is 0.80, then
the odds are 0.80 / (1-0.80) = 0.80/0.20 = 4.
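
The odds formula can be written as a small function. This is only a sketch; the function name `odds` and the input check are assumptions added for illustration.

```python
def odds(probability):
    """Odds of an event: Y / (1 - Y), where Y is its probability."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be at least 0 and less than 1")
    return probability / (1 - probability)


# For a probability of 0.80 the odds are 4
# (up to floating-point rounding, hence the rounding when printing).
print(round(odds(0.80), 6))
```

A probability of 0.5 gives odds of 1, meaning the event is as likely to occur as not.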

An odds ratio is a measure of association between an exposure and an
outcome.

The odds ratio is a measure of effect size, describing the strength of
association or non-independence between two binary data values.

It is used as a descriptive statistic.

The odds ratio treats the two variables being compared symmetrically
and can be estimated using some types of non-random samples.

The odds ratio is commonly used in survey research, in epidemiology,
and to express the results of case-control studies.

It gives clear and direct information to clinicians about which
treatment approach has the best odds of benefiting the patient.
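
For a 2 x 2 table the odds ratio is the odds of the outcome in one group divided by the odds of the outcome in the other group. The sketch below uses the standard cross-product form (a x d) / (b x c), which these notes do not state explicitly. As an illustration it takes the heart disease table from the section on complex tables, added over both sexes (Rural: 40 yes, 80 no; Urban: 50 yes, 30 no); the function name `odds_ratio` is an assumption.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table.

    a = group 1 with the outcome, b = group 1 without the outcome,
    c = group 2 with the outcome, d = group 2 without the outcome.
    Equal to (a / b) / (c / d), computed as (a * d) / (b * c).
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Rural: 40 with heart disease, 80 without.
# Urban: 50 with heart disease, 30 without.
rural_vs_urban = odds_ratio(40, 80, 50, 30)
print(round(rural_vs_urban, 2))
```

An odds ratio of 1 means the odds are the same in both groups; a value below 1 means the odds of the outcome are lower in the first group.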

Uses of odds ratio:

1. Measuring the effect size of a difference between two drug
interventions.

2. In epidemiological studies, the odds ratio is used to determine post
hoc whether different groups had different outcomes on a particular
measure.

3. The odds ratio is helpful in clinical situations for providing the
patient with information on the odds of one outcome versus another.

***************************************************

16)Vital Statistics:
Vital statistics are conventionally the numerical records of marriage,
birth, sickness and death by which the health and growth of a community
may be studied.
It is a branch of biometry that deals with the data and laws of human
mortality, morbidity and demography.

Sources of Vital Statistics and Demographic Data:


Four main sources of Vital Statistics:
1. Civil Registration System
2. National Sample Survey
3. Sample Registration System
4. Health Surveys

1. Civil Registration System:

It is defined as the continuous, permanent and compulsory recording of
the occurrence of vital events such as live births, deaths, fetal
deaths, marriages and divorces, as well as annulments, judicial
separations and adoptions. Civil registration is performed under a law
and regulations so as to provide a legal basis for the records and
certificates made from the system.

2. National Sample Survey:

The data collected from the census are not very reliable and are
available only once in 10 years. In the absence of reliable data from
the civil registration system (CRS), the need for reliable statistics
at national and state levels is being met through sample surveys
launched from time to time.

3. Sample Registration System:

In this system, there is continuous enumeration of births and deaths in
a sample of villages/urban blocks by a resident part-time enumerator,
followed by an independent six-monthly retrospective survey by a
full-time supervisor.

4. Health Surveys:

A few important sources of demographic data have emerged. These are the
National Family Health Surveys (NFHS) and the District Level Household
Surveys (DLHS), conducted for the evaluation of reproductive and child
health programmes.

NFHS provides estimates of fertility, child mortality and a number of
health parameters relating to infants and children at the state level.

Uses of Vital Statistics:

1) For the Individual:

A birth certificate is a legal document which is used for admission to
a school, for getting a passport to travel abroad and even to migrate
to another country, etc. Similarly, a marriage certificate records the
marital status of a couple and legalises the birth of children from
that marriage.

2) Legal Use:
Vital statistics are legally very useful. Certificates relating to birth,
death, marriage, divorce, etc. have legal importance. For instance, a
death certificate is an important legal document for the settlement of
property of the deceased person, the claim of his/her insurance policy,
etc.
3) Health and Family Planning Programmes:

Vital statistics relating to births and deaths can be used in the
health and family planning programmes of the government. The causes of
deaths and the mortality rates of different categories help in
assessing the health condition of the people.
Accordingly, the state can formulate health programmes such as malaria
eradication, polio and smallpox immunisation, tuberculosis control,
etc. In keeping with the requirements of the population, the
government can open hospitals, maternity and child welfare centres,
etc.

4) For Administrators and Planners:

Data provided by vital statistics relating to the trend and growth of
the population, in the various age groups and as a whole, help
planners and administrators to plan and formulate policies for public
health, education, housing, transport and communications, food
supplies, etc.

5) For the Nation:

Vital statistics are of much importance for the nation. They help in
analyzing the population trends at any given point of time. They try
to fill the gap between two censuses. They relate to the composition,
size, distribution and growth of population.
It is on their basis that population projections can be made. Vital
statistics help in formulating policies for providing social security to
the people. Even the rules for immigration and emigration can be
framed on the basis of population growth data. Vital statistics are
also used for updating electoral rolls and demarcation of
constituencies.
**************************************************
