Biostatistics Module Sep2023 240520 122333
DEFINITIONS: -
Statistics
Statistics may be defined as the science that deals with the collection, organization,
presentation, analysis, and interpretation of data and using them to estimate the magnitude of
associations and test hypotheses.
Biostatistics
Biostatistics is the application of statistical methods in the field of medicine and health in
order to draw inferences, test hypotheses, and generate evidence for medical practice.
ii. To find the difference between means and proportions of normal height at two places or in
different periods. The mean height of boys in Gujarat is less than the mean height in Punjab.
Whether this difference is due to chance or natural variation, or because of other
factors such as better nutrition, has to be decided.
iii. To find the correlation between two variables X and Y such as height and weight: whether
weight increases or decreases proportionately with height and, if so, by how much has to be
found.
In pharmacology
i. To find the action of any drug—a drug is given to animals or humans to see whether the
changes produced are due to the drug or by chance.
ii. To compare the actions of two different drugs or two successive dosages of the same drug.
iii. To find the relative potency of a new drug with respect to a standard drug.
In medicine
i. To compare the efficacy of a particular drug, operation or line of treatment—for this, the
percentage cured, relieved or died in the experiment and control groups, is compared and
the difference due to chance or otherwise is found by applying statistical techniques.
ii. To find an association between two attributes such as cancer and smoking or filariasis and
social class—an appropriate test is applied for this purpose.
iii. To identify signs and symptoms of a disease or syndrome. Cough in typhoid is found by
chance and fever is found in almost every case. The incidence of one symptom or another
indicates whether it is a characteristic feature of the disease or not.
In public health, the measures adopted are evaluated. Lowering of morbidity rate in typhoid
after pasteurization of milk may be attributed to clean supply of milk, if it is statistically
proved. Fall in birth rate may be the result of family planning methods adopted under
National Family Welfare Programme or due to rise in living standards, increasing awareness
and higher age of marriage.
Data
Data is a collection of observations expressed in numerical figures. The word data is always
used in the collective sense and should never be used in the singular.
Types of Data: -
[Flowchart: Types of data. Based on measurability: quantitative and qualitative. Based on the collection of data: primary and secondary.]
(i) Quantitative data: These are those data that are of measurable magnitude e.g., Height,
Weight, Age, Temperature, Blood pressure, Pulse rate, Respiratory rate, etc. Quantitative data
may be continuous or discrete. Discrete data are countable in time or space, such as the pulse
rate per min, number of children in a family, TLC, number of patients attended by a doctor in
a day, etc. Discrete data cannot take fractional values. In comparison, continuous data are
measurable and can take fractional values. They have specific units of measurement. Most of
the individual biological variables are continuous data. Examples include blood sugar, serum
creatinine, weight, height, ESR, blood pressure, and age.
(ii) Qualitative data: In such data, there is no notion of magnitude or size of the
characteristic or attribute as the same cannot be measured. They are classified by individuals
having the same characteristic or attribute and not by measurement. Persons with the same
characteristic form specific groups or classes such as attacked, escaped, died, cured, relieved,
vaccinated, males, young, old, treated, not treated, on the drug, on placebo, inoculated or not
inoculated, healthy or unhealthy, the severity of pain, preference of colours, stages of any
tumour, etc.
(iii) Primary data: Those that are collected for the first time and are original in character.
e.g., the number of births and deaths registered by a municipal clerk constitute primary data,
the data collected in real-time by a researcher, etc.
(iv) Secondary data: Those that have been already collected by someone for some purpose
and are available for the present study. e.g., The data from Census, the data from various
national and international surveys (NFHS, DHS, CEA, etc.)
Variable: Variable is a characteristic that assumes different values. The types of variables
have been depicted in the flowchart below: -
[Flowchart: Types of variable. Based on the characteristics of values: categorical, discrete and continuous. Based on research: exposure (independent) and outcome (dependent).]
(i) Categorical Variable: Categorical variables are either nominal (unordered) or ordinal
(ordered). For example, nominal variables are male/female, alive/dead, blood groups O, A, B,
AB, etc. For nominal variables with more than two categories the order does not matter. For
example, one cannot say that people in blood group B lie between those in A and those in
AB. Sometimes, however, people can provide ordered responses, such as severity of pain
(mild, moderate, severe), grade of breast cancer, or they can “agree”, “neither agree nor
disagree”, or “disagree” with some statement. In this case, the order does matter and it is
usually important to account for it.
(ii) Discrete Variable: These variables are those that take only integral values (whole
numbers). Discrete variables generally refer to counts, such as the number of patients with a
particular cancer in a hospital, the number of people with a certain disease, number of daily
new admissions, number of deaths, number of births, etc.
(iii) Continuous Variable: These include those variables which can take all possible values
between two consecutive values e.g., Height, weight, temperature, BMI, etc.
(iv) Exposure Variable: These are those variables that influence disease outcomes, including
medical treatments.
(v) Outcome Variable: These are those variables whose variation or occurrence we are
seeking to understand.
Table 1.2: Commonly used alternatives for describing exposure and outcome variables
Exercise 2: What is your height? What type of variable is it? Note your result and that of the rest
of the students in your class in a table and interpret your findings.
Exercise 3: Mention the colour of your choice from the following list.
(Blue, Green, Yellow, Orange, Red, White, Others) Note your result and that of your
classmates in a tabular form and mention the type of data and the type of variable. Arrange
and interpret your findings.
Exercise 4: Count your resting pulse beat for 60 seconds and record your result and that of
the rest of the students systematically in a tabular form. Mention the type of data.
Exercise 5: Have you ever been vaccinated for Influenza? Note down the responses in your
class (Yes/No). Tabulate the observations and interpret the findings of your class.
1.1 SOURCES OF DATA
Exercise 1. What is the Infant Mortality Rate of India as per the latest NFHS & SRS data?
Exercise 2. Describe the trend in the population of India from 1947 till 2011 according to the
data in the Census of India? (Explore the census of India website to find the data; use the
link: https://censusindia.gov.in/census_data_2001/india_at_glance/variation.aspx)
Exercise 3. The government of India employs staff to collect the data and then publishes the
result of the survey. Some students of a college used the data from these reports for further
analyses and comparison as part of some research. Which among these are primary data and
which one is secondary?
2. SUMMARIZATION/PRESENTATION OF DATA
Objectives of Data Presentation
1. To arrange the data in such a way that it will arouse interest in a reader.
2. To make data sufficiently concise at the same time without losing important details.
3. To present the data in simple form in order to make it possible to form impressions and
draw some conclusions directly or indirectly.
4. To help in further statistical analysis.
In the above example, the age is split into groups of five. These are known as class intervals.
The number of observations in each group is called the frequency. In constructing frequency
distribution tables, two questions arise: into how many groups should the data be
split, and what class intervals should be chosen? As a practical rule, a maximum of 20 groups
may conveniently be taken when the data set is large, and a minimum of 5 groups when it is
small. The merits of a frequency distribution
table are that it shows at a glance how many individual observations are in a group and
where the main concentration lies. It also shows the range and the shape of the distribution.
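The grouping of observations into class intervals can be sketched in Python. This is only an illustrative sketch: the function name `frequency_table`, the list of ages and the class width of five are assumptions made for the example and are not data from this module. It also assumes whole-number values that are not smaller than the starting value.

```python
from collections import Counter

def frequency_table(values, start, width):
    """Return (lower limit, upper limit, frequency) for each class interval.

    Assumes whole-number observations that are all >= start.
    """
    # Class index 0 covers start .. start+width-1, index 1 the next class, etc.
    counts = Counter((v - start) // width for v in values)
    table = []
    for i in range(max(counts) + 1):
        lower = start + i * width
        table.append((lower, lower + width - 1, counts.get(i, 0)))
    return table

# Hypothetical ages in years (illustration only).
ages = [1, 3, 4, 6, 7, 7, 9, 11, 12, 14, 14, 18]

for lower, upper, freq in frequency_table(ages, start=0, width=5):
    print(f"{lower}-{upper}: {freq}")
```

Changing `width` shows how the number of groups, and therefore the apparent shape of the distribution, depends on the class interval chosen.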
[Figures: two frequency charts. x-axis: Haemoglobin level (gm/dL), classes 0-6, 7, 8, ... 15; y-axis: frequency]
3. Cumulative Frequency Curves or Ogives: Ogive is the graphic representation of a
cumulative frequency of a distribution. The graph in such a case always rises up.
[Figure: Ogive. x-axis: Haemoglobin level (gm/dL), classes 0-6, 7, 8, ... 13; y-axis: Cumulative Frequency, 0 to 70]
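The cumulative frequencies plotted in an ogive are running totals of the class frequencies. A minimal sketch in Python follows; the class labels and frequencies are hypothetical values chosen for illustration and are not read from the figure.

```python
from itertools import accumulate

# Hypothetical classes and frequencies (illustration only).
classes = ["0-6", "7", "8", "9", "10"]
frequencies = [2, 5, 9, 14, 20]

# Running total: each entry adds the current class to all earlier classes.
cumulative = list(accumulate(frequencies))

for label, cum in zip(classes, cumulative):
    print(label, cum)
```

Because frequencies are never negative, the running total never decreases, which is why the ogive always rises.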
4. Line Diagram: It is used to show the trend of events with the passage of time.
[Figure: Line diagram. x-axis: Year, 2015 to 2020; y-axis scale: 0 to 2500]
Exercise 1: Prepare a frequency distribution table from the data collected by you previously
on resting pulse beat/60 seconds and present it graphically as a histogram. Write down the
steps used to make a histogram in MS Excel.
Exercise 2: The haemoglobin values of female patients attending the OPD are given below.
Arrange them systematically, make frequency distribution tables and represent them with a
frequency polygon and an ogive using MS Excel.
6, 6.8, 7, 7.4, 7.5, 7.8, 7.9, 8, 8.2, 8.4, 8.5, 8.6, 8.8, 9, 9.2, 9.4, 9.4, 9.4, 9.4, 9.4, 9.4, 9.4, 9.4,
9.4, 9.4,9.4, 9.4, 9.4, 9.4, 9.4, 9.4, 9.4, 9.4, 9.4, 9.4, 9.5, 9.6, 9.8, 9.8,9.8, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10.2, 10.2, 10.2, 10.2, 10.4, 10.4, 10.4, 10.4, 10.4, 10.4, 10.6, 10.6,
10.6, 10.6, 10.6, 10.6, 10.8,10.8, 10.8, 10.8, 10.8, 10.8, 10.8, 10.8, 10.8, 10.8, 10.8, 10.8,
10.8, 10.8, 10.8, 10.8, 10.8, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11.2, 11.2, 11.2, 11.4, 11.4, 11.4, 11.6, 11.6, 11.6, 11.6,
11.6, 11.8, 11.8, 11.8, 11.8, 11.8, 11.8, 11.8, 11.8, 11.8, 11.8, 11.8, 12.2, 12.2, 12.2, 12.2,
12.2, 12.2, 12.2, 12.2, 12.4, 12.4, 12.4, 12.4, 12.6, 12.6, 12.6, 12.6, 12.6, 12.8, 12.8, 12.8, 13,
13, 13, 13, 13, 13, 13, 13.4, 13.4, 13.4, 13.4, 13.4, 13.4, 13.4, 14, 14, 14, 14, 14, 14, 14.2,
14.2, 14.2, 14.4, 14.4, 14.4, 14.4, 14.4
Exercise 3: The infant mortality rate from 2005 to 2015 is as follows respectively. Represent
the values graphically in the form of a suitable diagram.
58.4, 56, 52.8, 50.8, 50.6, 48, 47, 45.4, 44, 43, 42.8.
Exercise 4: Make a frequency distribution table of the height of the students in your class and
represent the findings with the help of a suitable graph.
Additional Objectives in presenting statistical data
1. The tables should be numbered e.g., Table-1, Table-2, etc.
2. A title must be given to each table.
3. The headings of columns or rows should be clear and concise.
4. The data must be presented according to size or importance, chronologically,
alphabetically, or geographically.
5. If percentages or averages are to be compared, they should be placed as close as possible.
1. Bar Diagram: This is to represent discontinuous data. The length of bars drawn
horizontally or vertically indicates the frequency of the character. Spacing in between
the two bars should be nearly equal to half of the width of the bar. It can be
a. a simple bar
b. one with multiple bars
c. a composite bar
The main difference between the bar diagram and the histogram is that the former has space
in between two bars and the latter doesn’t.
[Figure: Simple bar diagram. x-axis: Type of cancer (Breast carcinoma, Lung carcinoma, Colorectal carcinoma, Prostate carcinoma, Gastric carcinoma); y-axis scale: 0 to 2000000; data labels shown: 1931590, 1414259, 1089103]
(Source: GLOBOCAN 2020. The Global Cancer Observatory. Fact sheet. 2021 Mar.
Available from: https://gco.iarc.fr/today/data/factsheets/populations/900-world-fact-sheets.pdf)
COMPONENT BAR CHART
[Figure: Component bar chart. x-axis: Type of cancer (Breast carcinoma, Lung carcinoma, Stomach carcinoma, Colon carcinoma); y-axis scale: 0 to 2000000; legend entries: Male, Female, 2018, 2020]
(Source: GLOBOCAN 2020. The Global Cancer Observatory. Fact sheet. 2021 Mar.
Available from: https://gco.iarc.fr/today/data/factsheets/populations/900-world-fact-sheets.pdf)
2. Pie Diagram: This is most useful for showing proportions out of the total.
Frequencies are shown as sectors of a circle, and the angle of each sector is proportional to
its frequency. It is most useful for giving a comparative difference at a glance. The size of each
angle (in degrees) is calculated by multiplying the class percentage by 3.6.
[Figure: Pie diagram with sectors Football, Camping, Jogging, Walking and Bicycling; sector labels 17%, 29%, 24%, 12% and 18%]
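The rule of multiplying each class percentage by 3.6 can be checked with a few lines of Python. The sketch below uses the five percentages printed on the pie diagram in this section; the variable names are illustrative assumptions.

```python
# Sector percentages as printed on the pie diagram (they add up to 100).
percentages = [17, 29, 24, 12, 18]

# 100% corresponds to 360 degrees, so 1% corresponds to 3.6 degrees.
angles = [round(p * 3.6, 1) for p in percentages]

for p, angle in zip(percentages, angles):
    print(f"{p}% -> {angle} degrees")

print("Total:", round(sum(angles), 1), "degrees")
```

The angles should always add up to 360 degrees; if they do not, the percentages do not add up to 100.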
Fig 2.10: Spot Map showing the Cholera cases in Broad Street, London mapped
by John Snow in 1854
BOXPLOT (Also known as Box-and-Whisker plot)
The Boxplot is a graphical representation of the five-number summary. The five-number
summary is a collection of five summary measures of the data, viz., Minimum, First
Quartile, Median, Third Quartile, and Maximum. The interquartile range (3rd quartile -
1st quartile) is represented by a box. The box is divided into two by a line representing
the median. The two whiskers represent the minimum and maximum values in the
dataset. If the data are normally distributed, the line representing the median will be
placed almost in the middle of the box and the lengths of the whiskers will be similar. If
there are outliers in the data, they are represented by small circles or asterisks. Any value
higher than 3rd quartile + 1.5 times the interquartile range, or
lower than 1st quartile - 1.5 times the interquartile range, is labelled as an outlier.
A boxplot is very useful when we want to compare the distribution of a continuous
variable across categories of a categorical variable. For example, if we wish to compare
the cholesterol levels across population age groups, or various socio-economic categories.
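The five-number summary and the 1.5 × interquartile range rule for outliers can be sketched in Python. This is an illustrative sketch only: the cholesterol values are hypothetical, the function names are assumptions, and the quartiles are computed with the "exclusive" method of the standard library (other software may use a slightly different quartile rule and give slightly different values).

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)

def outliers(values):
    """Values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical total cholesterol values (illustration only).
cholesterol = [150, 160, 165, 170, 175, 180, 185, 190, 200, 320]

print(five_number_summary(cholesterol))
print(outliers(cholesterol))
```

In this made-up dataset only the value 320 lies beyond the upper fence, so it is the only value that would be drawn as a separate point.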
EXERCISES
Exercise 1: Deaths in India due to Cancer of Lungs from 1957 – 1966 is given below.
Represent the data by a Line diagram and a Bar diagram with proper labelling.
Exercise 2:
Out of a total of 200 patients with Tuberculosis, 80 patients belonged to lower socio-
economic strata, 60 patients were from the lower middle class, 40 from the upper-middle
class, and the rest from the upper class. Represent the above data with the help of a pie chart.
Exercise 3:
Find out the life expectancy of Indians in years from 2000 till 2020 as per the national
population survey. Compare and analyse it in the form of a line diagram.
Prepare a multiple bar diagram for Life expectancy from 2000 to 2020 for the state with the
highest LE and the state with the lowest LE? Use the below link for the data
(https://censusindia.gov.in/census_data_2001/india_at_glance/variation.aspx)
Exercise 4:
Prepare a frequency distribution table from the colour choice of students and present it in the
form of a pie chart?
3. MEASURES OF CENTRAL TENDENCY
A measure of central tendency provides a single value that summarizes the entire distribution of
data.
Measures of central tendency include mean, median and mode. The selection of the best
measure for a given distribution depends on the shape of the distribution and the intended use of
the measure.
MEAN
This measure implies arithmetic average or arithmetic mean which is obtained by summing
up all the observations and dividing the total by the number of observations. Mean is of three
types – Arithmetic mean, geometric mean and harmonic mean. The arithmetic mean is a more
technical name for what is more commonly called the mean or average.
USES
1. It is the best descriptive measure for data that are normally distributed.
2. Mean is not a measure of choice if data is severely skewed or have any extreme
values in one direction or another.
Example:
Q1. Find the mean of the incubation periods for Hepatitis A – 27, 31, 15, 30 and 22.
Step 1 Add all of the observations in the distribution
27+31+15+30+22 = 125
Step 2 Divide the sum by the number of observations
125/5 = 25
Therefore, the mean incubation period is 25 days.
Q2. Erythrocyte sedimentation rates (ESRs) of 7 subjects are 7, 5, 3, 4, 6, 4, 5. Calculate the
mean.
Step 1 Add all of the observations in the distribution
7+5+3+4+6+4+5 = 34
Step 2 Divide the sum by the number of observations
34/7 = 4.86
Therefore, the mean ESR is 4.86.
To find the mean using excel
=AVERAGE(A1:A7)
MEAN IN GROUPED DATA
Sometimes data is provided or collected in the form of grouped data. For example a
questionnaire might ask the participants to select an age range instead of mentioning the exact
age. In such scenarios we calculate mean using the grouped data method.
Example:
Calculate mean systolic blood pressure from the given table?
Systolic blood pressure Frequency
90-100 3
100-110 5
110-120 7
120-130 10
130-140 15
140-150 11
150-160 9
160-170 6
170-180 2
Total (F) 68
Answer:
1. Calculate the mid-point of each interval by adding the lower limit and upper limit of that
interval and dividing the sum by two ($X_i$).
2. Multiply the mid-point of each interval by the frequency of that interval ($X_i f_i$).
3. Add all the values to get the total for all the observations ($\sum X_i f_i$).
4. Divide the total of all observations in step 3 by the total frequency ($F$).
$\text{Mean} = \dfrac{\sum X_i f_i}{F}$
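The four steps can be carried out on the systolic blood pressure table above with a short Python sketch. The variable names are illustrative assumptions; the class limits and frequencies are those given in the table.

```python
# (lower limit, upper limit, frequency) for each class in the table above.
sbp_table = [
    (90, 100, 3), (100, 110, 5), (110, 120, 7),
    (120, 130, 10), (130, 140, 15), (140, 150, 11),
    (150, 160, 9), (160, 170, 6), (170, 180, 2),
]

# Step 1 and 2: mid-point of each class multiplied by its frequency.
products = [((lower + upper) / 2) * f for lower, upper, f in sbp_table]

# Step 3: total of all mid-point x frequency products.
sum_xf = sum(products)

# Step 4: divide by the total frequency F.
total_f = sum(f for _, _, f in sbp_table)
grouped_mean = sum_xf / total_f

print(total_f, sum_xf, round(grouped_mean, 2))
```

The grouped mean is an approximation, because every observation in a class is treated as if it were equal to the class mid-point.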
GEOMETRIC MEAN
The geometric mean is the mean or average of a set of data measured on a logarithmic scale.
The geometric mean is particularly useful in the laboratory for data from serial dilution
assays (1/2, 1/4, 1/8, 1/16, etc.).
There are two methods for calculating the geometric mean.
Method A
Step 1. Take the logarithm of each value.
Step 2. Calculate the mean of the log values by summing the log values, then dividing by the
number of observations.
Step 3. Take the antilog of the mean of the log values to get the geometric mean.
Method B
Step 1. Calculate the product of the values by multiplying all of the values together.
Step 2. Take the nth root of the product (where n is the number of observations) to get the
geometric mean.
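Both methods can be sketched in Python and should give the same answer. The function names are illustrative assumptions, and the data are the serial dilutions mentioned above (1/2, 1/4, 1/8, 1/16). Natural logarithms are used here; any base works as long as the matching antilog is taken. All values must be positive.

```python
import math

def geometric_mean_logs(values):
    """Method A: antilog of the mean of the log values."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

def geometric_mean_root(values):
    """Method B: nth root of the product of the values."""
    return math.prod(values) ** (1 / len(values))

dilutions = [1/2, 1/4, 1/8, 1/16]

print(geometric_mean_logs(dilutions))
print(geometric_mean_root(dilutions))
```

Method A is usually preferred for long series, because multiplying many values together (Method B) can produce extremely large or extremely small products.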
HARMONIC MEAN
It is a type of numerical average. It is calculated by dividing the number of observations by
the sum of the reciprocals of the numbers in the series. Thus the harmonic mean is the reciprocal of the
arithmetic mean of the reciprocals. Harmonic means are often used in averaging things like
rates.
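A minimal Python sketch of the harmonic mean follows. The two rates are hypothetical values chosen for illustration, and the hand-written function `harmonic_mean` is an assumed name; the standard library function `statistics.harmonic_mean` is shown alongside it for comparison.

```python
import statistics

def harmonic_mean(values):
    """Number of observations divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Hypothetical rates (illustration only), e.g. two speeds in km/h
# maintained over equal distances.
rates = [40, 60]

print(harmonic_mean(rates))
print(statistics.harmonic_mean(rates))
```

For these two rates the harmonic mean (48) is lower than the arithmetic mean (50), which is typical: the harmonic mean gives more weight to the smaller values.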
MEDIAN
When all the observations of a variable are arranged in either ascending or descending order,
the middle observation is known as median. It implies the mid-value of series.
Consider the example of 7 observations of absenteeism of school children in the series 4, 6, 8,
(10), 12, 14, 32. Median value = 10.
When the number of observations is even, the mean of the two central values is the median.
For example, the median of the values 8, 20, 50, 25, 15 & 30, i.e., 8, 15, 20, 25, 30 & 50 when
arranged in order, is (20+25) / 2 = 22.5
Median is a better indicator of central value when one or more of the lowest or the highest
observations are wide apart or not so evenly distributed.
To find the median using Excel
=MEDIAN(A1:A7)
MODE
This is the most frequently occurring observation in a series, i.e., the most common or most
fashionable, such as 8 mm in the tuberculin test of 9 boys given below: 3, 5, 7, 8, 8, 8, 10, 11,
12.
Mode is rarely used in medical studies. Out of the three measures of central tendency mean is
better and utilized more often because it uses all the observations in the data and is further
used in the tests of significance.
To find the mode using excel
=MODE(A1:A10)
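For readers who prefer a scripting language to Excel, the three measures can be obtained with Python's standard `statistics` module. The sketch below reuses the worked examples of this section; the variable names are illustrative assumptions.

```python
import statistics

# Incubation periods for Hepatitis A (days), from the mean example.
incubation = [27, 31, 15, 30, 22]

# Absenteeism of school children, from the median example (odd count).
absenteeism = [4, 6, 8, 10, 12, 14, 32]

# Even number of observations: median is the mean of the two central values.
even_series = [8, 20, 50, 25, 15, 30]

# Tuberculin test readings (mm), from the mode example.
tuberculin = [3, 5, 7, 8, 8, 8, 10, 11, 12]

print(statistics.mean(incubation))
print(statistics.median(absenteeism))
print(statistics.median(even_series))
print(statistics.mode(tuberculin))
```

Note that the extreme value 32 in the absenteeism series does not affect the median, whereas it would pull the mean upwards.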
Exercise 2: Calculate mean from given data which shows marks obtained by students in a test
in class.
Marks Obtained No. of students
51-70 17
71-90 13
91-110 19
111-130 22
131-150 29
Exercise 3: The incubation period of 10 cases of measles is given as follows. Find out the
mean incubation period? What is the median value and the mode? Which among the three is
the best for taking out the average and why?
23, 22, 20, 24, 16, 17, 18, 19, 21,20.
4. MEASURES OF DISPERSION
Spread, or dispersion, is the second important feature of frequency distributions. Just as
measures of central tendency describe where the peak is located, measures of spread describe
the dispersion (or variation) of values from that peak in the distribution. Measures of spread
include range, mean deviation, standard deviation, variance, coefficient of variation, and
interquartile range.
1. Range: The distance between the lowest and the highest values of the observations of
a distribution
Method for identifying the range
Step 1. Identify the smallest (minimum) observation and the largest (maximum)
observation.
Step 2. Epidemiologically, report the minimum and maximum values. Statistically,
subtract the minimum from the maximum value.
2. Mean Deviation: It is the arithmetic average of the absolute deviations of the values
of the observations from an average.
$\text{Mean Deviation} = \dfrac{\sum |x - \bar{x}|}{n}$
3. Standard Deviation: It is the square root of the arithmetic average of the squares of
the deviations of the observations from the mean. It is statistically the most important
and commonly used measure of dispersion (variability). It is denoted by the small Greek
letter σ (sigma). It gives an idea of the spread of the observations: the larger the
standard deviation, the greater the dispersion of values about the mean.
$\text{Standard Deviation} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
(when the observations are a sample from a larger population, the divisor $n - 1$ is commonly used in place of $n$)
Example: Incubation period of 10 SARS cases was 6,7,5,4,3,4,5,6,7,8 days. Calculate the
mean incubation period and standard deviation. Write interpretation about the findings?
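S.No.  x (days)  x - x̄  (x - x̄)²
1 6 0.5 0.25
2 7 1.5 2.25
3 5 -0.5 0.25
4 4 -1.5 2.25
5 3 -2.5 6.25
6 4 -1.5 2.25
7 5 -0.5 0.25
8 6 0.5 0.25
9 7 1.5 2.25
10 8 2.5 6.25
Total 55 0 22.50
Mean = 55/10 = 5.5 days
With divisor $n$: $\text{Standard Deviation} = \sqrt{22.50/10} = 1.5$ days. With the sample divisor $n - 1$: $\text{Standard Deviation} = \sqrt{22.50/9} \approx 1.58$ days.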
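The calculation for the 10 SARS incubation periods can be verified with a short Python sketch. The variable names are illustrative assumptions. Both divisors are shown: `statistics.pstdev` divides the sum of squared deviations by n, while `statistics.stdev` divides by n - 1, the divisor commonly used when the observations are a sample.

```python
import statistics

# Incubation periods (days) of the 10 SARS cases in the example.
incubation_days = [6, 7, 5, 4, 3, 4, 5, 6, 7, 8]
n = len(incubation_days)

mean = statistics.mean(incubation_days)

# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in incubation_days) / n

# Sum of squared deviations (the last column total of the table).
sum_sq = sum((x - mean) ** 2 for x in incubation_days)

sd_divisor_n = statistics.pstdev(incubation_days)        # divides by n
sd_divisor_n_minus_1 = statistics.stdev(incubation_days)  # divides by n - 1

print(mean, mean_deviation, sum_sq)
print(sd_divisor_n, round(sd_divisor_n_minus_1, 2))
```

The two standard deviations differ noticeably here because the sample is small; with larger samples the difference between the two divisors becomes negligible.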
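Answer:
Step 1: Calculate the mid-point of each interval by adding the lower limit and upper limit of that
interval and dividing the sum by two. ($x$)
Step 2: Calculate the mean of the grouped data. ($\bar{x}$)
Step 3: Subtract the mean from the mid-point of each interval. ($x - \bar{x}$)
Step 4: Calculate the square of the values which were calculated in Step 3. $(x - \bar{x})^2$
Step 5: Multiply the value obtained in Step 4 by the frequency of the corresponding group. $f(x - \bar{x})^2$
$SD = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{\sum f}}$ (or with the divisor $\sum f - 1$ when the data are treated as a sample)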
Q1. The pulse rate per minute of 12 individuals is given to you. Calculate the mean, mean
deviation, standard deviation, coefficient of variation and range of pulse rate? 59, 62, 64, 65,
68, 73, 74, 75, 78, 80, 80, 86.
Q2. Serum bilirubin levels (in mg / 100 ml) of 9 individuals are: 0.4, 0.3, 0.8, 1, 1.6, 3.2, 0.9,
0.5, 4.8. Compute the measures of dispersion.
Q3: Calculate mean and Standard deviation from given data which shows marks obtained by
students in a test in class.
Marks Obtained No. of students
51-70 17
71-90 13
91-110 19
111-130 22
131-150 29
4.2 PERCENTILES AND QUARTILES
If you are interested in where you stand compared to the rest of the herd, you need a statistic
that reports relative standing, and that statistic is called a percentile. The kth percentile is a
value in a data set that splits the data into two pieces: the lower piece contains k percent of
the data, and the upper piece contains the rest of the data, which amounts to [100 – k] percent.
The median is the 50th percentile: the point in the data where 50% of the data fall below that
point, and 50% fall above it.
Similarly, quartiles divide the data into 4 equal parts at the 25th, 50th and 75th percentiles,
which are the 1st, 2nd and 3rd quartiles respectively.
To calculate the kth percentile (where k is any number between zero and one hundred),
perform the following steps:
1. Arrange all the values in the data set from smallest to largest.
2. Find the value of $\dfrac{k(n+1)}{100}$, where k is the percentile to be calculated (e.g., for the
90th percentile, k = 90), and n is the number of observations in the dataset. Let us call
this the “index”. The index has a whole-number part and a fractional part (x). If the index is a
whole number, then x = 0. The whole-number part indicates the position of the percentile
value in the dataset. Let us call this Pa.
3. Find the value at position Pa by counting from the left of the arranged dataset. Note down
its value. Let us call it Va.
4. Find the value of the number immediately following position Pa in the arranged dataset. Let us
call it Vb.
5. The desired percentile can then be found by the following formula:
kth percentile = Va + x(Vb – Va)
For example, suppose you have 25 test scores, and in order from lowest to highest they look
like this: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96,
98, 99, 99. We shall calculate the 90th percentile for this dataset.
Step 1: Arrange in ascending order.
Step 2: Find the “index”: $\dfrac{k(n+1)}{100} = \dfrac{90(25+1)}{100} = 23.4$. Therefore Pa = 23 and x = 0.4.
Step 3: Pa = 23, so Va = 98 (the 23rd value in the arranged dataset).
Step 4: Vb = 99 (the number immediately following Va).
Step 5: 90th percentile = 98 + 0.4(99 - 98) = 98.4
Finding a desired percentile using Microsoft Excel
=PERCENTILE.EXC(array, k), e.g., =PERCENTILE.EXC(A1:A25, 0.90)
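The five steps can also be written as a small Python function. This is a sketch of the procedure described in this section, not a library routine: the function name `percentile` is an assumption, and the index is taken as k(n + 1)/100, which is the value that gives 23.4 in the worked example. The function refuses percentiles whose index falls before the first or after the last observation.

```python
def percentile(values, k):
    """kth percentile using the index k(n + 1)/100 and linear interpolation."""
    data = sorted(values)              # Step 1: arrange in ascending order
    n = len(data)
    index = k * (n + 1) / 100          # Step 2: the "index"
    if index < 1 or index > n:
        raise ValueError("percentile falls outside the range of the data")
    pa = int(index)                    # whole-number part (position Pa)
    x = index - pa                     # fractional part
    va = data[pa - 1]                  # Step 3: value at position Pa
    vb = data[pa] if pa < n else va    # Step 4: the next value, if any
    return va + x * (vb - va)          # Step 5

scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

print(round(percentile(scores, 90), 1))
print(percentile(scores, 50))
```

For the 25 test scores the 90th percentile agrees with the hand calculation, and the 50th percentile is the 13th value, i.e., the median.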
EXERCISES:
Exercise 1. Find the median, first quartile and third quartile, inter-quartile range and the range
of the following numbers.
Consider a hypothetical example of height in a population (Figure 5.2). Given that the height
is normally distributed with mean = 170 cm and SD = 10 cm,
approximately 68% of the population will have a height between 160 and 180 cm;
approximately 95% of the population will have a height between 150 and 190 cm;
approximately 50% of the population will have a height above 170 cm;
approximately 34% of the population will have a height between 170 and 180 cm;
approximately 47.5% of the population will have a height between 170 and 190 cm;
approximately 13.5% of the population will have a height between 180 and 190 cm;
approximately 2.5% of the population will have a height below 150 cm.
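These approximate percentages can be checked with the `NormalDist` class of Python's standard `statistics` module. The sketch below assumes the same mean (170 cm) and SD (10 cm) as the example; the helper name `proportion_between` is an illustrative assumption.

```python
from statistics import NormalDist

# Height distribution from the example: mean 170 cm, SD 10 cm.
height = NormalDist(mu=170, sigma=10)

def proportion_between(dist, low, high):
    """Area under the normal curve between two values."""
    return dist.cdf(high) - dist.cdf(low)

within_1_sd = proportion_between(height, 160, 180)
within_2_sd = proportion_between(height, 150, 190)
above_mean = 1 - height.cdf(170)
below_150 = height.cdf(150)

print(round(within_1_sd, 4), round(within_2_sd, 4))
print(round(above_mean, 4), round(below_150, 4))
```

The exact areas differ slightly from the rounded figures quoted above (for example, the area within 2 SD of the mean is a little over 95%), which is why the text says "approximately".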
Exercise 5.2: In a population of 15000 people, systolic blood pressure is normally distributed
with mean = 116 mmHg and SD = 8 mmHg. How many persons will have a systolic blood
pressure >124 mmHg?
The Standard Normal Distribution can be used to determine the proportion of the population
that has values in some specified range (or the probability that a randomly selected person has
a value in the specified range). This is done by calculating the area under the curve (by using
published tables or some computer program). Let us find out the proportion of persons with a
height of more than 176 cm in Figure 5.2. The first step is to calculate the value of z. Given
that the mean = 170 and SD = 10, z = (176-170)/10 = 0.6. From the z-table in the appendix, it
can be found that the area under the curve beyond z = 0.6 is 0.2743. Thus, the proportion of
persons with height above 176 cm = 0.2743 or 27.43%. An easier way to solve this is to use
the Excel function NORM.DIST() or NORM.S.DIST(). The NORM.DIST() function
evaluates the cumulative probability (area under the curve to the left of the given value) when
the TRUE option is used. Because the current example asks for the area to the right, we can
type =1-NORM.DIST(176, 170, 10, TRUE) in
Excel. The first argument is the value for which probability is desired, the second and third
are the mean and SD of the distribution, and the fourth is whether the cumulative probability
is desired, which should be set to TRUE.
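The same calculation can be sketched in Python with the standard `statistics.NormalDist` class. The variable names are illustrative assumptions. As in Excel, the cumulative function gives the area to the left of a value, so the area to the right is obtained by subtracting from 1.

```python
from statistics import NormalDist

mean, sd, value = 170, 10, 176

# Step 1: convert the value to a z-score.
z = (value - mean) / sd

# Area to the left of z under the standard normal curve.
area_left = NormalDist().cdf(z)

# Proportion of persons with height above 176 cm = area to the right.
area_right = 1 - area_left

# Same result without standardising, using the mean and SD directly.
area_right_direct = 1 - NormalDist(mu=mean, sigma=sd).cdf(value)

print(z, round(area_left, 4), round(area_right, 4))
```

The left-hand and right-hand areas always add up to 1, so it is worth checking which of the two a table or function reports before reading off the answer.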
EXERCISES
Exercise 5.3: In a population, haemoglobin is normally distributed with mean = 11.2 and SD
= 1.3. What percentage of the population has haemoglobin < 11?
Exercise 5.4: In a population, birth weight is normally distributed with mean = 2.9 and SD =
0.3. What is the probability that the next child born has a low birth weight?
Exercise 5.5: For the data in Exercise 5.3, find the percentage of the population with
haemoglobin between 12 and 13.5.
STANDARD ERROR AND CONFIDENCE INTERVAL
Standard Error
The standard deviation is sometimes confused with another measure with a similar name - the
standard error of the mean. However, the two are not the same. The standard deviation
describes variability in a set of data. The standard error of the mean refers to the variability
we might expect in the arithmetic means of repeated samples taken from the same population.
The standard error assumes that the data is a sample from a larger population. According to
the assumption, the sample is just one of an infinite number of possible samples that could be
taken from the source population. Thus, the mean for the sample is just one of an infinite
number of other sample means. The standard error quantifies the variation in those sample
means.
If we take a sample of the population and calculate the mean height of the sample, the sample
mean is unlikely to be exactly equal to the population mean. A different sample from the
same population will have a different mean height. This difference is due to sampling
variation. The frequency distribution of all possible sample means is called a sampling
distribution. It has been shown that the mean of this sampling distribution is equal to the
population mean, and its standard deviation = $\sigma/\sqrt{n}$, where $\sigma$ is the population
standard deviation and n is the size of the sample.
The standard deviation of the sampling distribution is called the standard error. Since $\sigma$ is
seldom known, we use the sample standard deviation, s, in its place. Thus, standard error
(s.e.) is calculated as:
$\text{s.e.} = \dfrac{s}{\sqrt{n}}$
We can see that s.e. is inversely proportional to the square root of the sample size, n. Thus,
increasing the sample size decreases the standard error.
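The effect of sample size on the standard error of the mean can be illustrated with a short Python sketch. The function name `standard_error` and the numbers used (a sample standard deviation of 12 with samples of 64 and 256) are hypothetical values chosen for illustration; they are not taken from the exercises.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Hypothetical sample standard deviation (illustration only).
s = 12

se_64 = standard_error(s, 64)     # sample of 64
se_256 = standard_error(s, 256)   # sample four times as large

print(se_64, se_256)
```

Quadrupling the sample size only halves the standard error, because the sample size enters the formula through its square root.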
EXERCISES
Exercise 5.6: The mean fasting plasma glucose of a randomly selected sample of 64 persons
from a population is ___ mg/dL, and the standard deviation is ___ mg/dL. Calculate the
standard error of the mean.
Exercise 5.7: The length of hospital stay among 30 patients has a standard deviation of 9.2.
Calculate the standard error of the mean length of hospital stay.
The standard error is also measured for other parameters like proportion, odds, odds ratio,
relative risk, the difference in means, and difference in proportions. The standard error of a
proportion, se(p), is defined as:
$se(p) = \sqrt{\dfrac{p(1-p)}{n}}$
$95\%\ \text{CI} = \text{estimate} \pm 1.96 \times \text{s.e.}$
For example, in the example of the refractive error cited in the previous section, the
confidence interval for the prevalence of refractive error is 0.1 1. 0.0 = 0.0 to 0. 1
or . to 1. .
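A 95% confidence interval of the form estimate ± 1.96 × standard error can be sketched in Python as follows. The function names and all numbers used (a mean of 100 with SD 16 in 64 persons, and a prevalence of 0.20 in 400 persons) are hypothetical values chosen for illustration; they are not the figures of the refractive error example or of the exercises.

```python
import math

def ci95_mean(mean, sd, n):
    """95% confidence interval for a mean, using s.e. = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

def ci95_proportion(p, n):
    """95% confidence interval for a proportion, using se(p) = sqrt(p(1-p)/n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

# Hypothetical inputs (illustration only).
mean_low, mean_high = ci95_mean(mean=100, sd=16, n=64)
prop_low, prop_high = ci95_proportion(p=0.20, n=400)

print(round(mean_low, 2), round(mean_high, 2))
print(round(prop_low, 4), round(prop_high, 4))
```

A larger sample gives a smaller standard error and therefore a narrower confidence interval around the same estimate.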
EXERCISES
Exercise 5.9: In a survey among participants, the mean total cholesterol was found to be
___0 with a standard deviation of ___. Find the confidence interval for the mean total
cholesterol.
Exercise 5.10: The mean fasting plasma glucose of a randomly selected sample of 64 persons
from a population is ___ mg/dL, and the standard error is _.1 mg/dL. Calculate the
confidence interval of the mean.
Sampling methods
[Flowchart: Sampling methods; labels recoverable from the figure: Multistage sampling, Consecutive sampling]
Method of sampling | Advantage | Disadvantage
Simple random sampling | 1. Easier to understand. 2. Every sampling unit has an equal and independent chance of being selected. | 1. As compared to others, a sampling frame is needed. 2. If a sub-group of the population is of interest, its members may not be included in sufficient number in the sample. 3. Costly, especially when the population is dispersed.
Stratified sampling | 1. Greater ability to make inferences within a stratum. | 1. Complete information regarding the total population that belongs to the different strata is required. 2. More time consuming and complicated than simple random sampling.
Systematic sampling | 1. It is easier and simpler. 2. It ensures that the sample is more spread across the population. | 1. Only the selection of the first element is based on probability and subsequent selection is based on the sampling interval, hence many members of the population will have zero chance of selection.
Cluster sampling | 1. Less time, money and labour. 2. It doesn’t require a sampling frame. | 1. May not be as representative of the population as a simple random sample of the same sample size. 2. Error is higher.
Simple random sampling: In a simple random sample, every member of the population has
an equal chance of being selected. The sampling frame includes all sampling units in the
population.
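A minimal Python sketch of simple random sampling is given below. It assumes the sampling frame is available as a list; the frame of 500 serial numbers, the sample size of 20 and the function name are hypothetical.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size units without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: 500 units identified by serial numbers 1..500.
frame = list(range(1, 501))
chosen = simple_random_sample(frame, 20, seed=1)
print(sorted(chosen))
```

Fixing the seed only makes the draw reproducible for teaching purposes; in a real survey the random numbers would not be chosen in advance by the investigator.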
EXERCISES
Exercise 6.1: Discuss the advantages and disadvantages of simple random sampling.
Systematic sampling: In systematic sampling every member of the population is listed with
a number, but instead of randomly generating numbers, individuals are chosen at regular
intervals. The sampling frame includes all sampling units in the population. The sampling interval
is calculated as the number of sampling units divided by the sample size. The first unit is selected
using simple random sampling and the rest are chosen at regular intervals (the sampling
interval).
Example: All employees of Srinagar are listed in a random order. We need to select
100 employees (say, from a total of 000 employees). From the first 0 numbers, randomly
select a starting point, say number 1 . From number 1 onwards, every 0th person on the
list is selected ( , , 1 , 1 , and so on), till we end up with a sample of 100 employees.
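The procedure can be sketched in Python as follows. The numbers used here (a list of 1000 employees and a sample of 100, giving a sampling interval of 10) are assumptions for illustration and are not necessarily those of the example above; the sketch also assumes that the list size is an exact multiple of the sample size.

```python
import random

def systematic_sample(sampling_frame, sample_size, seed=None):
    """Select units at a fixed interval after a random start within the first interval."""
    interval = len(sampling_frame) // sample_size   # sampling interval k
    rng = random.Random(seed)
    start = rng.randrange(interval)                 # random position 0..k-1
    return [sampling_frame[start + i * interval] for i in range(sample_size)]

# Hypothetical frame of 1000 listed employees and a sample of 100, so k = 10.
employees = list(range(1, 1001))
sample = systematic_sample(employees, 100, seed=7)
print(sample[:5])
```

Only the starting point is random; once it is fixed, every other selected unit is determined by the sampling interval.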
EXERCISES
Exercise 6.3: Discuss the advantages and disadvantages of systematic sampling.
EXERCISES
Exercise 6.5: Select a gender-stratified random sample of 10 students in your batch.
Exercise 6.6: Discuss the advantages of a stratified random sample over a simple random
sample.
Cluster sampling: Cluster sampling also involves dividing the population into subgroups
(areas or clusters). One of the previously mentioned random sampling methods
(simple/stratified) is then used to select a certain number of clusters from the list. Sometimes
clusters are selected using the PPS (probability proportionate to size) method. If it is practically
possible, we might include every individual from each sampled cluster. If the clusters
themselves are large, we can also sample individuals from within each cluster using one of
the techniques above. This is called multistage sampling.
This method is good for dealing with large and dispersed populations, but there is more risk
of error in the sample, as there could be substantial differences between clusters.
Example: If we want to select 000 people from Kashmir for a survey, we can make a list of
all clusters (villages/wards) in Kashmir, stratify the clusters into rural and urban clusters, and
select a certain number of clusters from each list using simple random sampling or PPS. We can
then select the desired number of individuals from each selected cluster, sampling households in
the second stage and individuals within households in the next stage.
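A simplified two-stage version of this idea is sketched below in Python. It is only an illustration under stated assumptions: clusters are chosen by simple random sampling rather than PPS, the rural/urban stratification and the household stage are omitted, and the population of 20 villages with 50 residents each is invented.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters by simple random sampling.
    Stage 2: pick individuals within each selected cluster."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in selected
    }

# Hypothetical population: 20 villages with 50 residents each.
villages = {f"village_{v}": [f"v{v}_person_{i}" for i in range(50)] for v in range(20)}
sample = two_stage_cluster_sample(villages, n_clusters=4, n_per_cluster=10, seed=3)
for name, people in sample.items():
    print(name, len(people))
```

Note that only the list of clusters, and then the lists within the selected clusters, are needed; a complete list of every individual in the population is not.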
EXERCISES
Exercise 6.7: Discuss the advantages and disadvantages of cluster sampling.
Fig 6.1: Pictorial representation of different sampling methods
7. SAMPLE SIZE
Example 2: Calculate the sample size needed to estimate mean height of young adults in a
population if it is anticipated that height has a variance of 1 1 in the population and it is
intended to estimate mean height within 4 cm of its true value.
n=( 1 1) 1 = 0
Example 4: Calculate the sample size needed for a case-control study if it is anticipated that 0
of the cases will be exposed as compared to 0 among controls.
n = (1 0. 0 0. 0) 0. 02 = cases and controls.
In this example, = average of 0 and 0 and = 0 - 0 .
The sample size formulae given above are approximate
(Lehr, R. (1992), Sixteen S-squared over D-squared: A relation for crude sample size
estimates. Statist. Med., 11: 1099-1102. https://doi.org/10.1002/sim.4780110811)
The above formulae are easy to remember and applicable in most common situations. In
practice, however, researchers often use sample size tables, special software programs, or
some online calculator. One such online resource for sample size calculation is
www.openepi.com.
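For quick checks, approximate formulae of this kind can also be coded directly. The Python sketch below assumes two commonly used approximations that are consistent with the worked examples and with Lehr's rule: n = 4 × variance / d² for estimating a mean within ±d with 95% confidence, and n = 16 × p̄(1 − p̄) / (p1 − p2)² per group for comparing two proportions (about 80% power at a two-sided 5% significance level). The function names and input values are hypothetical and are not taken from the examples or exercises.

```python
import math

def n_for_mean(variance, d):
    """Approximate n to estimate a mean within +/- d (95% confidence): 4 x variance / d^2."""
    return 4 * variance / d ** 2

def n_per_group_two_proportions(p1, p2):
    """Lehr-style approximation: 16 x p_bar(1 - p_bar) / (p1 - p2)^2 per group."""
    p_bar = (p1 + p2) / 2
    return 16 * p_bar * (1 - p_bar) / (p1 - p2) ** 2

def round_up(n):
    """Round up to a whole participant, ignoring tiny floating-point noise."""
    return math.ceil(n - 1e-9)

# Hypothetical inputs, not taken from the module's examples or exercises.
print(round_up(n_for_mean(variance=100, d=5)))
print(round_up(n_per_group_two_proportions(0.50, 0.30)))
```

Results from such approximations may differ somewhat from those given by sample size software, which typically uses exact constants and may apply corrections.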
EXERCISES
In the exercises below, use the above formulae as well as www.openepi.com to calculate
sample size.
Exercise 7.1: Calculate sample size for estimating mean hemoglobin among reproductive age
females in Kashmir if it is anticipated that hemoglobin has a standard deviation of 2.2.
Exercise 7.2: An investigator wishes to estimate the prevalence of refractive errors among
school-going children in Srinagar. What sample size does he need if it is anticipated that the
prevalence is 25%.
Exercise 7.3: An investigator wishes to conduct a case-control study to analyze the
relationship between colon cancer and obesity. It is anticipated that the proportion obese
among people with colon cancer is 38%. A recent survey estimated the population prevalence
of obesity to be 24%.
Exercise 7.4: A new treatment strategy is expected to increase the cure rate of a disease by
25%. How many patients should be included in a trial of the treatment strategy if the current
cure rate of the disease is 40%?
8. RESEARCH HYPOTHESIS AND ITS TESTING
I) The research question: A student studying animal behaviour is assigned the
following research question:
In a certain species, male ducks have green heads and females are all grey. The purpose of
the green colouring of the male heads is to attract the females. The question is: are female
ducks also attracted to the green colour in food, for example in bread?
II) Writing statistical hypotheses: We basically want to know whether female ducks are
indifferent to green bread versus plain bread or if they prefer green bread. The research
question can be translated into the confrontation of two opposite ideas:
Idea 1: Female ducks are indifferent to plain versus green bread.
Idea 2: Female ducks prefer green bread.
When a female duck is confronted with two pieces of bread, one plain and one green, the
probability of picking the green one will be called p. The two previous ideas in terms of
probability can be: -
Idea 1: p = 0.5 (the probability of picking the green bread is the same as the probability of
picking the plain bread)
Idea 2: p > 0.5 (the probability of picking the green bread is more than that of picking the plain bread)
These confronting ideas are the “statistical hypotheses”. The first one states that the ducks
are equally likely to pick the green and the plain bread. This statement is called the ‘null
hypothesis’ because it represents an idea of no difference and is labelled by the symbol ‘H0’.
The second idea says that the ducks prefer the green bread and states something different
from the first one, so it is called the ‘alternative hypothesis’. The symbol used for the
alternative hypothesis is ‘Ha’. The reason we write >, instead of ≠, as an alternative to = in
this case is because the research question is whether female ducks prefer the green bread
instead of the plain one. We must decide which of these two statistical hypotheses is more
likely to be true. The decision between the two hypotheses is usually expressed in terms of
H0. If we favour Ha (idea 2), we usually say that ‘we reject H0’.
IV) Arriving at a conclusion: If female ducks were truly indifferent between green and plain
bread, about how many ducks, of the 10 that were observed, would you have expected to
choose the green bread first? Of course, even if the null hypothesis was true, we are not
always going to get that result in reality, due to sampling variability or just chance.
Suppose the student finds that 9 of the 10 female ducks sampled prefer the green bread. If
female ducks are really indifferent to plain versus green bread, what is the probability that 9
female ducks in a sample of 10 would pick the green bread first just by chance? Nine out of
10 seems to indicate that female ducks tend to prefer green bread to plain. If more than 9 had
picked the green bread first, it would be a situation even further from what was expected
under the null hypothesis. A number higher than 9 would have given us an even clearer idea
that female ducks tend to prefer the green colour. That is why we are interested in knowing
what is the probability that 9 (the value the student observed) or more female ducks pick the
green bread first. We want to know not only what the chances are of getting the result that we
got, but also what the chances are of getting a result that is further from what the null
hypothesis indicates, assuming the null hypothesis is true. What is the probability that,
assuming that in general female ducks are really indifferent between green and plain bread, 9
or more female ducks in a sample of 10 would pick the green bread first just by chance? This
probability, of getting the result we got or a more extreme one, is called ‘p-value’.
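This probability can be worked out from the binomial distribution. The short Python sketch below assumes that the 10 ducks choose independently and that, under the null hypothesis, each picks the green bread with probability 0.5; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

# Probability that 9 or more of 10 ducks pick the green bread when p = 0.5.
p_value = binomial_upper_tail(9, 10, 0.5)
print(f"p-value = {p_value:.4f}")  # 11/1024, about 0.0107
```

There are 10 ways of getting exactly 9 green choices and 1 way of getting 10, out of 2^10 = 1024 equally likely outcomes, which gives 11/1024.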
V) Reviewing the thinking process: Read Sections I–IV again and notice that the way we
thought in order to arrive at a conclusion can be summarized in the following steps:
(a) Identifying the research question. (Do female ducks prefer green to plain bread?)
(b) Identifying a quantity related to the research question whose value we don’t know. In this
case the quantity of interest is the probability of a hypothetical female duck picking the green
bread (or the proportion of all female ducks that would pick the green bread). In general, that
quantity is called a ‘parameter’.
(c) Writing the statistical hypotheses in terms of that parameter of interest. In the example the
statistical hypotheses are H0: p = 0.5 and Ha: p > 0.5.
(d) Collecting data and calculating a statistic. (An experiment was conducted and it was
observed that 9 out of 10 ducks preferred the green bread.)
(e) Finding the p-value (probability that the result we got or a more extreme one happens just
by chance, given that the null hypothesis is true).
(f) Deciding whether the p-value is small or large.
This thinking procedure is called ‘hypothesis testing’ and can be applied to many situations in
which a research question is asked and data are collected (through a survey or experiment) in
order to answer the research question. Point (f) is a delicate matter, further discussed in the
next section.
VI) In how many different ways can we make a wrong decision? In hypothesis testing we
need to pick either H0 or Ha. Obviously, we would like to make the correct decision, but we
can sometimes make the wrong decision. How would you describe in words (in terms of what
the ducks prefer and what we say they prefer) each one of these situations?
(1) We select Ha but it is the wrong decision because H0 is true.
(2) We select H0 but it is the wrong decision because H0 is not true.
We call these situations type I error and type II error. We already mentioned that we usually
express our decision in terms of H0 (reject or not reject H0). In the same way we usually focus
on the probability of making type I error (rejecting the null hypothesis when it is true). This is
because the null hypothesis reflects a ‘status quo’ or neutrality situation, and if we reject it,
we are making a statement saying that something is better or preferred, or worse, or different,
depending on the situation. For example, when two medicines are being compared in a
pharmaceutical study, a ‘type I error’ would mean asserting that one medicine is better when
they actually have similar effectiveness. Type I error is usually considered a serious error and
we like to have some control over it. The probability of making type I error is called α (or
‘significance level’). We set the value we want for α at the beginning of a study. A very
common value is 0.05 (5%) but in studies (such as medical research) where the consequences
of type I error are very serious, we like to have a smaller α such as 0.01. If you doubt whether
the p-value is small or not, you can compare it with α in order to decide if it is small or large.
A p-value greater than 0.05 means that, if the null hypothesis were true, a result at least as
extreme as ours would occur by chance more than 5% of the time; in that case we do not reject
the null hypothesis. If the p-value is less than 0.05, we reject the null hypothesis in favour of
the alternative hypothesis. The probability of ‘type II error’ is called β.
8.1 TESTS OF SIGNIFICANCE
1) Chi-Square Test
A chi-square test is a statistical test used to compare observed results with expected results. It is
a method of testing the significance of difference between two proportions. The purpose of this
test is to determine if a difference between observed data and expected data is due to chance, or
if it is due to a relationship between the variables you are studying. Therefore, a chi-square test
is an excellent choice to help us better understand and interpret the relationship between our two
categorical variables.
Example:
Group     Fatal MI   Non-fatal MI   Total
Placebo   18         171            189
Aspirin   5          99             104
Total     23         270            293
Multiply each row total by each column total and divide by the overall total to get the
expected value for each cell (shown in parentheses):
Group     Fatal MI     Non-fatal MI   Total
Placebo   18 (14.83)   171 (174.16)   189
Aspirin   5 (8.16)     99 (95.83)     104
Total     23           270            293
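The calculation can be carried through in a few lines of Python. This is a sketch of the ordinary (Pearson) chi-square statistic without continuity correction, which is an assumption on our part since the module does not say which variant is intended; the variable names are hypothetical.

```python
import math

observed = [[18, 171],   # Placebo: fatal MI, non-fatal MI
            [5, 99]]     # Aspirin: fatal MI, non-fatal MI

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total x column total / overall total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square = sum over cells of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# For a 2x2 table (1 degree of freedom) the p-value equals erfc(sqrt(chi_square / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print([[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

The expected counts printed here are rounded, so the last digit may differ slightly from the values shown in the table. The statistic comes to roughly 2.06, which is below 3.84, the 5% critical value for 1 degree of freedom.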
2) t-test
The t-test tells you how significant the differences between groups are; in other words, it lets
you know if those differences (measured in means) could have happened by chance.
The t-test estimates the true difference between two group means using the ratio of the
difference in group means over the pooled standard error of both groups. You can calculate it
manually using a formula, or use statistical software.
Example: A study was done among undergraduate medical students to compare the IQ levels
between male and female students to find out whether there was any significant
difference or not. Mean IQ level was calculated among both the groups.
Group Mean Standard deviation
Male 96 8.81
Female 97.6 3.69
(Concerned Teacher to demonstrate the use of Open Epi/computer software for t-test using
the above data)
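For readers without access to such software, the t statistic can also be computed from summary data. The Python sketch below uses the pooled standard error described above. The group sizes are not given in the example, so 50 students per group is an ASSUMED value used purely for illustration; the function name is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic using the pooled standard error; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Means and SDs are from the example; the group sizes (50 each) are ASSUMED.
t, df = pooled_t(96, 8.81, 50, 97.6, 3.69, 50)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The resulting t value is then compared with the critical value from a t table for the given degrees of freedom. Because the two standard deviations differ considerably, software may also report a version of the test that does not assume equal variances.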
Simple linear regression is a model that assesses the relationship between a continuous
dependent variable and an independent variable. The simple linear model is expressed using
the following equation:
Y = a + bX
Where:
• Y – Dependent variable
• X – Independent (explanatory) variable
• a – Intercept (constant)
• b – regression coefficient
Example: In a study to find whether BMI has an effect on total blood cholesterol,
regression analysis was done on the data obtained from 150 individuals from the
apparently healthy population within the community. BMI was expressed in terms of weight
in kg per metre squared and cholesterol in terms of mg per decilitre. The regression
coefficient was equal to 6.49 and the estimate of the Y-intercept was equal to 28.07. If we
compare two participants whose BMIs differ by 1 unit, we would expect their total
cholesterols to differ by approximately 6.49 units (with the person with the higher BMI
having the higher total cholesterol).
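The fitted line from this example can be used for prediction, as in the short Python sketch below. The intercept (28.07) and regression coefficient (6.49) are taken from the example; the function name and the BMI values of 24 and 25 are chosen only for illustration.

```python
def predicted_cholesterol(bmi, intercept=28.07, slope=6.49):
    """Predicted total cholesterol (mg/dL) from BMI (kg/m^2) using Y = a + bX."""
    return intercept + slope * bmi

# BMI values chosen for illustration only.
for bmi in (24, 25):
    print(bmi, round(predicted_cholesterol(bmi), 2))

# Two people whose BMIs differ by 1 unit differ by the regression coefficient.
difference = predicted_cholesterol(25) - predicted_cholesterol(24)
print(round(difference, 2))
```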
Using appropriate type of statistical test find out whether there was any significant difference
in the stress level among male and female doctors.
Exercise 8.2 A group of 150 patients with hypertension was divided into two subgroups of 75
patients in each; one of the groups was given a known drug for controlling hypertension
without strict dietary modification while other group was given the same drug with strict
dietary modification for 3 months. The BP of both the groups was measured and daily
readings were recorded, the mean systolic BP was calculated at the end of the 3 months
period. In relation to this scenario solve the following questions:
2a) Which test of significance will be used in this scenario to find out whether the two groups
significantly had different BP at the end of three months?
2b) Using statistical software, find out whether the difference was significant or not, with
the mean BP of first group being 136 mmHg and standard deviation of 9, for second group
the mean BP being 130 mmHg with standard deviation of 6.4.
Exercise 8.3 In a study done among students studying in boarding school, it was found
that weight and height had a linear relationship and each unit increase in height (measured
in m) was related to increase in weight by 2 units (measured in Kg), the constant was
equal to 80. According to this relationship what would be the weight of a student with
height 1.64 m.
BIBLIOGRAPHY:
2. Khanal AB. Mahajan's methods in biostatistics for medical students and research workers.
Jaypee Brothers Medical Publishing. 2015.
3. Rasmussen SA, Goodman RA, editors. The CDC field epidemiology manual. Oxford
University Press. 2018 Nov 20.
4. Campbell MJ. Statistics at square one. John Wiley & Sons; 2021 Jun 16.
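Table: Normal curve areas P(z ≤ z0). Entries in the body of the table are areas between -∞ and z0.
The row label gives z0 to the first decimal place and the column heading gives the second decimal place.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09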
-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
-0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
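The areas in this table can also be computed directly. The Python sketch below uses the error function from the standard library; the function name is hypothetical.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve between -infinity and z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A few z values for comparison with the table entries.
for z in (-1.96, 0.0, 1.0, 1.96):
    print(f"P(Z <= {z:+.2f}) = {normal_cdf(z):.4f}")
```

Rounded to four decimal places, these values agree with the corresponding entries in the table, for example 0.8413 for z = 1.00 and 0.9750 for z = 1.96.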