Biostatistics for Pharmacy Students

This document is a comprehensive guide on biostatistics tailored for pharmacy students, emphasizing the importance of statistical knowledge in pharmacy practice and patient care. It covers essential statistical concepts, methods, and data collection techniques relevant to pharmaceutical research, including descriptive and inferential statistics. The book aims to enhance the competencies of pharmacy practitioners and students in understanding and applying statistical data to improve healthcare outcomes.

© All Rights Reserved

Authors

Dr. T.E. Gopalakrishna Murthy
Dr. K. Rajyalakshmi
Mrs. Ch. Sushma
Mr. B. Sudheer Chowdary

Publisher

Bapatla College of Pharmacy

ISBN: 978-81-982843-8-9


Preface

Statistics intersects with many sectors, including healthcare. Pharmacy is a field that
relies on both science and patient care. As healthcare transforms, there is a growing need
for evidence-based medicine, and pharmacists are playing a more important role than ever.
A critical aspect of this role is the capacity to understand and apply statistical data to its
full potential in order to achieve desired patient outcomes.

This book aims to teach pharmacy practitioners, researchers, and students the basic
concepts of pharmacy statistics. The emphasis is on daily pharmacy activities such as
designing clinical trials, measuring drug effects, and examining patient records.
Understanding how to use data is vital to everyday pharmacy practice, which is why
statistical knowledge is essential.

Statistics is an area that may seem interesting but complex, especially in the context of
pharmaceutical research. The aim of this text is to make these intriguing concepts easy to
understand. The book blends STEM pedagogies with non-traditional approaches to foster
an understanding of how statistical concepts can be used to address pharmacy
problems.

In these chapters, we present statistical methods that are integral to pharmaceutical
research, drug development, clinical studies, and other pharmacy application areas.
The provided examples aim to address particular issues that pharmacy practitioners
face, guiding them through the statistical information that affects their practice.

We hope that this book serves not only as reference material but also as motivation to
enhance your competence in understanding and using statistical data. It is through these
methodologies that one can meaningfully support pharmacy practice and patient care
services, and impact the healthcare system through significant research activities.
INDEX

S.No. Chapters Page No.

1 Introduction to Biostatistics for Pharmacy Students 01-24

2 Measures of Central Tendency 25-42

3 Measures of Dispersion 43-68

4 Correlation 69-87

5 Regression 88-103

6 Probability 104-146

7 Sampling Techniques 147-163

8 Hypothesis Testing 164-174

9 Parametric Tests 177-201

10 Non-Parametric Tests 202-232


Introduction

Biostatistics for Pharmacy Students

1. Introduction

Statistics is one of the prime branches of mathematics, focused on gathering,
analyzing, interpreting, tabulating, and presenting data. Its techniques include
ways of summarizing information, drawing conclusions, and making inferences
about populations based on samples. That is, statistics is about transforming raw
data into actionable insights that support decision-making and advance
research. Statistics is a fundamental tool in modern pharmacy practice for
evidence-based decisions regarding medication safety and efficacy and, ultimately,
healthcare outcomes.
Biostatistics is the application of statistical techniques to biological, medical, and
health-related research. It deals primarily with the design and analysis of data
from experiments, clinical trials, observational studies, and other health-related
sources. Biostatistics is central to discovering patterns, relationships, and trends
in health data, such as the distribution of disease, the effectiveness of drug
treatments, or the impact of public health measures. It is crucial for pharmacy
students because it enables them to critically appraise research, comprehend
complex data, and base decisions on strong evidence. Mastery of statistical
concepts is essential for pharmacy students, since all aspects of modern pharmacy
practice, pharmacy research, and drug development rely on it directly. Biostatistics
is also central to areas such as epidemiology, genetics, environmental health,
clinical medicine, and public health.
The two main branches of statistics are discussed here.
1.1. Descriptive Statistics: This branch focuses on summarizing and organizing data
to describe its main features. Descriptive statistics are essential tools for summarizing
and analyzing data.
It includes tools such as:
 Measures of central tendency (mean, median, mode)
 Measures of variability (range, variance, standard deviation)
 Graphical representations (histograms, bar charts, pie charts)
 Frequency Distribution: How often values occur within a data set.

Bapatla College of Pharmacy Page 1


Example: Describing the distribution of patient outcomes (e.g., blood pressure,
cholesterol levels) after a treatment or intervention.
1.2. Inferential Statistics:
This branch involves drawing conclusions or making predictions about a population
based on data sampled from it. It is particularly important in pharmacy practice
because inferential statistics allow healthcare professionals to derive evidence for
decisions from clinical trials, experiments, and observational studies. Inferential
statistics are essential for judging the efficacy, safety, and outcomes of medical
treatments and play a critical role in drug approval decisions by regulatory agencies.
Through the application of inferential statistical techniques, pharmacy practitioners
can help ensure that therapies are both safe and effective for patient populations.
Key aspects of inferential statistics include:
 Hypothesis Testing
 Confidence Intervals
 Regression and Correlation Analysis
Hypothesis Testing
Purpose: To evaluate hypotheses or claims about a population based on sample data.
Key Concepts:
1. Null Hypothesis (H₀) vs. Alternative Hypothesis (H₁): The test determines whether
there is sufficient evidence to reject the null hypothesis in favour of the alternative.
2. Types of Errors:
1. Type I Error (False Positive): Incorrectly rejecting the null hypothesis
when it is actually true.
2. Type II Error (False Negative): Failing to reject the null hypothesis
when it is actually false.
3. P-value: The probability of observing results as extreme or more
extreme than those found, assuming the null hypothesis holds true.
4. Confidence Interval (CI): A range of values that is likely to include the
true population parameter with a specified level of confidence
(typically 95%).
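A 95% confidence interval for a mean can be sketched in plain Python. The serum concentrations below are illustrative, and the 1.96 critical value is the large-sample normal approximation; small samples would properly use a t critical value instead:

```python
import math
import statistics

# Illustrative serum drug concentrations (µg/mL) from ten patients.
sample = [12.1, 14.3, 13.5, 12.8, 15.0, 13.9, 14.1, 12.6, 13.2, 14.7]

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

z = 1.96                                        # 95% normal critical value
ci_low, ci_high = mean - z * sem, mean + z * sem
print(f"mean={mean:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```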
Hypothesis testing is used to determine whether a newly discovered medicine is
effective, to compare two or more treatment options, and to judge whether an
observed difference is statistically significant. Different statistical tests apply to
different types of data: categorical data (e.g., success or failure of a drug
treatment) or continuous data (e.g., drug concentration levels).
Common Statistical Tests in Inferential Statistics:
 T-test: Compares the means of two groups (e.g., evaluating the average
cholesterol levels between two groups receiving different treatments).
 ANOVA (Analysis of Variance): Compares means across three or more groups
(e.g., comparing the efficacy of three different drug dosages).
 Chi-Square Test: Analyzes the difference between observed and expected
frequencies in categorical data (e.g., examining if adverse effects are distributed
differently between drug groups).
 Correlation and Regression Analysis: Investigates the relationships between two
or more variables.
Pearson Correlation: Measures the strength and direction of a linear
relationship between two continuous variables.
Linear Regression: Predicts the value of one variable based on another.
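As a sketch of the t-test idea, Welch's two-sample t statistic can be computed with the standard library alone. The cholesterol values are illustrative; in practice the statistic is referred to a t distribution (for example via scipy.stats.ttest_ind) to obtain a p-value:

```python
import math
import statistics

# Illustrative cholesterol levels (mg/dL) for two treatment groups.
group_a = [210, 198, 205, 220, 215, 202, 208, 212]
group_b = [195, 188, 192, 201, 190, 185, 197, 193]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Welch's t: difference in means divided by its estimated standard error.
se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
t_stat = (mean_a - mean_b) / se

print(f"mean difference = {mean_a - mean_b:.3f}, t = {t_stat:.2f}")
```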
Application in Pharmacy: Inferential statistics can be used to determine if a new
medication leads to significant improvements in health outcomes compared to an
existing treatment, or to forecast a patient’s response to therapy based on clinical data.
1.3. Data Collection
Pharmaceutical researchers rely on a range of data collection instruments, chosen to
suit the purposes and objectives of the research, the nature of the study, and the
resources available. The following are some of the most basic considerations to keep
in mind throughout the data collection process:
 Ethics Approval: Most research projects, especially those involving human
subjects (e.g., clinical trials, surveys, or interviews), must be approved by an
IRB or ethics committee to ensure compliance with the principles of good
research practice.
 Data Integrity: Data must be accurate and reliable, whether it is quantitative
or qualitative.

 Confidentiality and Privacy: Patient privacy must be ensured and
confidentiality maintained, especially when dealing with sensitive health
information.
 Regulatory Compliance: A study must be conducted within the applicable
regulatory framework (e.g., FDA, EMA), and GxP guidelines should always be
considered when handling health data or pharmaceuticals.
The most common types of data collection in pharmaceutical studies are listed below.
1.3.1.Surveys and Questionnaires
 Purpose: To collect both quantitative and qualitative data from a large
population.
 Example: Gathering patients' views about a new drug, or monitoring
pharmacists' practices across different settings.
 Types:
 Structured Surveys: Predefined questions with fixed response options (e.g.,
Likert scales).
 Unstructured Surveys: Open-ended questions that allow for a much more
personal, detailed response.
1.3.2.Interviews
 Purpose: To collect deeper qualitative information from a smaller and more
focused population.
 Example: Ask health care providers or patients if they have ever experienced a
particular drug or treatment protocol.
 Types:
 Structured Interviews: A set list of questions that all participants answer.
 Semi-structured Interviews: A core set of open-ended questions, with the
flexibility for the interviewer to steer a more fluid conversation.
 Unstructured Interviews: A free-form, conversational approach with
minimal predefined structure.
 Tools: Audio recording devices, transcription software (e.g., Otter.ai,
Rev.com).


1.3.3. Observational Studies


 Purpose: To obtain actual observations of natural behaviour or events in real
time without interfering with the process.
 Example: Observing a patient's behaviour in a clinical setting, or noting how
pharmacy staff handle prescriptions.
 Types:
 Participant Observation: The researcher is a participant in the setting they
intend to observe.
 Non-participant Observation: The researcher remains unobtrusive and
does not take part or disrupt the situation being observed.
 Tools: Field notes, video recordings.
1.3.4. Clinical Trials / Experimental Studies
 Purpose: Collect evidence in a controlled setting to determine the impact of a
drug or treatment
 Example: Testing the efficacy of a new drug or drug formulation.
 Types:
 Randomized Controlled Trials (RCTs): Participants are randomly
assigned to either a treatment group or a control group.
 Cohort Studies: Participants are observed over time based on their
exposure to a particular treatment or drug.
 Tools: Clinical Trial Management Systems (CTMS), Electronic Data Capture
(EDC) systems.
1.3.5.Secondary Data Analysis
 Purpose: Analyzing secondary data instead of collecting fresh data.
 Example: Analyzing published clinical trial outcomes, pharmaceutical sales
figures, or regulatory filings such as FDA reports.
 Sources: PubMed, Google Scholar (for research papers), FDA database,
ClinicalTrials.gov, and market research reports (e.g., IMS Health, Nielsen).
1.3.6.Focus Groups
 Purpose: To obtain qualitative information from a small, heterogeneous
sample on a specific subject.
 Example: Collecting feedback from patients or healthcare providers about
drug packaging, side effects, or medication adherence.


 Tools: Recording devices, note-taking, group discussion moderators.


1.3.7.Case Studies
 Purpose: To conduct a thorough examination of individual instances or a small
number of cases (e.g., specific patient scenarios or disease outbreaks).
 Example: Investigating the impact of a particular medication on a patient or
exploring a rare drug interaction.
 Tools: Medical records, patient histories, case report forms.
1.3.8.Laboratory Experiments
 Purpose: To generate accurate data in a controlled laboratory setting, most
often concerning drug formulation, pharmacodynamics, or pharmacokinetics.
 Example: Testing the stability of a drug, its dissolution rate, or cellular
responses to various drug compounds.
 Tools: Laboratory equipment (e.g., spectrophotometers, chromatography
systems), lab notebooks for detailed data recording.
1.3.9.Document Analysis
 Purpose: To gather data from written or electronic documents that may be
relevant to the study.
 Example: Reviewing pharmaceutical regulations, clinical guidelines, or drug
product labels to extract important data.
 Sources: Government reports, scientific journals, manufacturer
documentation.
1.3.10.Online Data Collection (Web Scraping)
 Purpose: To gather large amounts of data from websites or other online
databases.
 Example: Collecting patient reviews, drug pricing information, or reports of
adverse drug reactions from online platforms.
 Tools: Python (BeautifulSoup, Scrapy), R, and various web scraping tools.
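As a minimal illustration of the idea, hypothetical patient reviews can be extracted from an HTML snippet with the standard library alone. Production scrapers would typically use BeautifulSoup or Scrapy, and must respect each site's terms of service and robots.txt:

```python
from html.parser import HTMLParser

# Illustrative only: the snippet and review text are made up.
snippet = """
<div class="review">Worked well, mild nausea.</div>
<div class="review">No side effects after two weeks.</div>
"""

class ReviewParser(HTMLParser):
    """Collects the text of every <div class="review"> element."""
    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "review") in attrs:
            self.in_review = True

    def handle_data(self, data):
        if self.in_review and data.strip():
            self.reviews.append(data.strip())
            self.in_review = False

parser = ReviewParser()
parser.feed(snippet)
print(parser.reviews)
```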
1.3.11. Pharmacovigilance Data Collection
 Purpose: To monitor and gather data on adverse drug reactions (ADRs) and
other safety concerns.
 Example: Collecting reports from healthcare professionals and patients
regarding side effects or drug interactions.

 Sources: WHO's Pharmacovigilance Program, FDA's MedWatch, and local
health authorities.
1.3.12. Ethnographic Studies
 Purpose: To illuminate the social and cultural factors influencing pharmacy
practice or patient behavior.
 Example: Studying how patients from different cultural backgrounds perceive
pharmaceutical treatments or healthcare services.
 Methods: Long-term observation, participant interviews, and immersion
within the community.
1.3.13.Content Analysis
 Purpose: To reveal major ideas or themes within pharmaceutical media
content, such as drug labels or advertisements.
 Example: Analyzing how pharmaceutical companies present their products in
advertisements.
 Tools: Coding software like NVivo or MAXQDA.
1.4. Types of Data and Scales of Measurement
In the field of pharmacy, it is crucial to grasp the various types of data and
measurement scales in order to accurately analyze, interpret, and apply research
findings to clinical practice.
1.4.1 Types of Data
Data can be classified into different categories depending on the nature of the
variables and the way they are measured. The three primary types of data in
pharmaceutical research and practice include:
a. Qualitative (Categorical) Data
Qualitative data refers to categories or classifications that are not quantitative in
nature and cannot be measured numerically. This form of data is necessary for
grouping and classifying various characteristics. It may be divided into:
 Nominal Data: Data that can be classified into categories but has no natural
rank order or precedence. Examples in pharmacy are:
 Drug Classification (e.g., analgesics, antibiotics, anti-inflammatories)
 Patient Gender (e.g., male, female, non-binary)
 Blood group type (e.g., A, B, AB, O)

Ordinal Data: Data that not only permits categorization but also ranks or orders
values. However, the intervals between consecutive ranks are not equal or constant.
Examples of ordinal data in pharmacy include:
 Severity of Side Effects (e.g., mild, moderate, severe)
 Pain Intensity (e.g., none, mild, moderate, severe)
 Disease Staging (e.g., early-stage, mid-stage, late-stage)
b. Quantitative (Numerical) Data
Quantitative data can be expressed in numerical terms for measurable quantities.
This type of data is typically analyzed using mathematical and statistical
methods and falls into two principal categories:
Discrete Data: This type of data consists of distinct, separate values, often
representing counts or whole numbers. Examples in pharmacy include:
 Number of Tablets Prescribed
 Frequency of Adverse Events Reported
 Total Medication Doses Administered
Continuous Data: Unlike discrete data, continuous data can take any value within a
specified range and can be subdivided into finer increments. Examples in pharmacy
include:
 Blood Pressure Readings (e.g., 120/80 mmHg)
 Serum Drug Concentration Levels
 Body Weight (e.g., 75.2 kg)
1.4.2. Scales of Measurement
There are four basic scales of measurement, each defining how values can be
ranked and which mathematical operations are appropriate:
a. Nominal Scale
The nominal scale defines data as being categorised into distinct groups or labels,
without any inherent order or ranking. The values are mere identifiers and convey no
quantitative or ranking sense.
Examples in pharmacy:
 Medication type (e.g., generic vs. brand-name)
 Prescription status (e.g., filled vs. unfilled)
 Medication adherence (e.g., yes/no)

b. Ordinal Scale
This scale includes data that can be ranked in a particular order but the intervals may
be inconsistent or cannot be measured.

Examples in pharmacy:
 Pain intensity (e.g., none, mild, moderate, severe)
 Degree of medication adherence (e.g., low, medium, high)
 Classification of side effects (e.g., none, mild, moderate, severe)
c. Interval Scale
The data has equal, meaningful intervals between values but no absolute zero
point. Differences between values are measurable, but ratio statements such as
"twice as much" lack meaning.
Examples in pharmacy:
 Temperature (e.g., Celsius or Fahrenheit) – there is no absolute zero in these
scales.
 Serum drug concentration levels are sometimes treated as interval data in
practice, although a true zero concentration does exist, so strictly they are
better classified as ratio data.
d. Ratio Scale
The ratio scale features data with both equal intervals and an absolute zero point,
enabling a full range of mathematical operations, including addition, subtraction,
multiplication, and division.
Examples in pharmacy:
 Body weight (e.g., 70 kg, 75 kg) – a weight of zero indicates the complete
absence of weight.
 Height (e.g., 1.75 m)
 Drug dosage (e.g., 500 mg)
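The practical difference between the scales can be shown in a short Python sketch; the severity categories and doses below are hypothetical:

```python
# Ordinal data: categories have a rank order, but differences between
# ranks are not meaningful quantities.
severity_order = ["none", "mild", "moderate", "severe"]

def severity_rank(level):
    """Return the rank of an ordinal severity level (0 = lowest)."""
    return severity_order.index(level)

# Ranking comparisons are valid for ordinal data:
worse = severity_rank("severe") > severity_rank("mild")

# Ratio data (absolute zero) supports full arithmetic, so a statement
# like "twice the dose" is meaningful:
dose_a, dose_b = 500, 250          # mg
dose_ratio = dose_a / dose_b

print(worse, dose_ratio)
```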
1.5 Data Organization and Presentation
Organizing and presenting data systematically within pharmaceutical research is
a necessary requirement for reproducibility, transparency, and proper
understanding by stakeholders such as researchers, regulatory authorities, and
practitioners.


1.5.1 Data Organization


Good data management helps ensure that information is easily accessible,
analyzable, and interpretable. In pharmaceutical research, this involves managing
the massive datasets generated by drug development, clinical trials, preclinical
studies, pharmacokinetics, and adverse-event reporting.
a. Data Management Systems
 Electronic Lab Notebooks (ELNs): Platforms like LabArchives and Labster
allow researchers to store experimental data in a structured, searchable format
for easy retrieval and analysis.
 Database Systems: Structured systems (e.g., SQL databases, data lakes) are
employed to manage vast amounts of clinical trial data, chemical compounds,
biological assays, and patient information.
 Clinical Data Management Systems (CDMS): Specialized systems, such as
Medidata, Oracle, or Veeva Vault, are used to organize and manage clinical
trial data, ensuring alignment with regulatory standards, such as FDA 21 CFR
Part 11.
b. Data Structuring
 Standardized Formats: In clinical research, standardized data formats, such as
those provided by the Clinical Data Interchange Standards Consortium
(CDISC), ensure consistency and compliance with regulatory requirements.
 Metadata: Metadata helps in providing a data context. This entails information
about where the data was derived from, the units of measurement, points in
time, and conditions of sampling.
c. Data Cleaning and Quality Control
 Outlier Detection: Both automated tools and manual reviews help identify
anomalous or erroneous data points (e.g., unrealistic values in clinical trial
results).
 Handling Missing Data: Strategies such as imputation or sensitivity analysis
are implemented to address missing data and assess its impact on study
outcomes.

 Audit Trails: Keeping detailed records of all data modifications ensures
traceability and reproducibility, which are essential for meeting regulatory
approval criteria.
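A minimal sketch of the cleaning steps above, using made-up measurements, simple mean imputation, and a purely illustrative 0–50 µg/mL plausibility range for flagging outliers:

```python
import statistics

# Hypothetical serum concentration readings (µg/mL); None marks missing values.
values = [12.0, 11.5, None, 13.1, 98.0, 12.4, None, 11.9]

observed = [v for v in values if v is not None]

# Simple mean imputation for missing values (one of several strategies;
# a sensitivity analysis would test how this choice affects conclusions).
mean_val = statistics.mean(observed)
imputed = [v if v is not None else mean_val for v in values]

# Range check: flag physiologically implausible values for manual review
# (the 0-50 bound here is illustrative, not a clinical standard).
outliers = [v for v in observed if not 0 <= v <= 50]

print(f"imputed with {mean_val:.2f}; flagged: {outliers}")
```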
d. Data Segmentation
 By Study Phase: Data is often reported based on drug development phase
(preclinical, Phase I-III, post-market, etc.).
 By Outcome: Data is also grouped by research outcome, such as safety,
efficacy, pharmacokinetics, and pharmacodynamics.
 By Subject: Data may also be grouped by individual patient or experimental
subject, in order to monitor personal response to treatment.
1.5.2 Data Presentation
After organizing the data effectively, the next crucial step is its presentation. A clear
and well-structured presentation allows stakeholders to quickly comprehend study
findings and make informed, evidence-based decisions.
a. Descriptive Statistics
 Measures of Central Tendency: The mean, median, and mode are summary
statistics that describe the central point around which data values cluster.
 Variability (Standard Deviation & Variance): These metrics describe the
dispersion of data points, offering insight into the consistency and reliability of
the results.
 Percentiles & Quartiles: Very useful in clinical data analysis, as they convey
the distribution of data across different segments of a population.
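Percentiles and quartiles can be computed with the standard library's `statistics.quantiles`; the glucose readings below are hypothetical:

```python
import statistics

# Hypothetical fasting glucose readings (mg/dL) from 12 patients.
glucose = [88, 92, 95, 99, 101, 104, 108, 112, 118, 125, 131, 140]

# statistics.quantiles with n=4 returns the three quartile cut points:
# Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(glucose, n=4)
iqr = q3 - q1                      # interquartile range: spread of middle 50%

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```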
b. Tables
 Clarity and Structure: Tables are well suited for displaying detailed, structured
information, such as patient demographics, baseline characteristics or
pharmacokinetic profiles, for example drug concentrations over time.
 Explanatory Footnotes: Footnotes are applied when abbreviations, units of
measurement, or other details that might require further explanation are
presented.

 Subgroup Breakdown: Tables also support the presentation of information
broken down by subgroups, such as age, gender, or stage of disease,
enabling a deeper interpretation.
c. Graphs and Charts
 Bar and Column Graphs: Useful for comparing categorical data, for example,
the effectiveness of various treatment groups.
 Line Graphs: Best for demonstrating trends, for instance, tracking the effect
of a drug treatment on certain biological markers over time.
 Box Plots: Graphically present the spread and central tendency of data;
useful for comparing groups whose variables are distributed differently in
clinical trials.
 Survival Curves (Kaplan-Meier): Commonly employed in clinical research to
represent time-to-event data, such as progression-free survival in a cancer
clinical trial.
 Scatter Plots: Provide a way to examine the relationship between two
continuous variables, as in dose-response analysis.
d. Statistical Analysis and Significance
 P-values & Confidence Intervals: These are statistical techniques important to
evaluate the significance of findings, particularly in clinical trial settings,
where they are employed to confirm the safety and effectiveness of treatments.
 Multivariate Analysis: In complex datasets, multivariate techniques like
regression models and principal component analysis are used for the
examination of relationships between various parameters.
e. Visual Aids
 Heatmaps: To visualize large datasets, such as gene expression data,
heatmaps can be used to reveal patterns or correlations.
 Network Diagrams: Network diagrams can be used in pharmacology to
explain complex molecular interactions, for example, protein-protein
interaction networks.
 Flowcharts: Useful for showing experimental workflows or patient
recruitment processes in clinical trials step by step.

f. Integrating Text and Figures
 Annotated Graphs: Providing brief, concise descriptions of graphs and
tables helps explain trends and supports the interpretation of results.
 Legends and Captions: Well-framed figure legends, table captions, and
annotations enable the reader to grasp the central message portrayed by
each graphic component.
 Consistency in Terminology: Ensuring consistency in terminology across all
figures, tables, and text enhances clarity and facilitates the interpretation of the
data.
Software Tools for Data Organization and Presentation
Various software tools are widely used in pharmaceutical research for efficient data
management, analysis, and presentation:
 Statistical Software: Programs like R, SAS, SPSS, and STATA are used for
data analysis and statistical modeling.
 Data Visualization Tools: Platforms such as Tableau, GraphPad Prism, and
Microsoft Power BI are designed to create sophisticated visualizations of
research data.
 Clinical Trial Data Analysis: Specialized data management and analysis
platforms include Medidata Rave, Oracle Clinical, and Veeva Vault.
 Chemoinformatics Tools: For analyzing chemical and biological data,
cheminformatics software such as ChemAxon, Pipeline Pilot, and KNIME is
used.
1.6. Frequency Distribution
A frequency distribution is an indispensable statistical tool in pharmaceutical
research for arranging and summarizing data to show how often different values,
or ranges of values, occur within a dataset. Frequency distributions play a
crucial role in providing meaningful insights into the patterns and characteristics
of data, whether drug concentration distributions in patients, the incidence of side
effects, or the outcomes of a clinical trial.
Purpose of Frequency Distribution in Pharmaceutical Sciences
 Analyzing Variability: Frequency distributions assist in examining the
variability or spread of data points in studies such as clinical trials or
laboratory experiments.

 Data Condensation: They simplify large datasets, making it easier to detect
trends, patterns, or anomalies.
 Informed Decision-Making: By consolidating the data, researchers can draw
more informed conclusions, such as determining appropriate drug dosages or
evaluating patient response rates.
Types of Frequency Distributions
Univariate Distribution: A univariate frequency distribution deals with the frequency
of single-variable data. In this case, a single characteristic or variable is being
analyzed. It is useful for analyzing individual variables (e.g., individual drug dosages
or specific side effects).
Cumulative Frequency Distribution: This type displays the total number of data points
less than or equal to a specific value, offering insight into the cumulative effect of
drug doses or side effects over time.
Relative Frequency Distribution: This distribution expresses the occurrence of each
class or category as a percentage of the total dataset, often useful for comparing data
across different sample sizes.
Grouped Frequency Distribution: This distribution organizes data into classes or
intervals rather than individual values. It is used when the data is continuous or
wide-ranging (e.g., blood pressure ranges, tablet weights).
Constructing a Frequency Distribution
In pharmaceutical research, constructing a frequency distribution typically follows
these steps:
 Step 1: Data Collection – Collect the relevant data, such as drug
concentrations, patient responses, or adverse events.
 Step 2: Data Grouping – For continuous data, determine how to group the
values into intervals (e.g., grouping drug concentrations into ranges like 0–10
µg/mL, 10–20 µg/mL, etc.).
 Step 3: Frequency Calculation – For each group or interval, count the number
of data points that fall within that range.
 Step 4: Calculate Additional Metrics (Optional) – Depending on the analysis,
you may calculate cumulative frequency, relative frequency, and percentages.

 Step 5: Data Visualization – Visualize the distribution using histograms, bar
charts, or cumulative frequency plots. These visuals help identify trends or
potential skewness, such as whether drug concentrations follow a normal
distribution or exhibit outliers.
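Steps 2–4 above can be sketched in Python with illustrative concentration values:

```python
# Grouping hypothetical drug concentration readings (µg/mL) into
# 10-unit intervals and counting frequencies.
concentrations = [3.2, 8.7, 12.5, 15.1, 18.9, 22.4, 25.0, 27.3, 31.6, 44.8]

# Step 2: define the class intervals 0-10, 10-20, ..., 40-50.
bins = [(low, low + 10) for low in range(0, 50, 10)]

# Step 3: count how many readings fall in each interval.
frequency = {}
for low, high in bins:
    frequency[(low, high)] = sum(1 for c in concentrations if low <= c < high)

# Step 4 (optional): relative frequencies as percentages of the total.
total = len(concentrations)
for (low, high), count in frequency.items():
    relative = count / total
    print(f"{low}-{high} ug/mL: {count} patients ({relative:.0%})")
```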
Example 1 (Univariate Distribution): Drug Dosage Frequency Distribution
Suppose a pharmaceutical company is conducting a study on the frequency of dosages
of a particular medication (e.g., ibuprofen). A univariate frequency distribution might
examine the different dosage strengths (e.g., 200 mg, 400 mg, 600 mg, etc.) and the
number of patients taking each dose.
Table 1.1: Dose Strength Frequency Data.

Dose Strength (mg)    No. of Patients (frequency)
200                   10
400                   15
600                   30
800                   20
1000                  5

35
30
No. of Patients
(frequency)

25
20
15
10
5
0
200 400 600 800 1000
Dose Strength (mg)

Figure 1.1: Univarient frequency distribution graph describing dose strength


and frequency.
Example 2 (Frequency of Side Effects): In a clinical trial, a pharmaceutical company
may seek to assess the prevalence of a particular side effect (such as nausea) among
patients taking a specific medication. A frequency distribution could be used to
quantify the number of patients who experience nausea at varying levels of intensity
(e.g., mild, moderate, severe). This approach allows for an evaluation of the
proportion of patients affected by the side effect, enabling a better understanding of its
potential risk. The frequency distribution data for this categorical variable is
illustrated in Table 1.2 and Figure 1.2.
Table 1.2: Side Effect Frequency Data Observed from 100 Patients
Level of Severity No. of Patients
Mild 60
Moderate 30
Severe 10

Fig 1.2: Frequency Distribution Graph Describing Side Effect Frequency
Example 3: Analysis of Drug Concentrations
Consider a study in which researchers measure the blood concentration of a new drug
in 100 patients at a specific time point after administration. The measured
concentrations range from 0 to 50 µg/mL. A frequency distribution could be
constructed by categorizing the concentrations into intervals (e.g., 0–10 µg/mL, 10–
20 µg/mL, and so on) and counting how many patients fall into each range. This
process will provide insight into the distribution of drug concentrations across the
patient population, revealing whether most patients maintain concentrations within the
therapeutic range. Frequency distribution data for the numerical variable is presented
in Table 1.3 and Figure 1.3.
Table 1.3: Drug Concentration Data Observed from 100 Patients.

Concentration Range (µg/mL)    No. of Patients (frequency)
0-10                           15
10-20                          25
20-30                          30
30-40                          20
40-50                          10

Fig 1.3: Frequency Distribution Graph Describing Drug Concentration Analysis

Example 4 (Cumulative Frequency Distribution): The data presented in Table 1.3 can be
rearranged by counting, for each class interval (e.g., 0–10 µg/mL, 10–20 µg/mL, and so
on), the number of patients whose concentrations fall below the upper boundary of that
interval. The data are presented in Table 1.4.
Table 1.4: Cumulative Drug Concentration Data Observed from 100 Patients.

Concentration Range (µg/mL)    No. of patients with concentration less than the upper class boundary
0-10                           15
10-20                          40
20-30                          70
30-40                          90
40-50                          100
Fig 1.4: Cumulative Frequency Distribution Graph Describing Drug Concentration
Analysis
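The conversion from the class frequencies of Table 1.3 to the cumulative counts of Table 1.4 is simply a running sum, which can be sketched with `itertools.accumulate`:

```python
# Running total of class frequencies gives the cumulative
# frequency distribution (compare Table 1.4).
from itertools import accumulate

freqs = [15, 25, 30, 20, 10]          # class frequencies from Table 1.3
cumulative = list(accumulate(freqs))
print(cumulative)  # [15, 40, 70, 90, 100]
```

The final cumulative value always equals the total number of observations, here 100.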
Example 5 (Percent Frequency Distribution): The data presented in Table 1.3 can be
rearranged by expressing the number of patients in each class interval (e.g., 0–10
µg/mL, 10–20 µg/mL, and so on) as a proportion or percentage of the total number of
patients.
Relative frequency represents the proportion of total data points that fall within each
class. To calculate this, divide the frequency of each class by the total number of
observations.

 Total observations = 100


 Relative frequency = (Frequency of class) / (Total observations)

The data is presented in table 1.5.


Concentration Range Relative frequency
0-10 0.15
10-20 0.25
20-35 0.30
35-40 0.20
40-50 0.10

Fig 1.5: Relative Frequency Distribution Graph Describing Drug Concentration
Analysis
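The relative frequencies of Table 1.5 follow directly from dividing each class frequency by the total, as a short sketch:

```python
# Each class frequency divided by the total gives the relative
# frequency distribution (compare Table 1.5).
freqs = [15, 25, 30, 20, 10]
total = sum(freqs)
relative = [f / total for f in freqs]
print(relative)  # [0.15, 0.25, 0.3, 0.2, 0.1]
```

Multiplying each value by 100 gives the percent frequency distribution.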
Interpreting Frequency Distributions
• Central Tendency: Analyze the mode (the most frequent value) or, when
appropriate, the mean or median to identify the typical or central values within the
dataset.
• Dispersion or Variability: Evaluate the spread of the data by considering the range,
variance, or standard deviation. This analysis can provide insights into the variability
of drug responses or the occurrence of side effects.
• Distribution Shape: Assess the distribution for normality (bell curve), skewness, or
kurtosis. For instance, a skewed distribution in drug concentration levels may indicate
that a small subset of patients is metabolizing the drug at significantly faster or
slower rates than others.
Applications in Pharmaceutical Research
• Drug Stability Studies: In pharmaceutical stability testing, frequency distributions
can be used to examine the variation in chemical composition or the degradation of a
drug over time, under varying environmental conditions.
• Clinical Trials: Frequency distributions are particularly useful in pharmacokinetic
and pharmacodynamic studies to analyse the spread of drug concentrations, response
rates, or adverse events across different patient cohorts.
• Pharmacovigilance: In Pharmacovigilance, frequency distributions play a crucial
role in tracking the occurrence of adverse drug reactions (ADRs) or side effects,
enabling the identification of emerging trends or outliers in drug safety data.
• Dosage Studies: Frequency distributions also aid in the analysis of how different
dosages of a drug affect different populations and assist in finding appropriate dosage
ranges, along with alerting patients at risk of under- or over-dosing.

1.7. Applications of Statistics in Pharmacy

Statistics play a vital role in pharmaceutical research and practice, supporting sound
decision-making. Some of the primary areas where statistical methods have been applied in
pharmacy include:
1.7.1. Clinical Trials
Clinical trials are fundamental for assessing the safety and efficacy of new drugs and
treatments. Statistical techniques are essential in various stages of trial design and
analysis:
 Study Design: Statistical methodologies are employed to create robust clinical
trial designs, ensuring accurate evaluation of interventions. Key strategies
include randomization, stratification, and blinding, which minimize bias and
enhance the reliability of results. Randomized Controlled Trials (RCTs) are

considered the gold standard for clinical trials. Placebo-controlled trials are
conducted to differentiate the therapeutic effect of a drug from a placebo,
while crossover trials involve participants receiving both treatments in
sequence, allowing each to serve as their own control.
 Sample Size Calculation: Prior to conducting trials, statisticians calculate the
required sample size to ensure the study is adequately powered to detect
significant differences between treatment groups.
 Analysis of Results: Various statistical tests (e.g., t-tests, ANOVA, chi-square
tests) are employed to compare results across treatment and control groups.
Advanced techniques, such as regression analysis and survival analysis, are
used for more intricate data, with survival analysis visualized using Kaplan-
Meier curves and the impact of multiple variables assessed using Cox
Proportional Hazards models.
1.7.3. Pharmacokinetics
Pharmacokinetics, which studies the movement of drugs within the body, relies on
statistical models to analyze key relationships:
 Dose-Response Relationships: Statistical models help define how different
dosages of a drug influence its concentration in the body over time. Methods
like compartmental modeling or non-compartmental analysis are commonly
used to analyze pharmacokinetic data.
 Bioequivalence Studies: In the development of generic drugs, statistical
methods (e.g., ANOVA) are applied to compare the pharmacokinetic
parameters (such as Cmax, Tmax, AUC) of a generic drug with its branded
counterpart, ensuring that the generic is therapeutically equivalent.
1.7.4. Pharmacovigilance
Pharmacovigilance-the safety monitoring of drugs-involves statistical analysis to
identify adverse effects and evaluate drug safety:
 Adverse Drug Reactions (ADR) Reporting: Statisticians analyze ADR reports
to identify trends, evaluate the risks associated with drug side effects, and
monitor overall drug safety. Techniques such as calculating odds ratios and
relative risks, or conducting meta-analyses, are commonly used for this
purpose.

 Signal Detection: Statistical methods, such as disproportionality analysis
(including reporting odds ratios), help detect potential safety signals from
databases like the FDA’s Adverse Event Reporting System (FAERS).
1.7.5. Drug Manufacturing and Quality Control
Statistical methods are an essential component of quality assurance, ensuring
uniformity in drug quality across products:
 Quality Assurance: Techniques like Statistical Process Control (SPC) are used
to monitor and control manufacturing processes, ensuring that products meet
predefined quality standards and minimizing deviations.
 Validation and Reliability Testing: Statistical methods are used to validate
pharmaceutical equipment, assess product stability, and estimate drug shelf
life. Reliability testing ensures consistent product quality under various
storage conditions.
1.7.6. Epidemiology and Pharmacoeconomics
Statistics enables understanding of drug use patterns and evaluation of cost-
effectiveness in treatments:
 Epidemiological Studies: In pharmacoepidemiology, statistics are used to
explore the patterns, causes, and effects of drug use within populations.
Various study designs, such as cohort studies, case-control studies, and
observational studies, are employed to investigate drug safety and efficacy.
 Pharmacoeconomics: This field uses statistical analysis to assess the cost-
effectiveness of different pharmaceutical treatments. Models such as decision
analysis and cost-effectiveness analysis (CEA) are used to compare the
economic outcomes of various drugs or therapies.
1.7.7. Personalized Medicine
The application of statistical models is crucial for tailoring treatments to individual
patients:
 Pharmacogenomics: Statistical techniques help to examine the relationship
between genetic variations and drug response. By understanding how genetic
factors influence drug metabolism, statisticians support the development of
personalized treatment plans.

 Data Mining and Machine Learning: In the era of big data, machine learning
algorithms are used to predict individual patient responses to medications,
considering clinical, demographic, and genetic data.
1.7.8. Drug Formulation
Statistical techniques optimize formulation of drugs and improve their effectiveness:
 Optimization of Drug Formulations: Statistical tools, such as Design of
Experiments (DOE), are used to study drug formulation factors such as solubility,
stability, and bioavailability, and to identify the most effective and efficient
formulation.
1.7.9. Data Interpretation and Evidence-Based Pharmacy
Evidence-based pharmacy relies heavily on statistics for application in clinical
decision-making:
 Relative Risk (RR) and Odds Ratio (OR): These statistical measures help
quantify the relationship between a drug (or exposure) and its associated
outcomes, such as disease incidence.
 Number Needed to Treat (NNT) and Number Needed to Harm (NNH): These
metrics help assess the effectiveness and safety of treatments, guiding
clinicians in determining the best therapeutic approaches.
1.7.10. Meta-Analysis
Meta-analysis is a procedure for combining statistical results from various studies to
enhance robustness and understanding of treatment effectiveness:
 Fixed-Effect vs. Random-Effects Models: These statistical models are used to
combine results from different studies, accounting for variations between
them.
 Forest Plots: These visual tools are used to represent the results of meta-
analyses, offering a clear overview of the effectiveness of a drug or treatment
approach.

References
1. Lee CM, Soin HK, Einarson TR. Statistics in the pharmacy literature. Annals
of Pharmacotherapy. 2004;38(9):1412-8.
2. De Muth JE. Basic statistics and pharmaceutical statistical applications. CRC
Press; 2014.
3. Indrayan A, Satyanarayana L. Biostatistics for medical, nursing and pharmacy
students. PHI Learning Pvt. Ltd.; 2006.
4. Tong X, Song J. Correlation and regression analysis in statistics. J.
Liaoning Econ. Manag. Cadre Inst. 2011;5:17-8.
5. Austin Z, Sutton J. Research methods in pharmacy practice: methods and
applications made easy. Elsevier Health Sciences; 2018.
6. Venkatesh MP. Digital Pharma: How Software Solutions are Shaping the
Pharmaceutical Industry.
7. Baayen RH. Word frequency distributions. Springer Science & Business
Media; 2001.
8. Xu Z, Gautam M, Mehta S. Cumulative frequency fit for particle size
distribution. Applied occupational and environmental hygiene.
2002;17(8):538-42.

Measures of Central Tendency

Measures of central tendency

Central tendencies in statistics refer to the numerical values, such as average or


central values, that serve to represent the middle or central point of a large dataset. A
central or average value in any statistical series is the value that best reflects the entire
dataset or its corresponding frequency distribution. This central value holds
significant importance as it provides insights into the overall nature and
characteristics of the data, which might otherwise be difficult to discern.

In statistics, measures of central tendency are used to condense a dataset by


pinpointing its central value. The most commonly used measures include the mean,
median, and mode. These metrics are essential for understanding the typical or
representative values within a dataset. The selection of the most appropriate measure
of central tendency depends on the type and distribution of the data.

2. 1. Mean (Average)
The mean is calculated by adding all the values together and dividing the sum by the
total number of values. It is the most widely used measure of central tendency and is
particularly effective for data that follows a normal distribution.
Formula:
Mean = ∑X / N
Where ∑X is the sum of all values, and N is the number of values.
Example: If a group of five individuals has fasting blood glucose levels of 85, 90, 88,
92, and 80, the mean blood glucose level can be determined by:
85 + 90 + 88 + 92 + 80
5
=435/5=87

So, the average blood glucose level is 87.
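The same calculation can be sketched with Python's `statistics` module, using the glucose values from the example above:

```python
# Arithmetic mean of the fasting blood glucose example:
# (85 + 90 + 88 + 92 + 80) / 5 = 87.
from statistics import mean

glucose = [85, 90, 88, 92, 80]
print(mean(glucose))  # 87
```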

A dataset can consist of values from either a sample or a population. A population


encompasses the entire group being studied, while a sample represents a subset of that
population. While sample data can provide useful estimates about the population, only
data from the entire population can offer a full and precise representation. In statistics,

the formulas and notation for calculating the sample mean and the population mean
differ. However, the process for determining both means follows a similar approach.

Sample Mean Formula: The sample mean is typically denoted as M or x̄. To calculate
the mean of a sample, the following formula is used:


x̄ = ∑x / n

 x̄: sample mean


 ∑ x: sum of all values in the sample dataset
 n: number of values in the sample dataset

Population mean formula: The population mean is written as μ (Greek term). For
calculating the mean of a population, use this formula:


µ = ∑X / N

 μ: population mean
 ∑ X : sum of all values in the population dataset
 N: number of values in the population dataset

Mean of Grouped Data


The mean of grouped data is the process of calculating the average of data that is
organized into different categories or groups. To determine the mean, a frequency
table is used, which organizes the frequencies of the data, simplifying the calculation.
The three main methods for calculating the mean of grouped data are the direct
method, the assumed mean method, and the step deviation method.
The formula to calculate the mean of grouped data: x̄ = ∑xifi / ∑fi
Where,
 x̄ = the mean value of the given data
 xi = midpoint of the ith class interval
 fi = frequency of the ith class interval
 ∑fi = N, the sum of frequencies
The steps that can be followed to find the mean for grouped data using the direct
method,
 Create a table containing four columns such as class interval, midpoint of the class
interval, denoted by xi, frequencies fi (corresponding), and xifi

 Calculate the midpoint, xi, we use this formula xi = (upper class limit + lower class
limit)/2.
 Calculate the mean by the formula Mean = ∑xifi / ∑fi, where fi is the frequency
and xi is the midpoint of the class interval.

Example: Find the mean of the following data.

Class interval Frequency


0-10 20
10-20 15
20-30 30
30-40 20
40-50 15
Solution: The first step is to create a table that includes the midpoint and the product
of the frequency and midpoint. To calculate the midpoint, find the average of the class
interval using the formula provided above.

Midpoint xi: for 0–10, (0 + 10)/2 = 5; for 10–20, (10 + 20)/2 = 15; and so on.

Class interval Midpoint of Class interval(xi) Frequency(fi) xifi


0-10 5 20 100
10-20 15 15 225
20-30 25 30 750
30-40 35 20 700
40-50 45 15 675
Total ∑fi =100 ∑xifi =2450

Estimated Mean = ∑xifi / ∑fi = 2450/100 = 24.5
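The direct method above translates into a short sketch, using the class intervals and frequencies from the worked example:

```python
# Grouped mean by the direct method: multiply each class midpoint
# by its frequency, sum, and divide by the total frequency.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [20, 15, 30, 20, 15]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)
print(grouped_mean)  # 24.5
```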

2.2. Median
The median is the middle value in a dataset. The median is determined by arranging
all the individual values in a dataset from smallest to largest and the middle value is
identified. If the data set contains an odd number of values, the median is the exact
middle value. If the data set contains an even number of values, the median is
calculated as the average of the two middle values.

Example for odd number of ungrouped dataset: If a group of 5 persons has the
following fasting blood glucose levels: 85, 90, 88, 92, and 80, the median blood
glucose levels would be calculated by arranging the blood glucose levels in order
from least to greatest number as: 80, 85, 88, 90 and 92.
Since we have an odd number of values, the median is simply the exact middle value
88.
Example for even number of ungrouped dataset: If a group of 6 persons has the
following fasting blood glucose levels: 85, 90, 88, 92, 80 and 84, the median blood
glucose levels would be calculated by arranging the blood glucose levels in order
from least to greatest number as: 80, 84, 85, 88, 90 and 92.
Since we have an even number of values, the median is the average of the two middle
values, 85 and 88, resulting in a median of 86.5
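Both ungrouped cases can be checked with Python's `statistics.median`, which sorts the data and handles the odd and even cases automatically:

```python
# Median of the odd- and even-sized blood glucose datasets above.
from statistics import median

odd = [85, 90, 88, 92, 80]
even = [85, 90, 88, 92, 80, 84]
print(median(odd))   # 88
print(median(even))  # 86.5
```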
Median of Grouped Data
To determine the median of grouped data, a systematic approach must be followed.
Grouped data is usually presented in a frequency distribution table, where the data is
organized into classes (or intervals) with their respective frequencies.

Steps to Calculate the Median of Grouped Data:

1. Construct the cumulative frequency table:


o Start by calculating the cumulative frequency (CF) for each class. This

is done by adding the frequency of the current class to the cumulative


frequency of the previous class.
2. Locate the median class:
o The median class is the class where the cumulative frequency first

reaches or exceeds half of the total frequency (N/2), where N


represents the total number of observations (the sum of all
frequencies).
3. Apply the median formula: Use the following formula to calculate the median:

Median = L + ((N/2 − CF) / f) × h

Where:

o L = the lower boundary of the median class


o N = the total number of observations (sum of all frequencies)
o CF = the cumulative frequency of the class preceding the median class
o f= the frequency of the median class
o h = the class width (the difference between the upper and lower
boundaries of the class)

Example: Find the median of the following data.


Class interval Frequency
0-10 20
10-20 15
20-30 30
30-40 20
40-50 15

Solution: Find total frequency N=20+15+30+20+15=100


Calculate cumulative frequencies (CF):

Class interval Frequency Cumulative Frequency


0-10 20 20
10-20 15 20+15=35
20-30 30 35+30=65
30-40 20 65+20=85
40-50 15 85+15=100

The cumulative frequency just greater than 50 is 65, which corresponds to the class
interval 20–30. Therefore, the median class is 20–30.

Now apply the median formula:

 Lower boundary (L)=20 (for the class interval 20–30)


 Cumulative frequency (CF)=35 (for the class interval 10–20)
 Frequency (f)=30 (for the class interval 20–30)
 Class width (h)=10 (the difference between 20 and 30)
 Now, substitute these values into the median formula:

Median = 20 + ((50 − 35)/30) × 10
       = 20 + (15/30) × 10
       = 20 + (0.5 × 10) = 25

Median = 25
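The three steps (cumulative frequencies, locating the median class, interpolating) can be sketched as a small function; it assumes contiguous intervals given as (lower, upper) pairs:

```python
# Grouped median: locate the class whose cumulative frequency first
# reaches N/2, then interpolate: Median = L + ((N/2 - CF) / f) * h.
def grouped_median(intervals, freqs):
    n = sum(freqs)
    cum = 0
    for (lo, hi), f in zip(intervals, freqs):
        if cum + f >= n / 2:                     # median class found
            return lo + ((n / 2 - cum) / f) * (hi - lo)
        cum += f

intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [20, 15, 30, 20, 15]
print(grouped_median(intervals, freqs))  # 25.0
```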

2.3: Mode

The mode is the value that appears most frequently in a dataset. A dataset can have no
mode (if no value repeats), one mode, or multiple modes.

For example, the following dataset has no mode:


80, 85, 88, 90 and 92
The following dataset has one mode: 85. This is the value that appears most
frequently.
80, 85, 88, 85 and 92
The following dataset has two modes: 85, 88. These are the values that occur with the
same highest frequency.
80, 85, 88, 85 and 88
The mode can also be applied to numerical data, as seen in the example of blood
glucose levels above. However, the mode tends to be less useful for answering the
question “What is a typical value for this dataset?”
The mode may also be an unreliable measure of central tendency if it is a value that
significantly deviates from the other data points. For example, in the dataset below,
the mode is 200, but it doesn’t actually represent the “typical” value of blood glucose
level:
80, 85, 200, 90 and 200
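The three ungrouped cases above (no mode, one mode, two modes) can be inspected with `statistics.multimode`, which returns every value tied for the highest frequency:

```python
# Modes of the ungrouped blood glucose datasets above.
from statistics import multimode

print(multimode([80, 85, 88, 90, 92]))  # no repeats: every value is returned
print(multimode([80, 85, 88, 85, 92]))  # [85]
print(multimode([80, 85, 88, 85, 88]))  # [85, 88]
```

Note that `statistics.mode` would raise an error or pick arbitrarily in the tied cases, which is why `multimode` is the safer choice here.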
Mode of Grouped Data

While dealing with grouped data (or frequency distributions), the mode refers to the
value or class interval that appears most frequently. For grouped data, the mode can
be determined using a formula, especially when the data is organized in a frequency
distribution table. The formula for calculating the mode assumes that the data is
relatively uniformly or symmetrically distributed within the class intervals. This method

is suitable for unimodal distributions, but identifying the mode can become more
challenging with multimodal distributions.

Steps to calculate the Mode for Grouped Data:

1. Identify the modal class: The modal class is the class interval with the highest
frequency (the class with the greatest number of observations).
2. Apply the mode formula: After determining the modal class, use the following
formula to calculate the mode:

Mode = L + ((f1 − f0) / (2f1 − f0 − f2)) × h

Where:

o L = Lower boundary of the modal class


o f1 = Frequency of the modal class
o f0 = Frequency of the class preceding the modal class
o f2 = Frequency of the class succeeding the modal class
o h = Class width (difference between the upper and lower boundaries of
any class)

Example: Estimate the mode for the following data

Class interval Frequency


0-10 20
10-20 15
20-30 30
30-40 20
40-50 15

Identify the modal class:


The class with the highest frequency is the 20–30 class, with a frequency of 30. So,
the mode class is 20–30.

Apply the formula:

 L=20 (lower boundary of the modal class)


 f1=30 (frequency of the modal class)
 f0=15 (frequency of the class preceding the modal class)
 f2=20 (frequency of the class succeeding the modal class)
 h=10 (class width)

Now, apply these values to the mode formula:

Mode = 20 + ((30 − 15) / ((2 × 30) − 15 − 20)) × 10
     = 20 + (15/25) × 10
     = 20 + (0.6 × 10)
     = 20 + 6 = 26

Mode = 26
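The modal-class formula can likewise be sketched as a function; it assumes a unimodal frequency table with contiguous (lower, upper) intervals, treating a missing preceding or succeeding class as having frequency 0:

```python
# Grouped mode: find the modal class (highest frequency), then apply
# Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h.
def grouped_mode(intervals, freqs):
    i = freqs.index(max(freqs))                     # modal class index
    lo, hi = intervals[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # succeeding class
    return lo + ((f1 - f0) / (2 * f1 - f0 - f2)) * (hi - lo)

intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [20, 15, 30, 20, 15]
print(grouped_mode(intervals, freqs))  # 26.0
```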
The mode is an especially useful measure of central tendency when dealing with
categorical data, as it reveals the category that appears most frequently.
Example: consider the following bar chart displaying the results of a survey about
people’s preferable dosage form: The data collected from the survey is represented in
table 2.1 and the bar chart in fig 2.1.
Table 2.1: Survey results: what is your preferable dosage form?
Dosage form Frequency
Tablets 20
Capsules 15
Syrup 30
Ointment 10
Injection 5

Fig 2.1: Bar diagram for the survey results: what is your preferable dosage form?

The mode, or the most frequent response, was syrup.


In scenarios where the data is categorical (like the example above), calculating the
median or mean isn't possible, making the mode the only measure of central tendency
that can be applied.
2.4: When to use the mean, median, and mode
The mean calculates the average value of a dataset, the median identifies the middle
value, and the mode determines the most frequently occurring value.
In the case of continuous data with a symmetrical distribution, the mean, median, and
mode will be the same. In such case, analysts typically prefer to use the mean, as it
takes all data points into account. This mean is most useful when the data distribution
is roughly symmetrical and free from outliers.
For example, consider a distribution that represents the systolic blood pressure of
individuals in a particular town.
Table 2.2: Systolic Blood Pressure Distribution Data
Systolic blood pressure (mm Hg) Frequency
100-110 20
110-120 15
120-130 30
130-140 15
140-150 20

Fig 2.2: Bar diagram for systolic blood pressure distribution


From the above figure, it is observed that this distribution is fairly symmetrical (meaning
that if split down the middle, both halves would appear roughly equal) and lacks any
outliers (such as extremely high blood pressure values). In such a case, the mean is an
effective measure for summarizing this dataset.
However, when dealing with a skewed distribution, the median is often the most
reliable measure of central tendency. In a right-skewed distribution (where a few large
values exist), the mean will be higher than the median because the large values pull
the mean to the right, while the median remains closer to the center of the distribution.
Conversely, in a left-skewed distribution (where a few small values are present), the mean
will be lower than the median, as the smaller values drag the mean to the left, but the
median remains near the center of the dataset.

Example:
If you're analyzing the impact of a new drug on blood pressure and most patients
experience a slight reduction in their blood pressure, but a few patients experience a
significant drop (outliers), the mean reduction in blood pressure will be higher than
the median. In this case, the median will more accurately represent the typical
response of most patients.
For example, consider a following distribution that shows the systolic blood pressure
of individuals in a particular town.
Table 2.3: Systolic Blood Pressure Distribution Data
Systolic blood pressure (mm Hg) Frequency
100-110 5
110-120 10
120-130 40
130-140 25
140-150 20

Fig 2.3: Bar diagram for systolic blood pressure distribution



The mean for the above data is calculated as follows.

Systolic blood pressure (mm Hg)   Midpoint of class interval (xi)   Frequency (fi)   xifi
100-110                           105                               5                525
110-120                           115                               10               1150
120-130                           125                               40               5000
130-140                           135                               25               3375
140-150                           145                               20               2900
Total                                                               100              12950

Estimated Mean = ∑xifi / ∑fi = 12950/100 = 129.5
The median for the above data is calculated as follows.
Class interval Frequency Cumulative Frequency
100-110 5 5
110-120 10 5+10=15
120-130 40 15+40=55
130-140 25 55+25=80
140-150 20 80+20=100

The median class: The total frequency is N=100. Half of N is 100/2=50. The
cumulative frequency just greater than 50 is 55, which corresponds to the class 120–
130. So, the median class is 120–130.

Apply the median formula as follows:

 Lower boundary (L) = 120 (for the class interval 120–130)


 Cumulative frequency (CF) = 15 (for the class interval 110–120)
 Frequency (f) = 40 (for the class interval 120–130)
 Class width (h) = 10 (the difference between 120 and 130)

Median = 120 + ((50 − 15)/40) × 10
       = 120 + (35/40) × 10
       = 120 + (0.875 × 10) = 128.75

The median provides a more accurate representation of the “typical” value of an


individual compared to the mean. This is because extreme values at the tail end of a
distribution tend to pull the mean away from the center, skewing it toward the long
tail. In cases of positive skewness, the mean tends to be higher than the median.
Additionally, the median is more effective at reflecting the central tendency of a
distribution when outliers are present in the data. For example, consider the table and
chart below, which displays the platelet counts observed in hospitalised patients:
Table 2.4: Platelet Count Distribution Data
Platelet Count Frequency
40000-80000 10
80000-120000 0
120000-160000 0
160000-200000 45
200000-240000 45

Fig 2.4: Bar diagram for platelet count distribution


The mean for the above data is calculated as follows.

Platelet Count (in thousands)   Midpoint of class interval (xi)   Frequency (fi)   xifi
40-80                           60                                10               600
80-120                          100                               0                0
120-160                         140                               0                0
160-200                         180                               45               8100
200-240                         220                               45               9900
Total                                                             100              18600

Estimated Mean = ∑xifi / ∑fi = 18600/100 = 186
The median for the above data is calculated as follows.
Platelet Count (thousands) Frequency(f) Cumulative frequency
40-80 10 10
80-120 0 10
120-160 0 10
160-200 45 55
200-240 45 100
The median class: The total frequency is N=100. Half of N is 100/2=50. The
cumulative frequency just greater than 50 is 55, which corresponds to the class 160–
200. So, the median class is 160–200.

Apply the median formula, use the following values:

 Lower boundary (L) = 160 (for the class interval 160–200)


 Cumulative frequency (CF) = 10 (for the class interval 120–160)
 Frequency (f) = 45 (for the class interval 160–200)
 Class width (h) = 40 (the difference between 160 and 200)

Now, substitute these values into the median formula:

Median = 160 + ((50 − 10)/45) × 40
       = 160 + (40/45) × 40
       = 160 + (0.889 × 40) ≈ 195.56 thousand ≈ 195,556

The mean is significantly affected by an extremely low platelet count, whereas the
median remains unaffected. Therefore, the median provides a more accurate
representation of the “typical” platelet count in hospitalised patients.
For ordinal data, the median or mode is generally the more suitable measure of central
tendency, while for categorical data, the mode is the preferred choice. The choice
between the mean and the median also determines which type of statistical hypothesis
test is appropriate for the data: the mean is typically used in parametric tests,
whereas the median is favoured in nonparametric tests.

The following table can be referenced to determine the most appropriate measure of
central tendency for different types of variables:
Table 2.5: Preferable Measure of Central Tendency for different variables
Type of Variable Best Suitable Measure of Central Tendency
Nominal Mode
Ordinal Median
Interval/Ratio (not skewed) Mean
Interval/Ratio (skewed) Median
2.5: Empirical Relation between Measures of Central Tendency

The three measures of central tendency—mean, median, and mode—are closely


related by the following empirical relationship:
2Mean + Mode = 3Median

For example, when tasked with calculating the mean, median, and mode of
continuous grouped data, you can first calculate the mean and median using their
respective formulas, and then determine the mode using this empirical relationship.

Example: The median and mode for a given data set are 26 and 24 respectively. Find
the approximate value of the mean for this data set.
2Mean + Mode = 3Median
2Mean=3Median-Mode
2Mean=3×26-24
2Mean=78-24=54
Mean = 27
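The rearrangement is easy to verify in code; a trivial Python check using the values from the example:

```python
# Empirical relation: 2*Mean + Mode = 3*Median, rearranged for the mean
median, mode = 26, 24
mean = (3 * median - mode) / 2
print(mean)  # 27.0
```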

2.6: Applications of central tendency in pharmaceutical sciences

1. Drug Efficacy and Clinical Trials


 Mean (Average):
In clinical trials, the mean is commonly used to summarize the overall impact of a
drug on a population. For example, when assessing changes in blood pressure,
calculating the average change in systolic and diastolic values before and after
treatment helps to confirm the drug's general effectiveness.


 Median:
When the distribution of drug efficacy responses is skewed, the median is a more
reliable measure of central tendency than the mean. For example, in cancer trials,
examining the median survival time or time to the first response can reduce the
influence of outliers.
 Mode:
The mode is useful when identifying the most frequent outcome is important. For
example, in studies of adverse drug reactions (ADRs), the mode can highlight the
most commonly occurring side effect among patients.
2. Pharmacokinetics (PK) and Pharmacodynamics (PD)
 Mean Drug Concentrations:
In pharmacokinetics, the mean drug concentration is essential for understanding how
a drug behaves in the body. It helps inform dosing regimens and determines the
therapeutic range of drugs.
 Half-life and other PK Parameters:
Measures of central tendency can aid in evaluating the distribution of
pharmacokinetic parameters such as half-life, clearance, and volume of distribution,
helping optimize dosing strategies.
3. Pharmaceutical Quality Control and Manufacturing
 Quality Control Testing:
In pharmaceutical manufacturing, the mean and standard deviation are employed to
assess the consistency and quality of drug products (e.g., tablet weight, dissolution
rate, or active ingredient concentration). Ensuring most samples meet desired
specifications is crucial for regulatory compliance.
 Lot-to-Lot Consistency:
The mean and median can be used to monitor variability between different
manufacturing lots, ensuring batch consistency. Outlier analysis can identify
significant deviations from expected values.
4. Adverse Drug Reactions (ADR) Analysis
 Mode and Frequency:
The mode can help to identify the most frequent ADRs, while the mean and median
summarize the overall occurrence of these reactions in clinical trial populations,
aiding in safety monitoring and post-marketing surveillance.


 Toxicity Studies:
In preclinical toxicology studies, the mean dose at which toxicity occurs (e.g., lethal
dose or therapeutic index) is critical for establishing safety profiles of new drugs.
5. Bioequivalence Studies
 Mean Pharmacokinetic Parameters:
In bioequivalence studies, comparing the distribution of pharmacokinetic parameters
(e.g., Cmax, Tmax, AUC) of generic and brand-name drugs ensures that the generic
drug performs similarly in terms of absorption and overall exposure.
 Median:
The median is also useful in comparing the distribution of pharmacokinetic
parameters between the generic and reference drugs, especially when there is
variability in absorption rates.
6. Patient Demographics and Subgroup Analysis
 Mean and Median Demographic Data:
In clinical research, measures of central tendency are used to analyze patient
demographics such as age, weight, gender, etc. ensuring that the trial sample is
representative of the target population. For instance, the mean age of participants can
help assess the alignment of the clinical trial sample with the general population.
 Response by Subgroups:
When analyzing data by subgroups (e.g., age groups, gender, comorbidities), the
mean or median response to a drug can help identify differences in efficacy or side
effects across various population segments.
7. Dosing and Therapeutic Drug Monitoring (TDM)
 Therapeutic Ranges and Dosing:
In therapeutic drug monitoring, the mean or median drug concentration is used to
determine the most suitable dosing regimen to achieve optimal therapeutic levels
while avoiding toxicity.
 Drug Interactions:
When studying drug interactions, measures of central tendency can summarize the
effects of concomitant medications on the mean drug concentration or response in
patients.
8. Pharmaceutical Economics and Cost-Effectiveness Analysis
 Cost per Outcome:


In cost-effectiveness analysis, central tendency measures help calculate the average


cost per clinical outcome (e.g., cost per life year saved or cost per adverse event
avoided), aiding in the determination of whether a new drug offers value compared to
existing treatments.
 Cost of Medication:
The mean cost of a drug (both production and retail price) can be analyzed for
economic modelling and reimbursement decisions.
9. Survey Data and Patient Satisfaction
 Patient Satisfaction and Quality of Life:
Surveys assessing patient satisfaction or quality of life following pharmaceutical
treatment often utilize the mean or median of responses to gauge overall satisfaction
or benefit.
10. Data Integrity and Outlier Detection
 Detection of Outliers:
Measures of central tendency help identify outliers in clinical or laboratory data. For
example, extremely high or low values that deviate from the mean may suggest errors,
experimental anomalies, or unique patient responses requiring further investigation.



Measure of Dispersion

3. Measures of Dispersion

In pharmaceutical sciences and research, measures of dispersion are essential tools to


assess the extent of variability or spread within a dataset. Understanding the
fluctuation of data is essential for analyzing experimental outcomes, ensuring the
uniformity of formulations, and evaluating the reliability and precision of
experimental processes. Prominent measures of dispersion frequently utilized in
pharmaceutical research include the range, variance, standard deviation, coefficient
of variation, interquartile range, mean absolute deviation, and standard error of the mean.

3.1. Range

The range represents the difference between the highest and lowest values in a
dataset. In pharmaceutical research, the range plays a crucial role in identifying
optimal conditions, dosage, and formulations for a drug, as well as evaluating
their safety, efficacy, and behaviour in the body. The range is calculated with the
following formula:

Range=Maximum value−Minimum value

Steps to calculate range:

1. Identify the maximum value: The largest number in the dataset.


2. Identify the minimum value: The smallest number in the dataset.
3. Subtract the minimum value from the maximum value.

For example, if your dataset is:

6, 15, 20, 8, 3
Maximum value = 20
Minimum value = 3
Then,
Range=20−3=17

This means the range of this dataset is 17.
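The three steps above reduce to a one-line computation; a minimal Python sketch using the example dataset:

```python
# Range = maximum value - minimum value
data = [6, 15, 20, 8, 3]
data_range = max(data) - min(data)
print(data_range)  # 17
```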

In pharmaceutical research, the range is utilized to evaluate the dispersion of data,


with several examples outlined below:


3.1.1. Dosing Range in Clinical Trials

During the initial phases of clinical trials, researchers assess a spectrum of drug
dosages to identify both the minimum effective dose and the maximum tolerated dose.
For example:

 A pharmaceutical company might test a range of dosages (e.g., 5 mg, 10 mg,


20 mg, 50 mg, and 100 mg) of a new drug to assess its safety and efficacy.

3.1.2. Pharmacokinetic (PK) Studies

Pharmacokinetic studies often involve analyzing the range of drug concentrations


over time, which helps in the identification of optimal dosages and dosing intervals.
For example:

 The concentration of a drug in the bloodstream may be measured at different


time intervals to describe the pharmacokinetic profile.
 The range of concentration values provides insight into how the drug
behaves within the body and helps identify the best administration strategy for
effective treatment.

3.1.3. Stability Testing

The stability of pharmaceutical products is assessed under various environmental


conditions, such as temperature, humidity, and light exposure, to predict how the drug
will perform over time in diverse settings. For example:

 A drug might be stored at different temperatures and humidity levels (e.g.,


according to ICH guidelines) to evaluate its chemical stability and degradation
rate.
 Testing across a range of conditions ensures the drug maintains its efficacy
and safety throughout its shelf life.


3.1.4. Toxicology Studies

In toxicology studies, researchers administer a series of doses of a new drug


to animal models to assess toxicity and determine safe dosage thresholds.
For example:

 A study may involve testing different dosages to evaluate the potential for
adverse effects and determine a safe dose for human use.

3.1.5. Efficacy Range in Therapeutic Indications

Pharmaceutical companies frequently test the efficacy range of a drug across various
patient populations, disease severities, or in combination with other therapies. For
example:

 The drug may be tested on patients with varying levels of disease severity (e.g.,
mild, moderate, and severe conditions) to determine its effectiveness across these
subgroups.

3.2. Variance

Variance is a statistical measure that quantifies the average squared deviation of


each data point from the mean, reflecting how spread out the data points are
around the mean, i.e., the dispersion within a dataset.

3.2.1.Estimation of Variance

It is calculated by averaging the squared differences between each data point and the
mean of the dataset.

Formula for Population Variance

For a population with values x1, x2, ..., xN and population mean μ, the variance σ² is:

σ² = (1/N) Σ(xi − μ)²

Where:

 N is the total number of data points in the population


 xi represents each individual data point
 μ is the population mean


Formula for Sample Variance

When dealing with a sample (a subset of the population), the variance formula is
modified to account for the possibility that the sample mean may not perfectly
represent the population mean. The sample variance (s2) is given by:

s² = (1/(n − 1)) Σ(xi − x̄)²

Where:

 n is the number of data points in the sample


 xi represents each individual data point in the sample
 xˉ is the sample mean

The denominator is n−1 (referred to as Bessel's correction) instead of n to minimize


bias when estimating the population variance from a sample. This adjustment ensures
that the sample variance provides an unbiased estimate of the actual population
variance.

Steps for Estimating Variance from a Sample

1. Calculate the sample mean x̄:

x̄ = (1/n) Σxi

2. Compute the squared deviation from the mean for each data point:

(xi − x̄)²

3. Sum the squared deviations:

Σ(xi − x̄)²

4. Divide by n − 1 to get the sample variance:

s² = (1/(n − 1)) Σ(xi − x̄)²

Example

Estimate the variance observed from the disintegration time data of 6 tablets: 2, 4, 6,
8, 10 and 12


Calculate the sample mean:

xˉ= [2+4+6+8+10 +12]/6=42/6=7

Compute the squared deviations:

(2 − 7)² = 25, (4 − 7)² = 9, (6 − 7)² = 1, (8 − 7)² = 1, (10 − 7)² = 9, (12 − 7)² = 25

Sum the squared deviations:

25+9+1+1+9+25=70

Divide by n−1=6−1=5:

s² = 70/5 = 14

So, the sample variance is 14.
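The same steps can be carried out in Python; the sketch below reproduces the worked example and also shows that the standard library's statistics module applies the same n − 1 (sample) and N (population) denominators:

```python
import statistics

# Disintegration times of the 6 tablets from the example
times = [2, 4, 6, 8, 10, 12]

mean = sum(times) / len(times)                      # 42 / 6 = 7.0
squared_devs = [(x - mean) ** 2 for x in times]     # [25, 9, 1, 1, 9, 25]
sample_var = sum(squared_devs) / (len(times) - 1)   # 70 / 5 = 14.0

print(sample_var)                                   # 14.0
print(statistics.variance(times))                   # also 14: the library divides by n - 1
print(statistics.pvariance(times))                  # population form: 70 / 6 ≈ 11.67
```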

3.3.Applications of variance in pharmaceutical research

Variance plays a significant role in pharmaceutical research, providing valuable


insights into the variability of treatment effects, drug formulations, clinical trial
outcomes, and manufacturing processes. By analyzing variance, researchers ensure
that drugs are not only safe and effective but also maintain consistency across different
populations and conditions. A proper understanding and management of variance
ultimately enhances therapeutic outcomes and supports informed decision-making
throughout drug development and patient care. The following outlines some key
applications of variance in pharmaceutical research.

3.3.1. Clinical Trial Analysis

 Assessing Treatment Variability: In clinical trials, variance is used to assess


the variability of treatment effects across different patient groups. It helps in
understanding whether a new drug produces a consistent effect or if responses
differ significantly between individuals.
 Comparing Treatments: Variance is used to compare the efficacy of different
treatments by evaluating the consistency of outcomes. A treatment with lower
variance indicates more predictable and reliable results.
 Determining Sample Size: The variance in baseline characteristics (such as
age, weight, disease severity) in a clinical trial helps determine the appropriate


sample size needed to detect a statistically significant difference between


treatment groups.

3.3.2. Drug Formulation and Stability

 Consistency in Drug Production: Variance is used to monitor the consistency


of drug formulations and manufacturing processes, ensuring that each batch of
a drug maintains uniform quality and efficacy.
 Stability Studies: Variance is employed to assess the stability of drug
formulations over time under different storage conditions. A significant
variance in drug potency or quality across different time points could indicate
instability or degradation of the drug.

3.3.3. Pharmacokinetic and Pharmacodynamic Studies

 Analyzing Drug Absorption: In pharmacokinetics (PK) studies, variance helps


evaluate the differences in how different patients absorb, distribute,
metabolize, and eliminate a drug.
 Dose Optimization: Variance in drug responses across different individuals
helps guide dose adjustments to ensure the drug’s effectiveness while
minimizing adverse effects.
 Modeling Drug Interactions: Variance is used to model interactions
between drugs, helping researchers understand how the co-administration of multiple
drugs may lead to variations in efficacy or side effects.

3.3.4. Quality Control in Manufacturing

 Control of Drug Potency: Variance is important in monitoring the uniformity


of drug potency in manufacturing. Statistical quality control techniques, such
as control charts, rely on variance to track and ensure drug products
consistently meet the required specifications.
 Monitoring Manufacturing Processes: Variance is used to track other quality
attributes, such as the tablet weight, capsule size, or the amount of active
pharmaceutical ingredient (API) in a formulation, ensuring these parameters
remain within the acceptable limits.


3.3.5. Epidemiological Studies

 Risk Factor Analysis: In pharmaceutical research, variance helps assess the


spread of health outcomes (such as severity of disease) across different
populations or risk groups. By understanding the variance, researchers can
identify high-risk populations or variables influencing drug efficacy.
 Genetic Variability: Genetic variability within a population may lead to
different drug responses. Variance is crucial in understanding how genetic
factors influence pharmacodynamics and pharmacokinetics of drugs, paving
the way for personalized medicine.

3.3.6. Preclinical Research and Animal Studies

 Drug Response in Animal Models: Variance is used in preclinical studies to


measure variability in responses to treatments in animal models. This helps
predict how a drug might perform in humans, including variability in adverse
effects or efficacy.

3.3.7. Adverse Event Reporting

 Safety Monitoring: Variance helps to analyze the occurrence of adverse events


during clinical trials. By measuring variance in side effects across trial
participants, researchers can identify patterns that may indicate potential risks
associated with a drug.
 Post-Marketing Surveillance: Variance is used in pharmacovigilance to
monitor the variability in adverse events reported after a drug has been
approved and is in general use, assisting in the detection of rare or
unexpected side effects.

3.3.8. Cost-Effectiveness and Health Economics

 Evaluating Variability in Health Outcomes: In health economics, variance is


used to assess the variability in health outcomes, such as quality-adjusted life
years (QALYs) or disease progression, when evaluating the cost-effectiveness
of different treatments.


 Risk and Uncertainty Analysis: Variance is also used in modeling and


sensitivity analyses to evaluate the risk and uncertainty involved in
pharmaceutical decision-making processes, including pricing, reimbursement,
and healthcare policies.

3.3.9. Personalized Medicine

 Identifying Genetic Variability: Variance is used in pharmacogenomics to


understand how genetic differences contribute to variability in drug responses.
This allows the development of personalized treatment plans based on genetic
profiles.
 Tailoring Dosing Regimens: By understanding the variance in individual drug
responses, researchers can tailor dosing regimens to optimize treatment
efficacy and minimize adverse effects.

3.4. Standard Deviation

Standard deviation is the square root of the variance and provides a measure of the
average distance between each data point and the mean. It is more interpretable than
variance because it is expressed in the same units as the data.

To calculate the standard deviation for a sample or population of ungrouped data, we


can use different formulas depending on the context.

Formula for Population standard deviation

For a population with values x1, x2, ..., xN and population mean μ, the standard
deviation σ is:

σ = √[(1/N) Σ(xi − μ)²]

Where:

 N is the number of data points in the population


 xi represents the individual data points
 μ is the population mean


Formula for Sample standard deviation

When dealing with a sample, the formula for the sample standard deviation (s) is
given by:

s = √[(1/(n − 1)) Σ(xi − x̄)²]

Where:

 n is the number of data points in the sample


 xi represents the individual sample data points
 x̄ represents the mean of the sample data points
Estimation Process:

1. Find the mean of the data.


2. Subtract the mean from each data point to find the deviation from the mean for
each value.
3. Square each deviation to eliminate negative values.
4. Sum the squared deviations.
5. Divide by n−1 (for sample) or N (for population) to find the variance.
6. Take the square root of the variance to get the standard deviation.

Estimate the sample standard deviation from the disintegration time data of 6
tablets: 2, 4, 6, 8, 10 and 12

Calculate the sample mean:

xˉ= [2+4+6+8+10 +12]/6=42/6=7

Compute the squared deviations:

(2 − 7)² = 25, (4 − 7)² = 9, (6 − 7)² = 1, (8 − 7)² = 1, (10 − 7)² = 9, (12 − 7)² = 25

Sum the squared deviations:

25+9+1+1+9+25=70

Divide by n−1=6−1=5:

s² = 70/5 = 14

s = √14 = 3.74


So, the sample standard deviation is 3.74.

Standard deviation can also be computed with the following formula:

s = √[(1/(n − 1)) (Σxi² − (Σxi)²/n)]

Example: Estimate the sample standard deviation from the disintegration time data
of 6 tablets: 2, 4, 6, 8, 10 and 12

xi xi²
2 4
4 16
6 36
8 64
10 100
12 144
Total 42 364

s = √[(1/5) (364 − (42)²/6)]

= √[(1/5) (364 − 1764/6)]

= √[(1/5) (364 − 294)]

= √[(1/5) × 70]

= √14 = 3.74
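Both the definitional formula and the computational shortcut can be checked in Python (a small verification that the two forms agree; statistics.stdev is the library equivalent):

```python
import math
import statistics

times = [2, 4, 6, 8, 10, 12]   # disintegration times from the example
n = len(times)
mean = sum(times) / n          # 7.0

# Definitional form: s = sqrt( sum((xi - mean)^2) / (n - 1) )
s_def = math.sqrt(sum((x - mean) ** 2 for x in times) / (n - 1))

# Computational shortcut: s = sqrt( (sum(xi^2) - (sum(xi))^2 / n) / (n - 1) )
s_short = math.sqrt((sum(x * x for x in times) - sum(times) ** 2 / n) / (n - 1))

print(round(s_def, 2), round(s_short, 2))   # 3.74 3.74
print(round(statistics.stdev(times), 2))    # 3.74
```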

3.4.1.Estimation of the standard deviation for grouped data

When dealing with grouped data (data that has been organized into intervals or
classes), calculating the standard deviation requires a slightly different methodology
compared to raw data. The step-by-step process to estimate the standard deviation for
grouped data is as follows.

Steps for Estimating standard deviation for grouped data:

1. Determine the midpoint of each class interval:

For each class interval, calculate the midpoint (xi) by averaging the lower and
upper bounds of the interval. The formula to find midpoint is:
xi = (lower bound of the class + upper bound of the class) / 2

2. Record the frequency of each class:

Next, use the given frequency distribution to identify how many data points
belong to each class interval. Let fi represent the frequency of the i-th class.

3. Calculate the mean of the distribution:

The mean (xˉ) of the grouped data is calculated using the formula for a
weighted average, where the midpoints and their corresponding frequencies
are used:


x̄ = Σ(fi xi) / Σfi

Where:

o fi = frequency of the i-th class


o xi = midpoint of the i-th class
4. Calculate the squared deviations: For each class, calculate the squared
deviation of the midpoint from the mean, then multiply this by the frequency:

fi × (xi − x̄)²


5. Calculate the variance: The variance is calculated by summing up the


weighted squared deviations and dividing by the total frequency (for sample,
use n−1 ):

σ² = Σ fi(xi − x̄)² / Σfi

For sample variance, the denominator would be ∑fi−1.

6. Take the square root of the variance to get the standard deviation:

σ = √σ²

Example: Find the standard deviation for the following data

Class interval Frequency


0-10 20
10-20 15
20-30 30
30-40 20
40-50 15

Solution: The above problem can be solved as follows.

Class interval Midpoint (xi) Frequency (fi) fi·xi (xi − x̄) (xi − x̄)² fi(xi − x̄)²
0-10 5 20 100 −19.5 380.25 7605
10-20 15 15 225 −9.5 90.25 1353.75
20-30 25 30 750 0.5 0.25 7.5
30-40 35 20 700 10.5 110.25 2205
40-50 45 15 675 20.5 420.25 6303.75
Total 100 2450 17475


Calculate the mean: x̄ = Σ(fi xi) / Σfi = 2450/100 = 24.5

Standard deviation (sample) = √(17475/99) = √176.52 ≈ 13.29


Alternatively, the standard deviation can also be computed with the following
formulae.

s = √[(1/(Σf − 1)) (Σ f(xi)² − (Σ f·xi)²/Σf)]

Class interval Midpoint (xi) Frequency (fi) fi·xi xi² fi·xi²
0-10 5 20 100 25 500
10-20 15 15 225 225 3375
20-30 25 30 750 625 18750
30-40 35 20 700 1225 24500
40-50 45 15 675 2025 30375
Total 100 2450 77500

s = √[(1/99) (77500 − (2450)²/100)]

= √[(1/99) (77500 − 6002500/100)]

= √[(1/99) (77500 − 60025)]

= √[(1/99) × 17475]

= √176.52 ≈ 13.29
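The grouped-data procedure can also be sketched in Python (reproducing the table above; the sample form with Σf − 1 in the denominator is used, matching the worked answer):

```python
import math

# (lower bound, upper bound, frequency) for each class interval
classes = [(0, 10, 20), (10, 20, 15), (20, 30, 30), (30, 40, 20), (40, 50, 15)]

mids = [(lo + hi) / 2 for lo, hi, _ in classes]      # midpoints: 5, 15, 25, 35, 45
freqs = [f for _, _, f in classes]

n = sum(freqs)                                       # total frequency = 100
mean = sum(f * x for f, x in zip(freqs, mids)) / n   # 2450 / 100 = 24.5

# Weighted sum of squared deviations, then the sample divisor (n - 1)
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids))  # 17475
s = math.sqrt(ss / (n - 1))

print(round(s, 2))   # 13.29
```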

3.5. Applications of standard deviation in pharmaceutical research

Standard deviation is a versatile and essential tool in pharmaceutical research, helping


ensure drug quality, assess safety and efficacy, and facilitate decision-making in drug
development and clinical practice.


Below are several applications of standard deviation in pharmaceutical research:

3.5.1. Quality Control in Drug Manufacturing

 Batch Consistency: Standard deviation is used to assess the consistency of


drug products across different manufacturing batches. A low standard
deviation indicates that the batch-to-batch variation is minimal, which is
critical for ensuring uniformity and quality of the final product.
 In-Process Monitoring: During the drug manufacturing process, SD helps
monitor factors such as composition, mixing times, and temperature controls
to ensure consistency in the production process.
 Tablet Weight Uniformity: SD is often used to check the uniformity of tablet
weights in a batch. Regulatory standards require tablets to fall within a certain
weight range with minimal variation to ensure proper dosage.

3.5.2. Clinical Trials and Bioequivalence Studies

 Analyzing Treatment Response Variability: In clinical trials, the SD is used to


measure the variability of responses to a treatment in different patients. For
example, if one group of patients shows a large SD in their response to a drug,
it might suggest differences in how the drug works in different individuals or
populations.
 Comparing Drug Efficacy: SD can help compare the variability in outcomes
(such as blood pressure reduction, blood glucose level changes, etc.) between
treatment groups. A treatment with a smaller SD may indicate that it is more
effective in producing consistent results across participants.

3.5.3. Stability Studies

 Shelf-Life Prediction: Standard deviation is used to analyze stability data to


determine the shelf-life of pharmaceutical products. Variations in
measurements such as potency, dissolution rate, or impurity levels over time
are quantified using SD to ensure the drug maintains efficacy and safety
within the expiration period.
 Storage Conditions Impact: Stability studies under different environmental
conditions (e.g., temperature, humidity) often involve calculating the SD of


results such as degradation rates or potency loss to assess the robustness of the
product.

3.5.4. Pharmacokinetics (PK) and Pharmacodynamics (PD) Studies

 Variability in Drug Concentration: In pharmacokinetics, SD is used to


quantify the variability in drug concentrations within subjects, helping
researchers understand how different factors (e.g., age, weight, gender) affect
drug metabolism and absorption.
 Pharmacodynamic Response: In PD studies, SD can show how the drug's
effect varies across a population. This helps identify patient subgroups that
may experience more or less pronounced therapeutic effects.

3.5.5. Dose-Response and Safety Assessments

 Determining Optimal Doses: In dose-response studies, SD is used to assess the


variability of patient responses at different dose levels. This information helps
determine the safest and most effective dose by identifying the range of doses
with minimal variability in therapeutic outcomes.

3.5.6. Statistical Analysis of Experimental Data

 Hypothesis Testing: In preclinical and clinical research, standard deviation is


used in statistical tests (such as t-tests, ANOVA, or regression analysis) to
assess the significance of differences between treatment groups. A smaller SD
within a group typically indicates that the sample mean is a more reliable
estimate of the true population mean.
 Power Analysis: SD is used in power analysis to estimate the required sample
size for clinical studies. A larger SD in preliminary data indicates that a larger
sample size may be needed to detect a statistically significant effect.

3.5.7. Toxicology and Safety Studies

 Dose-Response Studies: In toxicology, SD helps assess the variability in


responses to various doses of a potential drug, identifying thresholds for safety
or toxicity. The SD of dose-response data can highlight the consistency of
adverse effects or help pinpoint dose levels that result in adverse events.


3.5.8. Pharmacogenomics

 Genetic Variability in Drug Response: SD is useful in pharmacogenomic


studies to assess how genetic differences among individuals contribute to
variability in drug response. Understanding the SD of responses in genetically
diverse populations helps identify biomarkers for personalized medicine.
 Evaluating Genetic Variants: Researchers use SD to quantify the variability in
how individuals with different genetic profiles metabolize or respond to
medications, leading to more targeted treatments.

3.6. Coefficient of Variation (CV)

The coefficient of variation (CV) is a statistical measure of the relative variability


of data, defined as the ratio of the standard deviation to the mean, expressed as a
percentage. It is widely used in pharmaceutical research for assessing and
comparing variability across different datasets, experiments, or treatment groups.
The coefficient of variation is a versatile tool in pharmaceutical research that helps
assess variability in various contexts, from manufacturing processes and drug
formulation development to clinical trials and bioequivalence studies.

Estimation of CV

To estimate the Coefficient of Variation in a data set, follow these steps:

1. Calculate the Mean (µ):

μ = (1/n) Σxi

Where xi is each individual observation, and n is the number of observations.

2. Calculate the Standard Deviation (σ):

σ = √[(1/n) Σ(xi − μ)²]

3. Calculate the Coefficient of Variation (CV):

CV = (σ/μ) × 100


Where:

o σ = standard deviation
o μ = mean of the dataset

CV provides an essential metric for ensuring the reliability, safety, and


effectiveness of pharmaceutical products.
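A minimal Python sketch of the three steps, using hypothetical tablet-potency values (the sample standard deviation, with n − 1, is used here, as is common for quality-control samples):

```python
import statistics

# Hypothetical tablet potencies (mg) from a quality-control sample
potencies = [98.2, 101.5, 99.8, 100.4, 97.9, 102.1]

mean = statistics.mean(potencies)
sd = statistics.stdev(potencies)      # sample standard deviation (n - 1)
cv = (sd / mean) * 100                # coefficient of variation, in percent

print(f"CV = {cv:.2f}%")              # CV = 1.70%
```

A CV this small would indicate tight batch-to-batch consistency relative to the mean potency.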

3.6.1. Advantages of Using CV in Pharmaceutical Research

 Comparability across different datasets: Since CV is a relative measure, it


allows comparison of variability across datasets with different units or scales
(e.g., comparing different drugs with different potencies).
 Sensitivity to changes in variability: CV provides a clear measure of how
consistent or variable a process is, which is critical in the pharmaceutical
industry, where precision is paramount.
 Simple and intuitive: CV is easy to compute and interpret, making it an
attractive tool for researchers and quality control personnel in pharmaceutical
companies.

3.6.2. Limitations of CV

 Sensitivity to small mean values: When the mean of a dataset is very small,
the CV can become excessively large, even if the standard deviation is not
large, which might lead to misinterpretation of variability.
 Not suitable for skewed distributions: CV assumes that the data follows a
normal distribution. For datasets that are highly skewed, the CV may not
provide an accurate representation of variability.
 Influence of extreme values: Extreme outliers can have a significant impact on
the CV, leading to distorted results. In such cases, data cleaning or
transformation methods may be required to reduce the influence of these
outliers.

3.7. Interquartile Range (IQR)

The Interquartile Range (IQR) is a measure of statistical dispersion that quantifies


the spread of the middle 50% of data in a dataset. It is calculated as the difference
between the third quartile (Q3) and the first quartile (Q1).


Formula: IQR=Q3−Q1

Where:

 Q1 (25th percentile) is the value below which 25% of the data falls.
 Q3 (75th percentile) is the value below which 75% of the data falls.

To estimate the IQR for a given dataset, follow these steps:

1. Arrange the Data: Sort the data in ascending order.


2. Find Q1 and Q3:
o Q1: The median of the lower half of the dataset (i.e., the median of
values below the overall median).
o Q3: The median of the upper half of the dataset (i.e., the median of
values above the overall median).
3. Calculate the IQR: Subtract Q1 from Q3.

Example: A pharmaceutical company is conducting a study to measure the


concentration of a new drug in the bloodstream of patients after a single dose. The
company collected the data on drug concentration (in ng/mL) from 10 patients at 1
hour after administration. The data observed is furnished below.
20, 34, 28, 42, 32, 36, 50, 48, 40, 30

Steps to Calculate the IQR:

1. Sort the Data in Ascending Order:

20,28,30,32,34,36,40,42,48,50

2. Find the Median (Q2): The median is the middle value of the sorted dataset. Since
there are 10 data points (an even number), the median is the average of the 5th and 6th
values.

3. Find Q1 (First Quartile): The first quartile (Q1) is the median of the lower half of
the dataset (values below the overall median).

 Lower half: 20, 28, 30, 32, 34


 The median of this lower half is the 3rd value: Q1 = 30.


4. Find Q3 (Third Quartile): The third quartile (Q3) is the median of the upper half of
the dataset (values above the overall median).

 Upper half: 36, 40, 42, 48, 50


 The median of this upper half is the 3rd value: Q3 = 42.

5. Calculate the IQR: Now that we have Q1 and Q3, we can calculate the
Interquartile Range (IQR):

IQR=Q3−Q1=42−30= 12

3.5.4.Interpretation of the IQR:

 The IQR of 12 ng/mL tells us that the middle 50% of the drug concentrations
in this sample fall within a range of 12 ng/mL (from 30 ng/mL to 42 ng/mL).
 A smaller IQR suggests that the data is more consistent, while a larger IQR
indicates more variability in the concentrations across patients.
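The steps above can be sketched in Python using the same median-of-halves convention as the worked example; note that library routines such as `statistics.quantiles` use a different quartile convention and may return slightly different values:

```python
import statistics

def iqr(data):
    # IQR via the median-of-halves method described above
    s = sorted(data)               # step 1: arrange in ascending order
    n = len(s)
    lower = s[:n // 2]             # values below the overall median
    upper = s[-(n // 2):]          # values above the overall median
    q1 = statistics.median(lower)  # step 2: Q1
    q3 = statistics.median(upper)  # step 2: Q3
    return q3 - q1                 # step 3: IQR

concentrations = [20, 34, 28, 42, 32, 36, 50, 48, 40, 30]  # ng/mL
print(iqr(concentrations))         # 12 (Q3 = 42, Q1 = 30)
```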

3.5.5.Advantages of Using IQR in Pharmaceutical Research

 Resilience to Outliers: IQR is less sensitive to outliers, making it a robust and


reliable metric for measuring variability in data sets that may have extreme
values.
 Data Distribution Understanding: The IQR helps researchers understand the
spread of data within the middle 50%, providing valuable information about
the consistency and reliability of experimental results.
 Improved Decision-Making: By quantifying the dispersion of data, the IQR
helps researchers make better-informed decisions regarding dosing,
formulation, and clinical trial design.

3.5.6. Limitations

 Does Not Reflect Entire Distribution: Although the IQR is useful for
understanding the spread of the central data, it doesn't account for the
behaviour of the entire dataset, especially the extremes (very high or low
values).


 Requires Proper Data Handling: In cases where the data has many missing
values or is skewed, care must be taken when calculating IQR to ensure
accurate results.

3.6. Mean Absolute Deviation (MAD)

The mean absolute deviation is a less sensitive measure to outliers than variance
or standard deviation. Mean Absolute Deviation (MAD) is a statistical measure
used to quantify the dispersion or spread of a dataset. It represents the average of
the absolute deviations from the mean of a dataset. This measure is particularly
valuable for researchers or analysts who are interested in understanding the
variability or consistency of data, as it is less sensitive to extreme values (outliers)
than other measures like variance or standard deviation.

The formula for calculating the Mean Absolute Deviation is:

MAD = ( ∑|xi − µ| ) / n

Where:

 xi = each data point in the dataset


 μ = mean of the dataset
 n = number of data points

Example: A pharmaceutical company tested the potency of a new drug in 5 samples


from a batch, and the measured potencies (in mg) are:

Potencies= 99, 100, 101, 98, 102


Step 1: Calculate the mean potency:

µ = (99 + 100 + 101 + 98 + 102)/5 = 500/5 = 100

Step 2: Calculate the absolute deviations from the mean:

|99 − 100| = 1, |100 − 100| = 0, |101 − 100| = 1, |98 − 100| = 2, |102 − 100| = 2

Step 3: Calculate the MAD:

MAD = (1 + 0 + 1 + 2 + 2)/5 = 6/5 = 1.2 mg


This MAD value of 1.2 mg indicates that the deviation of each sample's potency from
the mean is, on average, 1.2 mg. If this MAD value were very high, it could suggest
variability in the production process, requiring further investigation.
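A minimal Python sketch of the same calculation:

```python
potencies = [99, 100, 101, 98, 102]            # measured potencies (mg)

mean = sum(potencies) / len(potencies)         # µ = 100.0
mad = sum(abs(x - mean) for x in potencies) / len(potencies)

print(mad)                                     # 1.2
```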

3.6.1. Advantages of using MAD

 Simplicity: MAD is easy to calculate and understand.


 Robustness: MAD is less sensitive to outliers, making it useful when the
dataset contains extreme values.
 Interpretability: Unlike variance or standard deviations, which are based on
squared deviations, MAD provides a direct measure of the average magnitude
of deviation from the mean.

3.7. Standard error of mean (SEM):

The Standard Error of the Mean (SEM) is a statistical measure that quantifies the
amount of variability or dispersion of sample means around the population mean. It
provides an estimate of how much the sample mean (average) of a dataset is likely to
differ from the true population mean.

Formula for SEM

The SEM is calculated using the formula:

SEM = σ / √n

Where

 σ is the standard deviation of the sample (or population).


 n is the sample size.

3.7.1. Factors influencing SEM

1. Sample Size: Larger sample sizes generally lead to a smaller SEM. This is
because increasing the number of observations in the sample reduces the error
in estimating the true population mean.
2. Standard Deviation: The SEM is directly affected by the variability in the data.
Higher variability (larger σ) leads to a larger SEM.


Interpretation: The SEM reflects how precise the sample mean is as an estimate of
the population mean. A smaller SEM means that the sample mean is a more
accurate reflection of the population mean.

3.7.2.Sample Calculation of the Standard Error of the Mean (SEM)

A pharmaceutical scientist measured the concentration of a drug in 5 different tablet


samples from a batch. The drug concentrations (in mg per tablet) are as follows:

 Sample 1: 35 mg
 Sample 2: 38 mg
 Sample 3: 40 mg
 Sample 4: 37 mg
 Sample 5: 42 mg

Calculate the Standard Error of the Mean (SEM) for the sample.

Step 1: Calculate the mean of the sample

The first step is to find the mean (x̄) of the sample. This is done by adding all the
sample values and dividing by the number of samples.

xˉ=(35+38+40+37+42)/5

=192/5=38.4

So, the mean drug concentration is 38.4 mg.

Step 2: Calculate the Standard Deviation (SD) of the Sample

Next, we need to calculate the sample standard deviation (SD). This step involves
finding how much each sample value deviates from the mean, squaring the deviations,
and then averaging them.

Step 2.1: Calculate the deviations from the mean

For each sample, subtract the mean (38.4 mg) from the sample value:

 Sample 1: 35−38.4=−3.4
 Sample 2: 38−38.4=−0.4
 Sample 3: 40−38.4=1.6
 Sample 4: 37−38.4=−1.4


 Sample 5: 42−38.4=3.6

Step 2.2: Square the deviations

Now square each of the deviations:

 (−3.4)2=11.56
 (−0.4)2=0.16
 (1.6)2=2.56
 (−1.4)2=1.96
 (3.6)2=12.96

Step 2.3: Calculate the variance

The variance is the average of the squared deviations. For a sample, we divide the
sum of squared deviations by n−1, where n is the sample size (in this case, 5).

Variance = (11.56 + 0.16 + 2.56 + 1.96 + 12.96)/(5 − 1) = 29.2/4 = 7.3

Step 2.4: Calculate the standard deviation

The standard deviation is the square root of the variance:

SD=7.3=2.7 mg

Step 2.5: Calculate the Standard Error of the Mean (SEM)

SEM = σ / √n = 2.7 / √5 = 2.7 / 2.236 ≈ 1.21 mg

This means that the sample mean (38.4 mg) has an associated SEM of 1.21 mg,
indicating that the sample mean is likely to be within this range of the true population
mean based on this sample size and variability.
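The whole calculation can be reproduced in a few lines of Python (`statistics.stdev` uses the n − 1 denominator, matching Step 2.3 above):

```python
import statistics

concentrations = [35, 38, 40, 37, 42]      # mg per tablet

sd = statistics.stdev(concentrations)      # sample SD, approximately 2.70 mg
sem = sd / len(concentrations) ** 0.5      # SEM = σ / √n

print(round(sem, 2))                       # 1.21
```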

3.8. Pharmaceutical Applications of SEM

In the pharmaceutical industry, the SEM is widely used in various stages of drug
development, quality control, and clinical trials. Here are some key applications:


3.8.1. Clinical Trials:

 Assessing treatment effectiveness: In clinical trials, SEM is used to assess the


precision of the average treatment effect. It provides insight into how much
variability there is in the treatment outcomes. For example, if the SEM is
small, it suggests that the sample mean is close to the true population mean,
indicating a reliable result.
 Confidence intervals: SEM is used to calculate confidence intervals (CIs) for
the mean, offering a range of values within which the true population mean is
likely to lie.

3.8.2. Pharmaceutical Product Development:

 Stability studies: In stability studies, SEM helps in assessing the consistency


and reliability of drug potency over time. For instance, if a drug's potency
measurements from different batches show low SEM, it suggests that the
formulation is stable and consistently effective.
 Formulation and dosage consistency: SEM is used to evaluate the consistency
of the drug's formulation or dosage forms. If the SEM is large, it indicates that
there is substantial variability in the formulation.

3.8.3. Pharmacokinetics and Pharmacodynamics:

 Bioavailability studies: SEM is important in bioequivalence studies, where the


bioavailability of a generic drug is compared to a brand-name drug. A small
SEM in pharmacokinetic parameters (like Cmax, Tmax, or AUC) indicates
that the generic drug performs similarly to the brand-name drug, with less
variability across test subjects.
 Population variability: SEM can be used to quantify population variability in
drug metabolism. In pharmacodynamics, understanding the SEM helps assess
how different patients or subgroups might respond to the same treatment,
facilitating more personalized dosing.


3.8.4. Quality Control in Manufacturing:

 Assay validation: During manufacturing, the SEM is used to monitor the


precision of assays that measure drug content, ensuring that the concentration
of active pharmaceutical ingredients (APIs) in pharmaceutical dosage forms
are consistent. If the SEM is too large, it suggests that the manufacturing
process needs to be adjusted for better consistency.
 Batch-to-Batch variation: SEM can also be used to assess batch-to-batch
variation in drug production. If SEM values are high across multiple batches,
it may signal potential issues in the manufacturing process, prompting further
investigation into equipment calibration, ingredient sourcing, or formulation.

3.8.5. Safety and Adverse Event Reporting:

 Identifying outliers and variability: SEM can help identify outliers or unusual
data points in clinical safety trials. By assessing how much variation exists in
the reporting of adverse events, researchers can better determine whether the
events are truly rare or just a result of statistical noise.
 Risk assessment: SEM can be part of risk assessment tools that help predict
the likelihood of adverse reactions in a population. A high SEM in adverse
event data suggests a high level of uncertainty about the generalizability of
safety results, prompting more rigorous monitoring.

3.8.6. Dose-Response Relationships:

 Modeling drug efficacy: In preclinical and clinical research, SEM is used in


the modeling of dose-response relationships to determine the optimal dose that
balances efficacy and safety. Smaller SEM values indicate that the mean
response at a given dose is a more reliable predictor of the therapeutic effect.

3.8.7. Regulatory Submissions:

 Statistical Significance in drug approval: Regulatory agencies often require


that clinical trial results meet certain statistical thresholds (e.g., p-values and
confidence intervals) to demonstrate efficacy. SEM is integral in calculating
these metrics and, ultimately, the approval of new drug products.




4. Correlation

Correlation refers to a statistical connection or interdependence between two or more


variables. It measures the strength and direction of the linear relationship between
them. When two variables fluctuate in a consistent and predictable manner, they are
said to be correlated. Correlation assesses how two variables interact: moving in the
same direction, in opposite directions, or exhibiting no relationship at all. It is an
essential tool in disciplines like pharmaceuticals and the health sciences, and is
commonly measured using a correlation coefficient.

4.1. Correlation Types

4.1.1. Positive Correlation (PC):

In a positive correlation, both variables move in the same direction, either increasing
or decreasing together. For instance, the concentration of a drug in a patient's
bloodstream may correlate with its therapeutic effect, helping to determine the ideal
dosage for both efficacy and safety. A drug's longer half-life could be positively
correlated with its sustained therapeutic effect. In controlled drug delivery systems,
positive correlations can be observed between the percentage of drug released and
time in in vitro release profiles. Similarly, in drug analysis, a positive correlation often
exists between drug concentration and measurable response, such as absorbance or
peak area.

4.1.2. Negative Correlation:

A negative correlation occurs when one variable increases while the other decreases.
For example, as a person ages, their drug metabolism may slow down, leading to
elevated drug concentrations in the body for longer durations. Negative correlations
can also be seen between the potency of a drug and time, and in data such as log
plasma concentration versus time, collected from post-administration plasma
concentration levels after an intravenous injection.


4.1.3. Zero or No Correlation:

When two variables show no consistent relationship, they are said to have zero or no
correlation. In such cases, knowing one variable's value does not provide any insight
into the other. For example, there may be no correlation between the colour of a
pharmaceutical pill and the incidence of side effects reported by patients, or between
the month of the year and the drug's efficacy.

4.1.4. Partial Correlation:

Partial correlation analyzes the relationship between two variables while controlling
for the effect of one or more additional variables. This method helps to isolate the
direct connection between the two variables, free from the confounding effects of
others. In the field of pharmaceuticals, partial correlation is useful for investigating
the relationship between two variables while accounting for the impact of a third
variable.

For example, if researchers aim to evaluate whether medication dosage (X) affects BP
reduction (Y) while controlling for the patients' age (Z), which could affect both
variables, partial correlation enables them to examine the direct effect of dosage on
BP. Both dosage and BP reduction are likely to be correlated, as higher medication
doses may lead to greater reductions in BP. However, age could also be correlated
with both: older patients might be prescribed altered dosages, and older patients may
respond differently to medication. To understand the direct relationship between dose
and BP reduction while controlling for the effect of age, partial correlation can be
used. This method calculates the correlation between X and Y after removing the
variance explained by Z. If the partial correlation remains strong after accounting for
age, it indicates that the dosage has a direct influence on BP reduction, independent of
the patients' age. If the partial correlation is weak or negligible, it would suggest that
age mediates the relationship between dosage and BP reduction.

4.1.5. Causal Correlation:

Causal Correlation refers to a situation in which one variable directly influences
changes in another. However, it's important to understand that correlation does not
necessarily imply causation. Just because two variables are correlated does not
necessarily mean one is causing the other. In the context of the pharmaceutical
industry, causal correlation refers to a relationship where one variable directly impacts
another. For example, in a clinical trial, a pharmaceutical company may develop a
drug intended to lower BP. If patients who take the drug exhibit a significant reduction
in BP compared to those who receive a placebo, the medication is considered the
cause (independent variable), while the reduction in BP is the effect (dependent
variable).

4.1.6. Multiple correlations:

Multiple correlation refers to the relationship between a dependent variable and
multiple independent variables. This analytical method is frequently employed to
understand how different factors contribute to or influence a specific outcome or
behaviour.

The multiple correlation coefficient (symbolised as R) measures the strength and
direction of the relationship between a dependent variable and multiple independent
variables. The value of R ranges from 0 to 1:

 R = 1: Represents a perfect positive linear correlation.
 R = 0: Implies no linear relationship exists between the variables.
 R < 1: Suggests a partial correlation, with some unexplained variance.

Formula for Multiple Correlation Coefficient (R):

Given a set of independent variables (predictors) X1, X2,…,Xn and a dependent


variable Y, the multiple correlation coefficient is derived from the square root of the
coefficient of determination R², which reflects the proportion of variance in the
dependent variable explained by the independent variables.

R² = Explained variance / Total variance

 R² represents the proportion of variation in the dependent variable Y that can
be explained by the independent variables X1, X2, …, Xn. Consequently, the
multiple correlation coefficient R is calculated as:


R = √(R²)

A typical example in the pharmaceutical industry involves assessing how various
factors (such as dosage, age, and gender) influence the efficacy of a drug.

Example: Evaluating the effectiveness of a BP-lowering drug

For instance, consider a pharmaceutical company testing a new drug aimed at
reducing BP. The company seeks to understand how different variables, such as the
drug dosage, patient's age, and their gender, influence the drug's effectiveness, which
is measured by the degree of reduction in BP.

 Dependent variable (response): Reduction in BP (e.g., measured in mmHg).


 Independent variables (predictors):
o Drug dosage (measured in mg per day).
o Patient age (measured in years).
o Patient gender (male or female).

To evaluate the combined impact of these factors on the drug's ability to reduce BP,
the company could perform a multiple correlation analysis, typically through multiple
linear regression techniques.

For example, the analysis may reveal the following:

1. The drug dosage shows a positive correlation with the BP reduction,
indicating that higher doses are more effective.
2. Age might exhibit a negative correlation, suggesting that older patients
experience less of a reduction in BP compared to younger patients, perhaps
due to additional health complications.
3. Gender could reveal that the drug is more effective in one gender than the
other, potentially because of biological differences.

The multiple correlation coefficient (R) would provide a comprehensive measure of
how well these three variables, in combination, predict the reduction in BP.


4.1.7. Hypothetical Data Model:

Assume that the relationship between these variables is represented by the following
multiple linear regression equation:

Blood Pressure Reduction=β0+β1 (Dose) +β2 (Age) +β3 (Gender) +ϵ

Where:

 β0 is the intercept,
 β1, β2, and β3 are coefficients that reflect the impact of each independent
variable,
 ϵ denotes the error term.

Interpretation of Results:

 The multiple correlation coefficient (R) indicates the degree to which the
combination of dose, age, and gender explains the variation in BP reduction.
 The R-squared (R²) value reveals the proportion of variance in BP reduction
explained by the independent variables (dosage, age, and gender).

In this example, multiple correlation analysis allows researchers to evaluate the
combined influence of several variables on the outcome (BP reduction), helping them
refine dosage recommendations or tailor treatment strategies for various patient
groups.

4.2: Karl Pearson Correlation Coefficient

Often referred to as the Pearson correlation coefficient (PCC), the Karl Pearson
Correlation is a statistical measure that evaluates the strength and direction of the
linear association between two variables. It is denoted by r, with values ranging
from −1 to +1.

 A value of +1 indicates a perfect positive linear relationship.
 A value of −1 signifies a perfect negative linear relationship.
 A value of 0 suggests the absence of any linear relationship between the two
variables.

Formula:

The formula for the Pearson correlation coefficient is:

r = ∑(xi − x̄)(yi − ȳ) / √[ ∑(xi − x̄)² ∑(yi − ȳ)² ]

Where:

 xi and yi denote the individual sample values of the two variables x and y.
 x̄ and ȳ are the mean (average) values of the x and y variables.
 ∑ indicates the summation across all data points.

Steps to Calculate the PCC:

1. Calculate the means of the two variables, x̄ and ȳ.
2. Calculate the deviations by subtracting the mean from each data point for both
variables: (xi − x̄) and (yi − ȳ).
3. Multiply the deviations of each paired data point from the two variables, i.e.,
(xi − x̄)(yi − ȳ).
4. Sum the products obtained in step 3.
5. Square the deviations for each variable separately, then sum these squared
values.
6. Calculate the ratio of the sum of the products of deviations to the square root
of the product of the sums of squared deviations for the two variables.


Interpretation of the PCC:

Degree of Correlation            Positive Correlation        Negative Correlation

Perfect Correlation              +1                          −1
Very Extreme Degree of           Between +0.9 and +1         Between −0.9 and −1
Correlation
Fairly Extreme Degree of         Between +0.75 and +0.9      Between −0.75 and −0.9
Correlation
Moderate Degree of               Between +0.25 and +0.75     Between −0.25 and −0.75
Correlation
Poor Degree of Correlation       Between 0 and +0.25         Between 0 and −0.25
Negligible/No Correlation        0                           0

Problem:

Consider the following example, where data are given on time and the
percentage of drug dissolved.

Time(hr) % drug dissolved


2 21
4 43
6 62
8 86
10 98

We aim to find the PCC between time and percent drug dissolved.

Step-by-Step Calculation:

1. Calculate the means of time and percent drug dissolved:
o Mean of time: x̄ = (2 + 4 + 6 + 8 + 10)/5 = 30/5 = 6
o Mean of percent drug dissolved: ȳ = (21 + 43 + 62 + 86 + 98)/5 = 310/5 = 62
2. Calculate the deviations from the mean for each pair:
o x − x̄: 2−6 = −4, 4−6 = −2, 6−6 = 0, 8−6 = 2, 10−6 = 4
o y − ȳ: 21−62 = −41, 43−62 = −19, 62−62 = 0, 86−62 = 24, 98−62 = 36
3. Multiply the deviations for each pair:
o (−4)(−41) = 164, (−2)(−19) = 38, (0)(0) = 0, (2)(24) = 48, (4)(36) = 144
4. Sum the products: 164 + 38 + 0 + 48 + 144 = 394
5. Square the deviations for each variable separately and sum them:
o For time: (−4)² = 16, (−2)² = 4, (0)² = 0, (2)² = 4, (4)² = 16. Sum: 16 + 4 + 0 + 4 + 16 = 40
o For percent drug dissolved: (−41)² = 1681, (−19)² = 361, (0)² = 0, (24)² = 576,
(36)² = 1296. Sum: 1681 + 361 + 0 + 576 + 1296 = 3914
6. Calculate the Pearson correlation coefficient:

Time(hr)   % drug          (x − x̄)   (y − ȳ)   (x − x̄)(y − ȳ)   (x − x̄)²   (y − ȳ)²
(x)        dissolved (y)
2          21              −4         −41        164               16          1681
4          43              −2         −19        38                4           361
6          62              0          0          0                 0           0
8          86              2          24         48                4           576
10         98              4          36         144               16          1296
∑x = 30    ∑y = 310                              ∑ = 394           ∑ = 40      ∑ = 3914

r = 394 / (√40 × √3914)

= 394 / (6.325 × 62.56)

= 394 / 395.69

= 0.9957


The Pearson correlation coefficient (PCC) r = 0.9957 signifies a near-perfect positive
linear relationship between time and percentage of drug dissolved. As time
progresses, the percentage of drug dissolved rises proportionally, following an almost
perfectly linear pattern.

Alternatively, this can be calculated using the following formula:

r = [ ∑xy − (∑x ∑y)/n ] / √{ [ ∑x² − (∑x)²/n ] [ ∑y² − (∑y)²/n ] }

Time(hr) (x)   % drug dissolved (y)   xy          x²         y²
2              21                     42          4          441
4              43                     172         16         1849
6              62                     372         36         3844
8              86                     688         64         7396
10             98                     980         100        9604
∑x = 30        ∑y = 310               ∑xy = 2254  ∑x² = 220  ∑y² = 23134

r = [2254 − (30 × 310)/5] / √{ [220 − (30)²/5] [23134 − (310)²/5] }

r = (2254 − 9300/5) / √[(220 − 180)(23134 − 19220)]

r = (2254 − 1860) / (√40 × √3914)

r = 394 / (6.325 × 62.56)

r = 394 / 395.69

r = 0.9957
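Both forms of the formula can be checked with a short Python sketch using the dissolution data above:

```python
import math

time = [2, 4, 6, 8, 10]               # hr
dissolved = [21, 43, 62, 86, 98]      # % drug dissolved
n = len(time)

# Computational (second) form of the Pearson formula
num = sum(x * y for x, y in zip(time, dissolved)) - sum(time) * sum(dissolved) / n
den = math.sqrt((sum(x * x for x in time) - sum(time) ** 2 / n)
                * (sum(y * y for y in dissolved) - sum(dissolved) ** 2 / n))
r = num / den

print(round(r, 4))                    # 0.9958 (0.9957 with the rounded intermediates above)
```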

4.3: Partial Correlation Estimation:

To calculate the partial correlation between two variables (e.g., X and Y), while
accounting for the influence of a third variable Z, the formula is:

rXY.Z = (rXY − rXZ × rYZ) / √[ (1 − rXZ²)(1 − rYZ²) ]

Where:

 rXY represents the PCC between X and Y (the two variables of interest).
 rXZ denotes the PCC between X and Z (the controlled variable).
 rYZ signifies the PCC between Y and Z (the controlled variable).
 rXY.Z is the partial correlation coefficient between X and Y, controlling for
the effect of Z.

Example: Calculate the partial correlation coefficient for the following data.

Concentration of binder Disintegration time Ageing of the tablet


(%w/w) (min) (months)
1 2 3
2 4 15
3 6 18
4 8 21
5 10 24


Step 1: Compute the Pearson Correlation Coefficients

Begin by calculating the PCC for the three variable pairs:

1. rXY: The correlation between the binder concentration and disintegration time.
2. rXZ: The correlation between the binder concentration and the ageing of the
tablet.
3. rYZ: The correlation between disintegration time and the ageing of the tablet.

These correlations can be computed using the Pearson correlation formula, and the
results are as follows.

 rXY = 1 (This indicates a perfect positive linear relationship between binder
concentration and disintegration time.)
 rXZ = 0.936 (This suggests a strong positive correlation between binder
concentration and tablet ageing.)
 rYZ = 0.936 (This shows a strong positive correlation between disintegration
time and tablet ageing.)
Step 2: Apply the Partial Correlation Formula

Next, insert these values into the partial correlation formula to calculate the partial
correlation:

rXY.Z = (rXY − rXZ × rYZ) / √[ (1 − rXZ²)(1 − rYZ²) ]

= (1 − [0.936 × 0.936]) / √( [1 − (0.936)²] × [1 − (0.936)²] )

= (1 − 0.876) / √( [1 − 0.876] × [1 − 0.876] )

= 0.124 / √(0.124 × 0.124)

= 0.124 / 0.124

= 1


Interpretation:

 Positive Partial Correlation: A positive partial correlation (closer to +1)
indicates that, after controlling for the influence of other variables, an increase
in X is associated with an increase in Y.
 Negative Partial Correlation: A negative partial correlation (closer to −1)
implies that, after accounting for the effects of other variables, as X increases,
Y tends to decrease.
 Zero or Near-Zero Partial Correlation: A partial correlation close to 0 suggests
that there is minimal to no direct linear relationship between X and Y once the
influence of the controlled variable is removed.

In the observed case, after accounting for the effect of tablet ageing, it appears that as
binder concentration increases, the disintegration time of the tablet tends to
increase.
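A short Python sketch reproduces this calculation; the `pearson` helper defined here is for illustration:

```python
import math

def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

binder = [1, 2, 3, 4, 5]         # binder concentration, %w/w (X)
disint = [2, 4, 6, 8, 10]        # disintegration time, min (Y)
ageing = [3, 15, 18, 21, 24]     # ageing of the tablet, months (Z)

r_xy = pearson(binder, disint)   # 1.0
r_xz = pearson(binder, ageing)   # about 0.936
r_yz = pearson(disint, ageing)   # about 0.936

r_xy_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(round(r_xy_z, 3))          # 1.0
```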

4.4: Multiple Correlation Estimation:

The coefficient of multiple determination (R²) measures the overall strength of the
regression equation:

R² = r²XY + r²XY.Z (1 − r²XY)

Where

 R² is the coefficient of multiple determination
 rXY is the correlation between Y and X
 rXY.Z is the partial correlation between Y and X, with the effect of Z
controlled.

Example: Imagine a pharmaceutical company has the following data on patient age,
drug dose, and BP reduction. The company seeks to determine the correlation
between age and drug dose while accounting for the influence of BP reduction.


Patient Age(years) Dose(mg) Blood pressure reduction(mmHg)


1 40 10 12
2 50 15 14
3 60 20 15
4 70 25 16
5 80 30 17

Begin by calculating the Pearson correlation coefficients for three pairs of variables:

1. rXY: The correlation between age and dosage.
2. rXZ: The correlation between age and BP reduction.
3. rYZ: The correlation between dosage and BP reduction.

These correlations can be calculated using the Pearson correlation formula:

 rXY = 1 (This reflects a perfect positive linear relationship between age and
dosage in this instance.)
 rXZ = 0.986 (This indicates a strong positive correlation between age and BP
reduction in this dataset.)
 rYZ = 0.986 (This reveals a strong positive correlation between drug dosage
and BP reduction.)

The multiple correlation coefficient can be calculated with the following formula:

R² = r²XY + [r²XY.Z (1 − r²XY)]

= (1)² + [(1)² × (1 − (1)²)]

= 1 + 0

= 1, so R = √1 = 1


The multiple correlation coefficient (R) indicates a strong correlation between the
combination of age and dose and the BP reduction.
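Combining the Pearson and partial-correlation formulas above, the result can be reproduced in Python (the `pearson` helper is for illustration):

```python
import math

def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

age  = [40, 50, 60, 70, 80]      # years (X)
dose = [10, 15, 20, 25, 30]      # mg (Y)
bp   = [12, 14, 15, 16, 17]      # BP reduction, mmHg (Z)

r_xy = pearson(age, dose)        # 1.0
r_xz = pearson(age, bp)          # about 0.986
r_yz = pearson(dose, bp)         # about 0.986
r_xy_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_squared = r_xy ** 2 + r_xy_z ** 2 * (1 - r_xy ** 2)
R = math.sqrt(r_squared)
print(R)                         # 1.0
```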

4.5: Correlation Measurement

Correlation can be quantified using three different methods; viz., Scatter Diagram,
Karl Pearson’s Coefficient of Correlation, and Spearman’s Rank Correlation
Coefficient.

4.5.1. Scatter Diagram

The scatter diagram is a straightforward and visually appealing method used to
evaluate the correlation between two variables by graphing their bivariate
distribution. It helps to identify the nature of the relationship between the two
variables and provides a clear visual representation, offering the researcher or analyst
insight into the nature of the association. It is one of the most basic approaches for
finding the relationship between two variables, since it does not require any
numerical calculations.

The two essential steps for creating a Scatter Diagram or Dot Plot:
1. Plot the values of the variables (say X and Y) along the X-axis and Y-axis
respectively.
2. Place dots on the graph corresponding to each pair of values.
Example:

Use a scatter diagram to represent the following values of X and Y, and then
analyse type and degree of correlation.

Time (hr) Amount of drug release (mg)

1 9
2 17
4 35

6 56

8 74

10 98


[Scatter diagram: Time (hr) on the X-axis versus Amount of drug release (mg) on the Y-axis]

The scatter diagram illustrates an upward trend in data points, moving from the lower
left-hand corner to the upper right-hand corner of the graph. This indicates a Positive
Correlation between the values of X and Y variables.
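The visual impression from the scatter diagram can be cross-checked numerically. A minimal Python sketch (Python is our choice purely for illustration; the book itself works by hand) computes Karl Pearson's coefficient for the same six data points:

```python
import math

# Drug-release data from the scatter-diagram example
time_hr = [1, 2, 4, 6, 8, 10]
release_mg = [9, 17, 35, 56, 74, 98]

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

print(round(pearson_r(time_hr, release_mg), 4))  # 0.9987
```

A value this close to +1 confirms the strong positive correlation suggested by the upward trend of the plotted points.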

4.5.3. Spearman’s Rank Correlation Coefficient (SRCC)

SRCC, also known as Spearman’s Rank Difference Method, is a statistical technique
used to assess the correlation between qualitative variables. Developed in 1904 by
Charles Edward Spearman, this method is particularly employed to determine the
correlation between variables such as pain, oedema, odour, taste, and other attributes
that cannot be quantified directly. As a result, these attributes or characteristics are
ranked or arranged based on their relative preference.

rk = 1 − [6∑D² / (N³ − N)]

In the given formula,

rk= Coefficient of rank correlation

D = Rank differences

N = Number of variables

Example:

Determine Spearman’s Rank Correlation Coefficient for the following data.

Time (hr) Amount of drug release (mg)
1 9
2 17
4 35
6 56
8 74
10 98
The above data is transformed to ranks as follows

Rank of time Rank of drug release Difference in ranks (D) D²
1 1 0 0
2 2 0 0
3 3 0 0
4 4 0 0
5 5 0 0
6 6 0 0
Total ∑D² = 0
Spearman’s Rank Correlation Coefficient:

rk = 1 − [6∑D² / (N³ − N)]
= 1 − [6(0) / (6³ − 6)]
= 1 − [0 / (216 − 6)]
= 1 − 0
= 1
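The hand calculation above can be sketched in Python (an illustrative choice; the `ranks` helper assumes no tied values, as in this example):

```python
def spearman_rank(x, y):
    """Spearman's rank correlation: r = 1 - 6*sum(D^2)/(N^3 - N).
    The simple ranking below assumes there are no tied values."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n ** 3 - n)

time_hr = [1, 2, 4, 6, 8, 10]
release_mg = [9, 17, 35, 56, 74, 98]
print(spearman_rank(time_hr, release_mg))  # 1.0
```

Because the two variables rank the observations identically, every rank difference D is zero and the coefficient comes out as exactly 1.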


4.6: Applications of correlation coefficient


The correlation coefficient plays a vital role in various fields, particularly in the
research, development, and production of pharmaceutical products.

Drug Discovery and Development:

 Structure-Activity Relationship (SAR): Understanding the relationship between
the chemical structure of a molecule and its biological activity is vital in the
design of new drugs. Researchers can modify the molecular structure of
compounds to enhance their efficacy or reduce toxicity based on this
relationship.
 Pharmacokinetics and Pharmacodynamics (PK/PD): The correlation between
drug concentration in the body (pharmacokinetics) and the drug's effect
(pharmacodynamics) is critical in optimizing dosing schedules and forecasting
therapeutic outcomes.
 Biomarker Correlation: Correlations between biological markers (biomarkers)
and disease progression or drug response help to identify potential therapeutic
targets and predict patient responses.

Clinical Trials:

 Efficacy vs. Dose Response: Clinical trials often explore the correlation
between drug dose and therapeutic response to identify the optimal dose for
maximum benefit with minimal side effects.
 Patient Outcomes: Correlating genetic or phenotypic data with clinical
outcomes is essential in personalized medicine, where treatments are tailored to
individual patient profiles, improving success rates and reducing adverse
effects.

Quality Control and Manufacturing:

 Process Parameters and Product Quality: Understanding the correlation
between different process parameters (such as temperature and mixing time) and
the final drug product quality ensures consistency and compliance with
regulatory standards.


 Stability Testing: Long-term stability testing correlates environmental
factors (such as temperature and humidity) with a drug’s shelf-life, ensuring its
safety and effectiveness over time.

Formulation Development:

 In Vitro-In Vivo Correlation (IVIVC): Correlating the in vitro dissolution rate


of a drug formulation with the in vivo bioavailability is vital for predicting the
drug's performance in the human body, aiding in the design of oral dosage
forms.

Regulatory and Compliance:

 Data Integrity and Statistical Analysis: Regulatory authorities require


statistical correlations to ensure consistency and reliability in stability and
clinical trial data, validating that the results are scientifically valid and
reproducible.



Regression

5. Regression:

In pharmaceutical research, regression typically refers to a statistical technique used


to model and analyze the relationships between variables. It is often employed to
understand how different factors or variables influence critical outcomes. Regression
analysis helps researchers make predictions, identify trends, and control
confounding variables in clinical trials, drug development, and analysis.

5.1: Difference between correlation and regression

Correlation and regression are both statistical methods used to examine the
relationship between two or more variables, but they serve different purposes and
provide different insights. They are summarised as follows:

Purpose
Correlation: Assesses the strength and direction of the relationship between two
variables.
Regression: Predicts the value of the dependent variable based on one or more
independent variables.

Nature of the Relationship
Correlation: The relationship is bidirectional, meaning the correlation between
variables A and B is the same as the correlation between B and A (symmetric).
Regression: The relationship is directional, meaning variable X predicts variable Y,
but variable Y does not predict variable X (asymmetric).

Output
Correlation: Yields a correlation coefficient (denoted as r), ranging from -1 to 1,
indicating the strength and direction of the relationship.
Regression: Produces a mathematical equation (e.g., Y = a + bX) that quantifies the
relationship between the dependent and independent variables.

Causality
Correlation: Indicates that two variables change together, but does not explain why
one variable affects the other.
Regression: Can explore causal relationships by modelling how one variable affects
the other.


5.2: Curve Fitting by Least Squares

Curve fitting by the least squares method is a statistical technique employed to
identify the curve that best matches a set of data points. The primary
objective is to minimize the sum of the squared differences, known as residuals,
between the observed data points and the values predicted by the chosen model. This
technique is widely used in regression analysis to find the most accurate model to
represent the relationship between variables. By minimizing the sum of squared
residuals, the method ensures that the model’s predictions are as close as possible
to the actual data points, making it reliable for drawing conclusions about the
relationships between variables and for making predictions based on observed data.
The line obtained from this method is called the regression line or line of best fit.
The least squares method assumes that the data are relatively well-distributed and do
not contain outliers.

Formula for Least Square Method

Least Square Method formula is used to find the best-fitting line through a set of data
points. For a simple linear regression, which takes the form 𝑦 = 𝑎 + 𝑏𝑥, where y is
the dependent variable, x is the independent variable, b is the slope of the line, and a is
the y-intercept, the least squares method uses a specific formulas to calculate the
values of the slope (b) and the intercept (a) based on the given data points:

1. Slope (b) formula: b = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]

2. Intercept (a) formula: a = [∑y − b(∑x)] / n


Where:

 n is the number of data points,

 ∑xy is the sum of the product of each pair of x and y values,

 ∑x is the sum of all x values,

 ∑y is the sum of all y values,

 ∑x2 is the sum of the squares of x values.

Example:

Draw the regression line y=a+bx for the following data

Time (hr) Amount of drug release (mg)


2 18
4 34
6 52
8 75
10 86

Solution:

Time (hr) Amount of drug release (mg) xy x2


x y
2 18 36 4
4 34 136 16
6 52 312 36
8 75 600 64
10 86 860 100
Total ∑x=30 ∑y=265 ∑xy=1944 ∑x2=220

1. Slope (b) formula: b = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]


b = [5(1944) − (30)(265)] / [5(220) − (30)²]

= (9720 − 7950) / (1100 − 900)

= 1770 / 200

= 8.85

2. Intercept (a) formula: a = [∑y − b(∑x)] / n

= [265 − 8.85(30)] / 5
= (265 − 265.5) / 5
= −0.5 / 5
= −0.1

y = -0.1 + 8.85x

Alternatively, the following formulae can be used.

b = ∑[(x − x̄)(y − ȳ)] / ∑(x − x̄)²

a = ȳ − b x̄

The steps to find the line of best fit by using the least square method are as follows:

 Step 1: Denote the independent variable values as xi and the dependent ones as
yi.

 Step 2: Calculate the average values of xi and yi as x̄ and ȳ.

 Step 3: Presume the equation of the line of best fit as y = a + bx, where b is the
slope of the line and a represents the intercept of the line on the Y-axis.

Bapatla College of Pharmacy Page 91


Regression

 Step 4: The slope b can be calculated from the following formula:

b = ∑[(x − x̄)(y − ȳ)] / ∑(x − x̄)²

Time (hr) Amount of drug (x − x̄) (y − ȳ) (x − x̄)(y − ȳ) (x − x̄)²
(x) release (mg) (y)
2 18 -4 -35 140 16
4 34 -2 -19 38 4
6 52 0 -1 0 0
8 75 2 22 44 4
10 86 4 33 132 16
∑x = 30 ∑y = 265
x̄ = ∑x/n = 6 ȳ = ∑y/n = 53 Total: 354 Total: 40

b = 354 / 40

= 8.85

 Step 5: The intercept a is calculated from the following formula:

𝑎 = ȳ − b 𝑥̄

= 53 − 8.85 ∗ 6

= 53 − 53.1

= −0.1

Thus, we obtain the line of best fit as y = -0.1 + 8.85x, where the values of b and a
are calculated from the formulae defined above. This equation can be used to estimate
y for a given x. In this example, we can predict the amount of drug release at a
particular time.


For instance, the amount of drug release at the 5th hour can be predicted by
substituting 5 in place of x:

The amount of drug release at the 5th hour = -0.1 + 8.85(5) = -0.1 + 44.25 = 44.15 mg
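The worked example can be reproduced with a short Python sketch of the least squares formulas (Python is used here purely for illustration):

```python
def least_squares(x, y):
    """Slope b and intercept a of the least squares line y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

time_hr = [2, 4, 6, 8, 10]
release_mg = [18, 34, 52, 75, 86]
a, b = least_squares(time_hr, release_mg)
print(round(a, 2), round(b, 2))   # -0.1 8.85, matching the worked example
print(round(a + b * 5, 2))        # predicted release at the 5th hour: 44.15
```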

Example: Draw the regression line x=a+by for the following data

Time (hr) Amount of drug release (mg)


2 18
4 34
6 52
8 75
10 86

Solution: For a simple linear regression of the form x = a + by, where x
is the dependent variable, y is the independent variable, b is the slope of the line, and
a is the x-intercept, the formulas to calculate the slope (b) and intercept (a) of the line
are derived from the following equations:

1. Slope (b) formula: b = [n(∑xy) − (∑x)(∑y)] / [n(∑y²) − (∑y)²]
2. Intercept (a) formula: a = [∑x − b(∑y)] / n

Time (hr) Amount of drug release (mg) xy y2


x y
2 18 36 324
4 34 136 1156
6 52 312 2704
8 75 600 5625
10 86 860 7396
Total ∑x=30 ∑y=265 ∑xy=1944 ∑y2=17205
1. Slope (b) formula: b = [n(∑xy) − (∑x)(∑y)] / [n(∑y²) − (∑y)²]

= [5(1944) − (30)(265)] / [5(17205) − (265)²]


= (9720 − 7950) / (86025 − 70225)
= 1770 / 15800
= 0.112

2. Intercept (a) formula: a = [∑x − b(∑y)] / n

= [30 − 0.112(265)] / 5
≈ 0.313 / 5
≈ 0.063

A line of the form x= a+by


x=0.063+0.112y

Alternatively, the following formulae can be used.

b = ∑[(x − x̄)(y − ȳ)] / ∑(y − ȳ)²

a = x̄ − b ȳ
Time (hr) Amount of drug (x − x̄) (y − ȳ) (x − x̄)(y − ȳ) (y − ȳ)²
(x) release (mg) (y)
2 18 -4 -35 140 1225
4 34 -2 -19 38 361
6 52 0 -1 0 1
8 75 2 22 44 484
10 86 4 33 132 1089
∑x = 30 ∑y = 265
x̄ = ∑x/n = 6 ȳ = ∑y/n = 53 Total: 354 Total: 3160


b = 354 / 3160

= 0.112

a = 6 − 0.112(53)

= 6 − 5.937

= 0.063
A line of the form x=a+by

x=0.063+0.112y

This equation can be used to predict x for a given y. In this example, we can
predict the time required to obtain a required amount of drug release.

For instance, the time required to get 60 mg of drug release can be predicted by
substituting 60 in place of y

The time required to get 60 mg of drug release = 0.063 + 0.112(60)
= 0.063 + 6.72 = 6.783 hours.
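The regression of x on y can be sketched the same way; note that, compared with the line of y on x, only the denominator changes from ∑x² terms to ∑y² terms (Python is again just an illustration):

```python
def least_squares_x_on_y(x, y):
    """Slope b and intercept a of x = a + b*y (regression of x on y)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    b = (n * sxy - sx * sy) / (n * syy - sy * sy)
    a = (sx - b * sy) / n
    return a, b

time_hr = [2, 4, 6, 8, 10]
release_mg = [18, 34, 52, 75, 86]
a, b = least_squares_x_on_y(time_hr, release_mg)
print(round(a, 3), round(b, 3))   # 0.063 0.112
print(round(a + b * 60, 2))       # time for 60 mg of release: 6.78 (hours)
```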

5.3: Multiple regression

Multiple regression is a statistical technique used to model the relationship between a


dependent (response) variable and two or more independent (predictor) variables. The
primary objective is to explore how variations in the independent variables affect the
dependent variable.

Assumptions of Multiple Regression:

1. Linearity: The relationship between the dependent variable and the


independent variables must be linear. There exists a linear relationship
between each predictor variable and the response variable.
2. Independence: The observations are independent.


3. Homoscedasticity: The variance of residuals is constant across all levels of the


independent variables. The residuals have constant variance at every point in
the linear model.
4. Multivariate Normality: The residuals of the model are normally distributed.
5. No Multicollinearity: None of the predictor variables are highly correlated
with each other.

The Equation of Multiple Regression:

The general formula for multiple regression is:

Y = β0 + β1X1 + β2X2 + ⋯ + βkXk + ε

Where:

Y = Dependent variable (what you're predicting).

β₀ = Intercept (the expected value of Y when all X's are zero).

β₁, β₂ ... βₖ = Coefficients (represent the change in Y for a one-unit change in the
corresponding X variable).

X₁, X₂ ... Xₖ = Independent variables (predictors).

ε = Error term (captures the variation in Y not explained by the predictors).

Example: Calculate the multiple regression equation for the following data.

Concentration of binder Concentration of Disintegration time (min)


(%w/w) disintegrant (%w/w)
1 2 5
2 5 7
3 4 10
4 3 12
5 6 13


The formula to calculate b1 is:

b1 = [(∑x2²)(∑x1y) − (∑x1x2)(∑x2y)] / [(∑x1²)(∑x2²) − (∑x1x2)²]

The formula to calculate b2 is:

b2 = [(∑x1²)(∑x2y) − (∑x1x2)(∑x1y)] / [(∑x1²)(∑x2²) − (∑x1x2)²]

The formula to calculate b0 is: b0 = ȳ − b1x̄1 − b2x̄2

where the deviation sums are obtained from the raw totals as:

∑x1² = ∑X1² − (∑X1)²/n

∑x2² = ∑X2² − (∑X2)²/n

∑x1y = ∑X1y − (∑X1)(∑y)/n

∑x2y = ∑X2y − (∑X2)(∑y)/n

∑x1x2 = ∑X1X2 − (∑X1)(∑X2)/n

Concentration of Concentration of Disintegration X1² X2² X1y X2y X1X2
binder (%w/w) X1 disintegrant (%w/w) X2 time (min) y
1 2 5 1 4 5 10 2
2 5 7 4 25 14 35 10
3 4 10 9 16 30 40 12
4 3 12 16 9 48 36 12
5 6 13 25 36 65 78 30
Total ∑X1=15 ∑X2=20 ∑y=47 ∑X1²=55 ∑X2²=90 ∑X1y=162 ∑X2y=199 ∑X1X2=66
x̄1 = ∑X1/n = 15/5 = 3; x̄2 = ∑X2/n = 20/5 = 4; ȳ = ∑y/n = 47/5 = 9.4


∑x1² = 55 − (15)²/5 = 55 − 45 = 10

∑x2² = 90 − (20)²/5 = 90 − 80 = 10

∑x1y = 162 − (15 × 47)/5 = 162 − 141 = 21

∑x2y = 199 − (20 × 47)/5 = 199 − 188 = 11

∑x1x2 = 66 − (15 × 20)/5 = 66 − 60 = 6

b1 = [(10)(21) − (6)(11)] / [(10)(10) − (6)²] = [210 − 66] / 64 = 144/64 = 2.25

b2 = [(10)(11) − (6)(21)] / [(10)(10) − (6)²] = [110 − 126] / 64 = −16/64 = −0.25

The formula to calculate b0 is: b0 = ȳ − b1x̄1 − b2x̄2

= 9.4 − (2.25)(3) − (−0.25)(4)

= 9.4 − 6.75 + 1

= 3.65

The estimated linear regression equation is: ŷ = 3.65 + 2.25x1 − 0.25x2

Interpreting the multiple linear regression equation:

Predict the disintegration time if the binder concentration is 3% and the
disintegrant concentration is 5%.

ŷ = 3.65 + 2.25(3) − 0.25(5)
= 3.65 + 6.75 − 1.25
= 10.4 − 1.25
= 9.15
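A minimal Python sketch of the two-predictor formulas above (illustrative only; the deviation sums are computed directly from the means rather than from raw totals, which gives the same values):

```python
def multiple_regression(x1, x2, y):
    """Coefficients b0, b1, b2 of y = b0 + b1*x1 + b2*x2,
    via the deviation-sum formulas used in the text."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((v - m1) ** 2 for v in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s1y = sum((v - m1) * (w - my) for v, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    s12 = sum((v - m1) * (w - m2) for v, w in zip(x1, x2))
    den = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / den
    b2 = (s11 * s2y - s12 * s1y) / den
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

binder = [1, 2, 3, 4, 5]          # % w/w
disintegrant = [2, 5, 4, 3, 6]    # % w/w
dt_min = [5, 7, 10, 12, 13]       # disintegration time (min)
b0, b1, b2 = multiple_regression(binder, disintegrant, dt_min)
print(round(b0, 2), round(b1, 2), round(b2, 2))  # 3.65 2.25 -0.25
```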


5.4. Standard error of regression

The standard error of regression (also known as the standard error of the estimate) is a
measure of the accuracy of predictions made by a regression model. It quantifies the
typical distance between the observed values and the values predicted by the
regression model. A smaller standard error indicates that the data points are closer to
the regression line. A larger standard error suggests that the data points are more
spread out from the regression line. It helps to evaluate the precision of the regression
coefficients. It is used to construct confidence intervals for the regression parameters
and to perform hypothesis testing.

The standard error of the estimate is a way to measure the accuracy of the predictions
made by a regression model.

Often denoted σest, it is calculated as:

σest = √[ ∑(y − ŷ)² / n ]

Where:
 y: The observed value
 ŷ: The predicted value
 n: The total number of observations

Example: Calculate the standard error of regression for the following data

Time (hr) Amount of drug release (mg)


2 18
4 34
6 52
8 75
10 86

The regression equation for the above data is

y = -0.1 +8.85x


Predicted values of y using the above regression equation are as follows.

For x value 2 y= -0.1+8.85 (2)

=-0.1+17.7=17.6

For x value 4 y= -0.1+8.85 (4)

=-0.1+35.4=35.3

For x value 6 y= -0.1+8.85 (6)

=-0.1+53.1=53.0

For x value 8 y= -0.1+8.85 (8)

=-0.1+70.8=70.7

For x value 10 y= -0.1+8.85 (10)

=-0.1+88.5=88.4

Observed y value Predicted y value (ŷ) (y − ŷ) (y − ŷ)²


18 17.6 0.4 0.16
34 35.3 -1.3 1.69
52 53.0 -1 1
75 70.7 4.3 18.49
86 88.4 -2.4 5.76
Total 27.1

σest = √[ ∑(y − ŷ)² / n ]

= √(27.1 / 5) = √5.42 = 2.33


For example, suppose x is equal to 2. Using the estimated regression equation, we
would predict that y would be equal to:

ŷ = -0.1 + 8.85(2)

= -0.1 + 17.7 = 17.6

And we can obtain the 95% confidence interval for this estimate by using the
following formula:

95% C.I. = [ŷ − 1.96σest, ŷ + 1.96σest]

= 17.6 − 1.96(2.33), 17.6 + 1.96(2.33)

= 17.6 − 4.57, 17.6 + 4.57

= 13.03, 22.17
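Both the standard error and the interval can be reproduced in Python (an illustrative sketch that follows the book's definition with n in the denominator rather than the n − 2 used in some texts):

```python
import math

def regression_line(x, y):
    """Least squares slope b and intercept a of y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - b * sx) / n, b

def std_error_of_estimate(x, y):
    """sigma_est = sqrt(sum((y - y_hat)^2) / n), the book's definition."""
    a, b = regression_line(x, y)
    ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss / len(x))

time_hr = [2, 4, 6, 8, 10]
release_mg = [18, 34, 52, 75, 86]
s = std_error_of_estimate(time_hr, release_mg)   # about 2.33
a, b = regression_line(time_hr, release_mg)
y_hat = a + b * 2                                # about 17.6
ci = (y_hat - 1.96 * s, y_hat + 1.96 * s)        # about (13.0, 22.2)
print(round(s, 2), round(y_hat, 1), [round(v, 2) for v in ci])
```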

5.5: Pharmaceutical applications of Regression analysis

Dose-Response Modelling:
 Regression techniques are used to model the relationship between drug
dosage and the corresponding biological response. A typical application
involves fitting a logistic or nonlinear regression model to predict the effect of
drug doses on patient outcomes.
Pharmacokinetics (PK) and Pharmacodynamics (PD) Analysis:
 Regression models serve to examine the relationship between drug
concentration in the bloodstream and the pharmacological effects, helping to
optimize dosing regimens.
Clinical Trial Data Analysis:
 Regression is employed to analyze the effectiveness of a treatment, adjusting
for variables such as age, gender, or baseline health conditions, ensuring that
the observed effects are attributed to the drug rather than external confounding
factors.


Biomarker Identification and Validation:


 Regression analysis helps in identifying and validating biomarkers (biological
indicators of disease or drug response) by modeling their relationship to drug
response, disease progression, or other outcomes and useful for personalized
medicine.
Survival Analysis:
 In oncology and other therapeutic areas, regression techniques like Cox
proportional hazards models are used to analyze patient survival or
progression-free survival. This helps in determining the effectiveness of a new
drug or treatment regimen.
Adverse Event Prediction:
 Regression models are used to estimate the probability of adverse events
occurring based on patient demographics, drug dosage, existing health
conditions, and other factors, supporting risk assessment and identifying
patient populations that may need closer monitoring.
Regulatory Submission and Decision-Making:
 Pharmaceutical companies utilize regression analysis to develop models that
provide evidence supporting the safety and efficacy of a drug. These models
are integral to regulatory submissions and in making decisions about drug
approval.



Probability

6. Probability

Probability is a branch of mathematics that deals with the likelihood or chance of


different outcomes occurring. It is a measure of the certainty or uncertainty of an
event happening and is expressed as a number between 0 and 1, where:

 0 indicates that an event will not occur.


 1 indicates that an event will certainly occur.

The general definition of probability for a random experiment or event is given by the
ratio of the number of favourable outcomes to the total number of possible outcomes,
assuming all outcomes are equally likely. Mathematically, the probability P(A) of an
event A is defined as:

P(A) = (Number of favourable outcomes for event A) / (Total number of possible outcomes)

Key concepts in probability:

1. Sample Space (S): The complete set of all possible outcomes for a random
experiment.
2. Event: A subset of the sample space; which can consist of one or more
outcomes.
3. Complementary Events: An event that signifies the non-occurrence of another
event, denoted Ac, with probability P (Ac) =1−P (A).
4. Independent Events: Two events are independent if the occurrence of one does
not affect the probability of the other.
5. Conditional Probability: The probability of an event occurring, given that
another event has already taken place.

Types of Probability:

 Classical Probability: Based on equally likely outcomes.


 Empirical Probability: Based on observed data or experiments.
 Subjective Probability: Based on personal judgment or belief.


In the field of pharmaceutical sciences, probability plays a critical role in various


areas, from drug development and clinical trials to quality control and decision-
making processes. Probability distributions are used to model uncertainty in
outcomes (e.g., the normal distribution, binomial distribution). In pharmaceutical
sciences, probability is employed in the following key areas.

 Hypothesis Testing: Assessing whether observed effects are statistically


significant.
 Bayesian Methods: Updating probabilities based on new data.
 Monte Carlo Simulations: Modelling complex systems by running repeated
simulations to estimate probabilities of various outcomes.
 Risk Assessment: Estimating the likelihood of adverse events or failures.
6.1. Normal distribution

The normal distribution (also known as the Gaussian distribution) is one of the most
important probability distributions in statistics and plays a crucial role in fields like
the pharmaceutical sciences, medicine, and many others. It is commonly used to
model continuous random variables that tend to cluster around a central mean value.

6.1.1. Key Characteristics of the Normal Distribution:

1. Symmetry: The normal distribution is symmetric around its mean, meaning


that the probability of values occurring on either side of the mean is equal.
2. Bell-shaped curve: The graph of the normal distribution is a bell-shaped curve,
with the peak at the mean.
3. Defined by two parameters:
o Mean (µ): The center of the distribution (also the location of the peak).
o Standard deviation (σ): The spread of the distribution; it determines the
width of the bell curve. A smaller σ leads to a narrower curve, while a
larger σ results in a wider curve.
4. 68-95-99.7 Rule:
o 68% of the data falls within ±1 standard deviation of the mean.
o 95% falls within ±2 standard deviations.
o 99.7% falls within ±3 standard deviations.
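The 68-95-99.7 rule can be verified with Python's standard library `statistics.NormalDist` (shown purely as an illustration):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Probability mass within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

print(round(within(1), 4), round(within(2), 4), round(within(3), 4))
# roughly 0.6827, 0.9545, 0.9973 - the 68-95-99.7 rule
```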


6.1.2.Properties of Normal Distribution

Some of the important properties of the normal distribution are listed below:

 In a normal distribution, the mean, median and mode are equal.(i.e., Mean =
Median= Mode).
 The total area under the curve should be equal to 1.
 The normally distributed curve should be symmetric at the centre.
 Exactly half of the values lie to the right of the centre and exactly half lie to
the left of the centre.
 The normal distribution should be defined by the mean and standard deviation.
 The normal distribution curve must have only one peak. (i.e., Unimodal)
 The curve approaches the x-axis, but it never touches, and it extends farther
away from the mean.

Normal Distribution Formula

The general probability density function (PDF) for the normal distribution is:

f(x | μ, σ²) = [1 / (σ√(2π))] e^(−(x − μ)² / (2σ²))
Where:

 x is the variable of interest.


 μ is the mean of the distribution.
 σ is the standard deviation.
 σ² is the variance.
 e is Euler's number, approximately equal to 2.718.

Example: Construct the normal distribution curve.

The weight distribution data of the tablets manufactured by an industry is
furnished below.


Tablet weight(mg) No. of tablets


98 10
99 20
100 40
101 20
102 10
Total 100

Example: Find the probability of getting a tablet having a weight 98 mg

The data follow a normal distribution, and the probability can be computed with
the following formula:
Probability = number of favourable outcomes / total number of outcomes
= 10/100 = 0.1
The probability of getting a tablet having a weight 99 mg
=20/100=0.2
The probability of getting a tablet having a weight 100 mg
=40/100=0.4
The probability of getting a tablet having a weight 101 mg
=20/100=0.2
The probability of getting a tablet having a weight 102 mg
=10/100=0.1
The normal probability distribution graph is constructed with tablet weight (mg)
on the x-axis and probability on the y-axis.
Tablet weight(mg) probability
98 0.1
99 0.2
100 0.4
101 0.2
102 0.1


[Normal distribution curve: Tablet weight (mg) on the X-axis versus probability on the Y-axis, peaking at 100 mg]

Example: Find the probability of getting a tablet having a weight less than or
equal to 100 mg

The probability can be computed with cumulative frequency.

Tablet weight(mg) No. of tablets(frequency) Cumulative frequency


98 10 10
99 20 30
100 40 70
101 20 90
102 10 100
Total 100
The frequency of getting a tablet having a weight less than or equal to 100 mg is
equal to 70
The probability of getting a tablet having a weight less than or equal to 100
mg=70/100=0.7
Example: Find the probability of getting a tablet having a weight greater than 100
mg.
The probability of getting a tablet having a weight greater than 100 mg is
computed by subtracting the probability of getting a tablet having a weight less
than or equal to 100 mg from 1.

=1-0.7=0.3
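These empirical probabilities can be sketched in Python (purely illustrative):

```python
# Tablet-weight frequency table from the example above
freq = {98: 10, 99: 20, 100: 40, 101: 20, 102: 10}
total = sum(freq.values())  # 100 tablets

# P(weight <= 100 mg) from the cumulative frequency,
# and P(weight > 100 mg) as its complement
p_le_100 = sum(f for w, f in freq.items() if w <= 100) / total
p_gt_100 = 1 - p_le_100
print(p_le_100)  # 0.7
```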

Example: Suppose a pharmaceutical company manufactures a drug, and the


concentration of the active ingredient in the tablets is normally distributed with a
mean (μ) of 100 mg and a standard deviation (σ) of 2 mg.

We want to calculate the probability that a randomly selected tablet will have a
concentration of the active ingredient between 95 mg and 105 mg.

Step 1: Standardize the values


To use the normal distribution, we first convert the raw values (95 mg and 105 mg)
into Z-scores (the standard normal form).

The Z-score formula is:

Z = (X − μ) / σ

Where:

 X is the raw score (in this case, 95 mg or 105 mg).


 μ is the mean (100 mg).
 σ is the standard deviation (2 mg).

For X=95 mg:

Z1 = (95 − 100)/2 = −5/2 = −2.5

For X=105 mg:

Z2 = (105 − 100)/2 = 5/2 = 2.5

Step 2: Find the cumulative distribution function values for Z-scores


Using standard normal distribution tables or software, we find the cumulative
probabilities corresponding to the Z-scores (Table 6.1).

 For Z1=−2.5, the cumulative probability is approximately 0.0062.


 For Z2=2.5, the cumulative probability is approximately 0.9938.


Step 3: Calculate the probability between 95 mg and 105 mg


The probability that the concentration is between 95 mg and 105 mg is the difference
between the two cumulative probabilities:

P (95≤X≤105) =P (Z2) −P (Z1)

= 0.9938 - 0.0062 = 0.9876

The probability that a randomly selected tablet will have a concentration of the active
ingredient between 95 mg and 105 mg is 0.9876 or 98.76%.
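The table lookup can be reproduced with Python's standard library `statistics.NormalDist` (shown as an illustration; any normal table or statistics package gives the same result):

```python
from statistics import NormalDist

# Tablet potency modelled as N(mu = 100 mg, sigma = 2 mg)
potency = NormalDist(mu=100, sigma=2)

# P(95 <= X <= 105) = Phi(2.5) - Phi(-2.5)
p = potency.cdf(105) - potency.cdf(95)
print(round(p, 4))  # 0.9876
```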

Formula for Confidence Interval (CI) using the Z-distribution:


The formula for the confidence interval when the population standard deviation is known is:

CI = x̄ ± Zα/2 × (σ / √n)

Where:

 xˉ is the sample mean.


 Zα/2 is the Z-score corresponding to the confidence level (for a 95%
confidence level, Zα/2≈1.96).
 σ is the population standard deviation.
 n is the sample size.

The trial conducted on 100 patients, indicated that the average reduction in systolic
blood pressure for these patients is 15 mmHg and the standard deviation of the
reduction in systolic blood pressure for the population is known to be 10 mmHg.
Calculate a 95% confidence interval for the mean reduction in systolic blood pressure
based on this sample.

Identify the relevant parameters:

 Sample Mean (x̄) = 15 mmHg


 Population Standard Deviation (σ) = 10 mmHg
 Sample Size (n) = 100
 Confidence Level = 95%


The standard error (SE) is given by:

SE = σ / √n = 10 / √100 = 10/10 = 1

The margin of error is given by:

ME = Zα/2 × SE = 1.96 × 1 = 1.96

CI = x̄ ± ME = 15 ± 1.96

The confidence interval is:

CI = [15 − 1.96, 15 + 1.96] = [13.04, 16.96]
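A minimal Python sketch of the interval calculation (illustrative only):

```python
import math

x_bar, sigma, n = 15.0, 10.0, 100   # sample mean, population SD, sample size
z = 1.96                            # z-value for 95% confidence

se = sigma / math.sqrt(n)           # standard error
me = z * se                         # margin of error
ci = (x_bar - me, x_bar + me)
print(se, me, ci)
```

With these inputs the interval works out to about (13.04, 16.96) mmHg, matching the hand calculation.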

Applications of normal distribution in pharmaceutical sciences

Some key applications of normal distribution are outlined below.

1. Quality Control and Manufacturing

Batch Testing and Consistency: In the pharmaceutical industry, the quality of drug
products must be consistent. Normal distribution is commonly used to assess the
variations in parameters such as drug content, tablet weight, dissolution rates, and
other critical factors during manufacturing.

Process Control: Statistical process control (SPC) is used to monitor


manufacturing processes. The normal distribution helps identify deviations
from expected behaviour (e.g., variations in process parameters) and trigger
corrective actions if necessary.

2. Clinical Trials

Clinical trial outcomes including blood pressure readings, blood glucose levels,
serum drug concentrations, and other continuous variables are assumed to follow
normal distribution. Statistical methods that rely on normality, such as t-tests or


ANOVA, are frequently employed to evaluate the efficacy of drugs and their
potential side effects.

3. Bioequivalence Testing:

In bioequivalence studies, the normal distribution plays a role in the analysis to
compare pharmacokinetic parameters (such as AUC, Cmax, Tmax) observed
for a generic drug and a reference drug. These parameters are often assumed to
follow a normal distribution, and statistical methods such as confidence intervals
are applied to determine if the generic drug behaves similarly to the branded drug
in terms of bioavailability.

4. Pharmacokinetic (PK) and pharmacodynamic (PD) studies:

In pharmacokinetic (PK) and pharmacodynamic (PD) studies, variability in drug


absorption, distribution, metabolism, and elimination (ADME) processes is often
modelled using normal or log-normal distributions. Normal distribution might be
used to describe the variability in pharmacokinetic (PK) and pharmacodynamic
parameters between individuals.

5. Stability Testing:

The shelf life and stability of pharmaceutical products are crucial for ensuring
both their effectiveness and safety. Normal distribution can be used to model
degradation rates of active ingredients over time under different storage
conditions.

6. Predicting Therapeutic Effects:

By modelling the response of a drug on a population (e.g., blood glucose levels in


diabetic patients), normal distribution can be used to predict how a drug will
perform across a diverse group of patients and identify those who may require
different dosing regimens.


7. Population Pharmacokinetics

In population pharmacokinetics, the distribution of drug concentrations across a


group of individuals can be analyzed using normal or log-normal distributions.
These models help predict how different demographic factors (age, weight, sex)
impact drug absorption and metabolism within a population, aiding in
personalized medicine approaches.

8. Compliance with Standards:

Regulatory agencies often require statistical evidence of the consistency and


quality of drug products. Normal distribution is used to assess whether
pharmaceutical products meet predefined quality standards and to ensure that they
fall within acceptable tolerances.

6.2: Binomial distribution

Binomial distribution is a common probability distribution that models the
probability of obtaining one of two outcomes over a fixed number of trials.
It summarizes the number of successes observed when each trial has the same
chance of attaining one specific outcome.

Binomial distribution is a common discrete distribution used in statistics, because


binomial distribution only counts two states, typically represented as 1 (for a
success) or 0 (for a failure), given a number of trials in the data. Thus binomial
distribution represents the probability for x successes in n trials, given a success
probability p for each trial.

The binomial distribution is a discrete probability distribution that describes the


number of successes in a fixed number of independent Bernoulli trials
(experiments with two possible outcomes: success or failure). For a random
variable X to follow a binomial distribution, the following conditions must be
met:


1. Fixed Number of Trials:

The experiment must be repeated a fixed number of times, denoted as n. The


number of trials is predetermined and does not change.

2. Binary Outcomes (Success or Failure):

Each trial must have exactly two possible outcomes, commonly referred to as
"success" and "failure". These outcomes are mutually exclusive.

3. Independence:

The trials must be independent, meaning the outcome of any trial does not affect
the outcome of any other trial.

4. Constant Probability of Success:

The probability of success, denoted by p, must remain the same for each trial. The
probability of failure is 1−p, and it also remains constant throughout all trials.

5. Random Variable Representing the Count of Successes:

The random variable X represents the number of successes in the n trials.

The binomial distribution function is calculated as:

P(x; n, p) = nCx · p^x · (1 − p)^(n − x)

Where:

 n is the number of trials


 x is the number of successful trials
 p is the probability of success in a single trial
 nCx is the number of combinations of n items taken x at a time.

The mean of the binomial distribution is np, and the variance of the binomial
distribution is np (1 − p).
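The pmf, mean, and variance formulas translate directly into code (a minimal sketch; `math.comb` requires Python 3.8+, and the values n = 20, p = 0.3 are illustrative assumptions, not from the text):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.3               # illustrative values (assumed)
mean = n * p                 # np = 6
variance = n * p * (1 - p)   # np(1 - p) = 4.2
# the pmf over x = 0..n sums to 1
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(mean, variance, round(total, 10))
```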


Characteristics of binomial distribution

 The number of observations n is fixed.

 Each observation is independent.

 Each observation represents one of two outcomes ("success" or "failure").

 The probability of "success" p is the same for each outcome.

Construction of binomial distribution graph

A pharmaceutical company has developed a new drug and clinical trials show that
80% of patients respond positively to the drug (i.e., the probability of success is 0.8).
In a sample of 10 patients, we want to determine the probability of having a certain
number of positive responses.

In this case:

 n = 10 (number of trials or patients)


 p = 0.8 (probability of success, i.e., a positive response to the drug)
 k = 0, 1, 2, ..., 10 (number of successes)

We can plot the binomial distribution of the number of successes (positive responses)
for this scenario.

The probability of having exactly k successes in n trials is given by the binomial


probability formula:

P(X = k) = nCk · p^k · (1 − p)^(n − k)

Calculate the probability for each k (from 0 to 10).

For k=10

P(X=10) = 10C10 · 0.8^10 · (1 − 0.8)^(10−10)


10C10 = 10! / (10!(10 − 10)!) = 1 is the binomial coefficient

P(X=10) = 1 × (0.8)^10 × (0.2)^0 = 1 × 0.10737 × 1 = 0.1074


For k=9

P(X=9) = 10C9 · 0.8^9 · (1 − 0.8)^(10−9)

10C9 = 10! / (9!(10 − 9)!) = 10

P(X=9) = 10 × (0.8)^9 × (0.2)^1 = 10 × 0.13422 × 0.2 = 0.2684

For k=8

P(X=8) = 10C8 · 0.8^8 · (1 − 0.8)^(10−8)

10C8 = 10! / (8!(10 − 8)!) = 45

P(X=8) = 45 × (0.8)^8 × (0.2)^2 = 45 × (0.16777 × 0.04) = 0.30199

For k=7

P(X=7) = 10C7 · 0.8^7 · (1 − 0.8)^(10−7)

10C7 = 10! / (7!(10 − 7)!) = 120

P(X=7) = 120 × (0.8)^7 × (0.2)^3 = 120 × (0.2097 × 0.008) = 0.2013

For k=6

P(X=6) = 10C6 · 0.8^6 · (1 − 0.8)^(10−6)


10C6 = 10! / (6!(10 − 6)!) = 210

P(X=6) = 210 × (0.8)^6 × (0.2)^4 = 210 × (0.2621 × 0.0016) = 0.088

For k=5

P(X=5) = 10C5 · 0.8^5 · (1 − 0.8)^(10−5)

10C5 = 10! / (5!(10 − 5)!) = 252

P(X=5) = 252 × (0.8)^5 × (0.2)^5 = 252 × (0.3276 × 0.00032) = 0.026

For k=4

P(X=4) = 10C4 · 0.8^4 · (1 − 0.8)^(10−4)

10C4 = 10! / (4!(10 − 4)!) = 210

P(X=4) = 210 × (0.8)^4 × (0.2)^6 = 210 × (0.4096 × 0.000064) = 0.0055

For k=3

P(X=3) = 10C3 · 0.8^3 · (1 − 0.8)^(10−3)

10C3 = 10! / (3!(10 − 3)!) = 120

P(X=3) = 120 × (0.8)^3 × (0.2)^7 = 120 × (0.512 × 0.0000128) = 0.000786

For k=2

P(X=2) = 10C2 · 0.8^2 · (1 − 0.8)^(10−2)

10C2 = 10! / (2!(10 − 2)!) = 45


P(X=2) = 45 × (0.8)^2 × (0.2)^8 = 45 × (0.64 × 0.00000256) = 0.0000737

For k=1

P(X=1) = 10C1 · 0.8^1 · (1 − 0.8)^(10−1)

10C1 = 10! / (1!(10 − 1)!) = 10

P(X=1) = 10 × (0.8)^1 × (0.2)^9 = 10 × (0.8 × 0.000000512) = 0.0000041

For k=0

P(X=0) = 10C0 · 0.8^0 · (1 − 0.8)^(10−0)

10C0 = 10! / (0!(10 − 0)!) = 1

P(X=0) = 1 × (0.8)^0 × (0.2)^10 = 1 × 1 × 0.0000001024 = 0.0000001024

The observed probabilities are presented in the following table.

Number of successes probability


0 0.0000001024
1 0.0000041
2 0.0000737

3 0.000786
4 0.0055

5 0.026
6 0.088

7 0.2013
8 0.30199
9 0.2684
10 0.1074


The binomial distribution graph is constructed using the above data.

[Bar chart: probability P(X = k) versus number of successes k (0 to 10) for n = 10, p = 0.8]

X-axis: Number of successes (k, ranging from 0 to 10).


Y-axis: Probability of that number of successes.
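The probability table above can be regenerated programmatically (a minimal sketch with n = 10 and p = 0.8 as in the scenario):

```python
from math import comb

n, p = 10, 0.8
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
for k, prob in enumerate(probs):
    print(k, f"{prob:.7f}")
# The distribution peaks at k = 8 (= n*p), and the probabilities sum to 1.
```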

Example: Out of 100 patients, the drug is known to have a 70% success rate.
Calculate the mean and standard deviation of the number of patients who respond
positively to the drug out of a sample of 100 patients.

Solution:

o n = 100 (the number of trials=The number of patients in the clinical


trial)
o p = 0.7 (the probability of success, i.e., the probability that a patient
will respond to the drug)
o q = 1 - p = 0.3 (the probability of failure, i.e., the probability that a
patient will not respond to the drug)

Mean of the binomial distribution:

The mean of a binomial distribution is given by the formula:

μ=n⋅p

Where:

o n is the number of trials (patients).



o p is the probability of success.

Substituting the values:

μ=100⋅0.7=70

The mean number of patients expected to respond positively to the drug is 70.

Standard deviation of the binomial distribution:

The standard deviation of a binomial distribution is given by the formula:

σ= n⋅p⋅q

Substituting the values:

1. σ=100⋅0.7⋅0.3= 21= 4.58

The standard deviation of the number of patients who will respond positively
to the drug is approximately 4.58.

Interpretation:

 Mean (μ) = 70: On average, 70 out of 100 patients are expected to respond
positively to the drug in this clinical trial.
 Standard Deviation (σ) ≈ 4.58: There is some variability in the number of
patients responding, and the typical deviation from the mean (70) is
approximately 4.58 patients.
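A two-line check of the mean and standard deviation formulas (minimal sketch; values from the example):

```python
import math

n, p = 100, 0.7
mu = n * p                          # expected number of responders
sigma = math.sqrt(n * p * (1 - p))  # sqrt(n·p·q)
print(round(mu), round(sigma, 2))   # 70 4.58
```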

Example 1: In a clinical trial, 10 patients receive a new drug with a 90% probability of
responding positively. What is the probability that exactly 5 patients respond
positively?

The binomial distribution function is calculated as:

P(x; n, p) = nCx · p^x · (1 − p)^(n − x)

Where:


 n is the number of trials=10


 x is the number of successful trials=5
 p is the probability of success in a single trial=0.9

nCx = n! / (x!(n − x)!) is the binomial coefficient

P(X=5) = 10C5 · (0.9)^5 · (1 − 0.9)^(10−5)

10C5 = 10! / (5!(10 − 5)!) = (10×9×8×7×6×5×4×3×2×1) / (5×4×3×2×1 × 5×4×3×2×1) = 252

P(X=5) = 252 × (0.9)^5 × (0.1)^5 = 252 × 0.59049 × 0.00001 ≈ 0.00149
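A quick check of this value (minimal sketch):

```python
from math import comb

# P(X = 5) for n = 10 trials, success probability 0.9
p = comb(10, 5) * 0.9**5 * 0.1**5   # 252 * 0.59049 * 0.00001
print(round(p, 5))                  # 0.00149
```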

Example 2: What is the probability of having no more than 2 defective pills out of
10 pills tested, with a defect probability of 1%?

Sum the probabilities for x = 0, 1, and 2:

P (X≤2) =P(X=0) +P(X=1) +P(X=2)


P(X=0) = 10C0 (0.01)^0 (1 − 0.01)^(10−0)

10C0 = 1

= 1 × 1 × (0.99)^10

= 0.904

P(X=1) = 10C1 (0.01)^1 (1 − 0.01)^(10−1)

= 10 × 0.01 × (0.99)^9


= 0.1 × 0.9135 = 0.091

P(X=2) = 10C2 (0.01)^2 (1 − 0.01)^(10−2)

= 45 × 0.0001 × (0.99)^8

= 0.0045 × 0.9227

= 0.0042

P(X ≤ 2) = 0.904 + 0.091 + 0.004

≈ 0.999

The probability of having no more than 2 defective pills out of 10 pills tested is
approximately 0.999.
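The cumulative probability can be checked directly; the exact value is ≈ 0.9999, so rounding each term to three decimals slightly understates it (minimal sketch):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) for a binomial(n, p) random variable
    return comb(n, x) * p**x * (1 - p)**(n - x)

# P(X <= 2) with n = 10 pills and defect probability 0.01
p_le_2 = sum(binom_pmf(x, 10, 0.01) for x in range(3))
print(round(p_le_2, 4))  # 0.9999
```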

Example 3: What is the probability of at least 2 defective pills out of 10 pills tested (defect probability 1%)?
P(X ≥ 2) = 1 − P(X ≤ 1)
Compute P(X ≤ 1) by summing the probabilities for x = 0 and x = 1.

P(X ≤ 1) = P(X=0) + P(X=1)

= 0.904 + 0.091 = 0.995

The probability of at least 2 defective pills out of 10 pills tested = 1 − P(X ≤ 1)

= 1 − 0.995 = 0.005

Example 4: What is the probability of at least 2 defective pills out of 10 pills
tested, with a defect probability of 10%?

P (X≥2) =1−P (X≤1)

The P(X ≤ 1) value from the cumulative binomial probabilities table for n = 10,
x = 1 and p = 0.1 is 0.736 (Table 6.2).

P (X≥2) =1−0.736

=0.264


The probability of at least 2 defective pills out of 10 pills tested is 0.264.
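The table lookup can be checked by summing the pmf directly (minimal sketch; n = 10 and p = 0.1 from the example):

```python
from math import comb

# P(X <= 1) with n = 10 pills and defect probability 0.1
p_le_1 = sum(comb(10, x) * 0.1**x * 0.9**(10 - x) for x in range(2))
print(round(1 - p_le_1, 3))  # 0.264
```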

Applications in Pharmaceutical Sciences

1. Clinical Trials:
o Assessing Treatment Success: The distribution can be used to model
the number of patients who respond positively to a treatment out of a
fixed sample size.
o Adverse Events Analysis: Estimating the probability of patients
experiencing side effects.

Example: If 100 patients receive a new drug and the probability of a


successful outcome is 0.6, the binomial distribution can predict the likelihood
of specific numbers of successes.

2. Drug Manufacturing and Quality Control:


o Defective Items: Analyzing the number of defective pills or vials in a
batch.
o Process Validation: Assessing the likelihood of errors during the
manufacturing process, where success might represent adherence to
quality standards.

Example: In a batch of 100 tablets, if the probability of a defect is 0.05, the


binomial distribution helps determine the chance of having more than a
specified number of defective tablets.

3. Pharmacokinetics and Pharmacodynamics:


o Dose-Response Studies: Estimating the probability of therapeutic
response versus no response at various doses.

6.3: Poisson distribution

The Poisson distribution is frequently used to model rare events or occurrences


within a fixed interval of time or space. It is a discrete probability distribution and
particularly useful when studying events that happen independently and at a
constant rate. In pharmaceutical research, the Poisson distribution is an essential


tool for modelling rare events in clinical trials, epidemiological studies,


pharmacovigilance, and predicting the occurrence of infrequent events, such as
side effects or disease occurrences, over specified intervals or populations.

The Poisson distribution is a discrete probability distribution that models the number
of events occurring within a fixed interval of time or space, given certain conditions.
For a random variable X to follow a Poisson distribution, the following conditions
must be met:

1. Events Occur Independently:

The events must be independent of each other. The occurrence of one event does
not affect the probability of another event occurring.

2. Fixed Interval:

The events are counted within a fixed interval of time, space, or other dimensions.
This interval is typically denoted as a time period, area, or volume in which the
events are expected to happen.

3. Constant Mean Rate (λ):

The average number of events that occur in the fixed interval is constant. This
average rate is denoted by λ, which represents the mean rate of occurrence of
events. It is assumed to be the same throughout the interval.

4. Events Occur One at a Time:

The events cannot occur simultaneously. That is, no more than one event can
happen at an exact point in time or location.

5. Events Are Rare Relative to the Interval:

The events are considered rare within the given interval.

The mean (μ) of a Poisson distribution is equal to its rate parameter λ:


 μ=λ

The standard deviation (σ) of a Poisson distribution is the square root of its mean:

 𝜎 = √λ

Both the mean and the standard deviation of a Poisson distribution are determined
by the rate λ; the standard deviation equals the square root of the mean. Both the
mean and the variability (SD) of the Poisson distribution increase as the average
rate of events increases.

Example: Calculate the mean and standard deviation for the number of severe
adverse events in a sample of 100 patients. Based on preclinical studies or
historical data, it is known that, on average, 2 patients out of every 100 experience
a severe adverse event within a certain time frame.

Solution: From the problem statement, the average number of severe adverse
events per 100 patients is 2. Therefore, for a sample of 100 patients, we set λ=2.

Calculate the Mean:

The mean of the Poisson distribution is equal to λ. So, the mean number of severe
adverse events in the sample of 100 patients is:

μ=λ=2

This means that, on average, 2 patients in the trial will experience a severe adverse
event.

Calculate the Standard Deviation:

The standard deviation of a Poisson distribution is given by the square root of the
rate parameter λ:

 𝜎 = √λ

= √2= 1.414


Example: A pharmaceutical company is conducting a clinical trial for a new drug.
Based on previous studies and reports, the company knows that, on average, 3 patients
per 1000 experience a particular severe adverse event while using the drug.

(a): Estimate the number of expected severe adverse events in a sample of 5000
patients.

What is the probability of observing a specific number of severe adverse events in


a sample of 5000 patients during the trial?

The rate parameter λ is typically the expected number of events.

From previous data, the company knows that 3 patients out of 1000 experience
severe adverse events. So, for a sample of 5000 patients, the expected number of
severe adverse events can be calculated as:

λ = (3 / 1000) × 5000 = 15

Thus, for 5000 patients, the rate parameter λ=15. This means, on average, we expect
15 severe adverse events in this sample of 5000 patients.

(b) Calculate the Probability of observing exactly 10 severe adverse events

The Poisson probability mass function (PMF) is used to calculate the probability of
observing exactly 10 severe adverse events. The Poisson PMF is given by the
formula:

P(k) = λ^k · e^−λ / k!

Where:

 P(k) is the probability of observing exactly k events (in this case, k=10),
 λ is the expected number of events (in this case, λ=15),
 e is Euler's number (approximately 2.718),
 k! is the factorial of k.


Substitute the values into the formula:

P(10) = 15^10 · e^−15 / 10!

15^10 = 576,650,390,625

e^−15 = 3.059 × 10^−7

10! = 10×9×8×7×6×5×4×3×2×1 = 3,628,800

Now, substitute these into the formula:

P(10) = (576,650,390,625 × 3.059 × 10^−7) / 3,628,800

= 0.0486

The probability of observing exactly 10 severe adverse events in the sample of 5000
patients is approximately 0.0486, or about 4.9%.
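As a check, the pmf can be evaluated with a small helper built from the standard library (minimal sketch; the exact value of P(10) at λ = 15 is ≈ 0.0486):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(k) = lam^k * e^(-lam) / k!
    return lam**k * exp(-lam) / factorial(k)

print(round(poisson_pmf(10, 15), 4))  # 0.0486
```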

Calculate the Probability of observing exactly 9 severe adverse events

P(9) = 15^9 · e^−15 / 9!

15^9 = 38,443,359,375

e^−15 = 3.059 × 10^−7

9! = 9×8×7×6×5×4×3×2×1 = 362,880

Now, substitute these into the formula:

P(9) = (38,443,359,375 × 3.059 × 10^−7) / 362,880

= 0.032


Calculate the Probability of observing exactly 8 severe adverse events

P(8) = 15^8 · e^−15 / 8!

15^8 = 2,562,890,625

e^−15 = 3.059 × 10^−7

8! = 8×7×6×5×4×3×2×1 = 40,320

Now, substitute these into the formula:

P(8) = (2,562,890,625 × 3.059 × 10^−7) / 40,320

= 0.019

Calculate the Probability of observing exactly 7 severe adverse events

P(7) = 15^7 · e^−15 / 7!

15^7 = 170,859,375

e^−15 = 3.059 × 10^−7

7! = 7×6×5×4×3×2×1 = 5,040

Now, substitute these into the formula:

P(7) = (170,859,375 × 3.059 × 10^−7) / 5,040

= 0.01

Calculate the probability of observing exactly 6 severe adverse events


P(6) = 15^6 · e^−15 / 6!

15^6 = 11,390,625

e^−15 = 3.059 × 10^−7

6! = 6×5×4×3×2×1 = 720

Now, substitute these into the formula:

P(6) = (11,390,625 × 3.059 × 10^−7) / 720

= 0.00484

Calculate the Probability of observing exactly 5 severe adverse events

P(5) = 15^5 · e^−15 / 5!

15^5 = 759,375

e^−15 = 3.059 × 10^−7

5! = 5×4×3×2×1 = 120

Now, substitute these into the formula:

P(5) = (759,375 × 3.059 × 10^−7) / 120

= 0.001935

Calculate the Probability of observing exactly 4 severe adverse events

P(4) = 15^4 · e^−15 / 4!

15^4 = 50,625


e^−15 = 3.059 × 10^−7

4! = 4×3×2×1 = 24

Now, substitute these into the formula:

P(4) = (50,625 × 3.059 × 10^−7) / 24

= 0.000645

Calculate the Probability of observing exactly 3 severe adverse events

P(3) = 15^3 · e^−15 / 3!

15^3 = 3,375

e^−15 = 3.059 × 10^−7

3! = 3×2×1 = 6

Now, substitute these into the formula:

P(3) = (3,375 × 3.059 × 10^−7) / 6

= 0.000172

Calculate the Probability of observing exactly 2 severe adverse events

P(2) = 15^2 · e^−15 / 2!

15^2 = 225

e^−15 = 3.059 × 10^−7

2! = 2×1 = 2


Now, substitute these into the formula:

P(2) = (225 × 3.059 × 10^−7) / 2

= 0.0000344

Calculate the Probability of observing exactly 1 severe adverse event

P(1) = 15^1 · e^−15 / 1!

15^1 = 15

e^−15 = 3.059 × 10^−7

1! = 1

Now, substitute these into the formula:

P(1) = (15 × 3.059 × 10^−7) / 1

= 0.0000046

Calculate the Probability of observing exactly 0 severe adverse events

P(0) = 15^0 · e^−15 / 0!

15^0 = 1

e^−15 = 3.059 × 10^−7

0! = 1

Now, substitute these into the formula:

P(0) = (1 × 3.059 × 10^−7) / 1


=0.0000003

(c) Calculate the probability of observing more than 10 severe adverse events

To calculate the probability of observing more than 10 adverse events, we need to find
the cumulative probability for all values greater than 10. This is equivalent to:

P(X>10) =1−P (X≤10)

Where:

P(X ≤ 10) is the cumulative probability of observing 10 or fewer severe adverse
events; from the Poisson cumulative distribution table for λ = 15, x = 10, it is
0.1185 (Table 6.3).

P(X > 10) = 1 − 0.1185 = 0.8815

Thus, the probability of observing more than 10 severe adverse events is


approximately 88.1%.
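Summing the pmf confirms the table-based result (minimal sketch; the exact tail probability is ≈ 0.8815):

```python
from math import exp, factorial

lam = 15
# cumulative probability P(X <= 10)
p_le_10 = sum(lam**k * exp(-lam) / factorial(k) for k in range(11))
print(round(1 - p_le_10, 4))  # 0.8815
```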

Construction of Poisson distribution probability graph

If the average number of adverse reactions arriving at a hospital per day is 3 (i.e.,
λ = 3), construct the Poisson distribution curve for the probability of 0, 1, 2,
3, etc., adverse reactions arriving at the hospital on the next day.

For x=0

P(0) = 3^0 · e^−3 / 0! = 0.05

For x=1

P(1) = 3^1 · e^−3 / 1! = 0.15

For x=2


P(2) = 3^2 · e^−3 / 2! = 0.225

For x=3

P(3) = 3^3 · e^−3 / 3! = 0.225

For x=4

P(4) = 3^4 · e^−3 / 4! = 0.169

For x=5

P(5) = 3^5 · e^−3 / 5! = 0.101

The observed probabilities are presented in the following table.

Number of adverse reactions probability


0 0.05
1 0.15
2 0.225
3 0.225
4 0.169
5 0.101
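The table values can be regenerated directly (minimal sketch; note the exact values round to 0.224 for x = 2, 3 and 0.168 for x = 4, which the hand-rounded table shows as 0.225 and 0.169):

```python
from math import exp, factorial

lam = 3  # average adverse reactions per day
for x in range(6):
    print(x, round(lam**x * exp(-lam) / factorial(x), 3))
```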


The Poisson distribution graph is constructed from the above data.

[Bar chart: probability P(X = x) versus number of adverse reactions x (0 to 5) for λ = 3]

X-axis: Number of adverse reactions (x, ranging from 0 to 5).

Y-axis: Probability of that number of adverse reactions.

Applications of Poisson distribution in pharmaceutical sciences

The Poisson distribution is applied in following fields.

1. Pharmacovigilance: In clinical trials and post-marketing surveillance, adverse


events or side effects of drugs might occur infrequently. The Poisson
distribution can model the number of times a specific adverse event occurs
within a given time period or in a group of patients.

2. Counting events in clinical trials: In a clinical trial, the Poisson distribution


can be used to model the number of occurrences of specific events, such as
disease relapse or the number of patients experiencing a particular outcome
over a fixed time period.

3. Epidemiological studies: The Poisson distribution is often used in


epidemiological studies to estimate the incidence rate of diseases or conditions
in a population. If the incidence of a particular disease is rare, the distribution
can estimate how often it occurs over a specific time period, helping to guide
public health decisions or assess drug efficacy.

4. Risk Assessment: In the context of pharmaceutical risk assessment, the


Poisson distribution helps in determining the risk of certain events occurring
based on historical data.

Table 6.1: Standard Normal Cumulative Probability Table

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148


-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
Table 6.2: Cumulative Binomial probabilities

P[X ≤ c] = Σ (from x = 0 to c) nCx · p^x · (1 − p)^(n − x)

c 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95
n=1 0 0.950 0.900 0.800 0.700 0.600 0.500 0.400 0.300 0.200 0.100 0.050
1 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n=2 0 0.903 0.810 0.640 0.490 0.360 0.250 0.160 0.090 0.040 0.010 0.003
1 0.998 0.990 0.960 0.910 0.840 0.750 0.640 0.510 0.360 0.190 0.098
2 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n=3 0 0.857 0.729 0.512 0.343 0.216 0.125 0.064 0.027 0.008 0.001 0.000
1 0.993 0.972 0.896 0.784 0.648 0.500 0.352 0.216 0.104 0.028 0.007
2 1.000 0.999 0.992 0.973 0.936 0.875 0.784 0.657 0.488 0.271 0.143
3 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n=4 0 0.815 0.656 0.410 0.240 0.130 0.063 0.026 0.008 0.002 0.000 0.000
1 0.986 0.948 0.819 0.652 0.475 0.313 0.179 0.084 0.027 0.004 0.000
2 1.000 0.996 0.973 0.916 0.821 0.688 0.525 0.348 0.181 0.052 0.014
3 1.000 1.000 0.998 0.992 0.974 0.938 0.870 0.760 0.590 0.344 0.185
4 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000


n=5 0 0.774 0.590 0.328 0.168 0.078 0.031 0.010 0.002 0.000 0.000 0.000
1 0.977 0.919 0.737 0.528 0.337 0.188 0.087 0.031 0.007 0.000 0.000
2 0.999 0.991 0.942 0.837 0.683 0.500 0.317 0.163 0.058 0.009 0.001
3 1.000 1.000 0.993 0.969 0.913 0.813 0.663 0.472 0.263 0.081 0.023
4 1.000 1.000 1.000 0.998 0.990 0.969 0.922 0.832 0.672 0.410 0.226
5 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n=6 0 0.735 0.531 0.262 0.118 0.047 0.016 0.004 0.001 0.000 0.000 0.000
1 0.967 0.886 0.655 0.420 0.233 0.109 0.041 0.011 0.002 0.000 0.000
2 0.998 0.984 0.901 0.744 0.544 0.344 0.179 0.070 0.017 0.001 0.000
3 1.000 0.999 0.983 0.930 0.821 0.656 0.456 0.256 0.099 0.016 0.002
4 1.000 1.000 0.998 0.989 0.959 0.891 0.767 0.580 0.345 0.114 0.033
5 1.000 1.000 1.000 0.999 0.996 0.984 0.953 0.882 0.738 0.469 0.265
6 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n=7 0 0.698 0.478 0.210 0.082 0.028 0.008 0.002 0.000 0.000 0.000 0.000
1 0.956 0.850 0.577 0.329 0.159 0.063 0.019 0.004 0.000 0.000 0.000
2 0.996 0.974 0.852 0.647 0.420 0.227 0.096 0.029 0.005 0.000 0.000
3 1.000 0.997 0.967 0.874 0.710 0.500 0.290 0.126 0.033 0.003 0.000
4 1.000 1.000 0.995 0.971 0.904 0.773 0.580 0.353 0.148 0.026 0.004
5 1.000 1.000 1.000 0.996 0.981 0.938 0.841 0.671 0.423 0.150 0.044
6 1.000 1.000 1.000 1.000 0.998 0.992 0.972 0.918 0.790 0.522 0.302
7 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n=8 0 0.663 0.430 0.168 0.058 0.017 0.004 0.001 0.000 0.000 0.000 0.000
1 0.943 0.813 0.503 0.255 0.106 0.035 0.009 0.001 0.000 0.000 0.000
2 0.994 0.962 0.797 0.552 0.315 0.145 0.050 0.011 0.001 0.000 0.000
3 1.000 0.995 0.944 0.806 0.594 0.363 0.174 0.058 0.010 0.000 0.000
4 1.000 1.000 0.990 0.942 0.826 0.637 0.406 0.194 0.056 0.005 0.000
5 1.000 1.000 0.999 0.989 0.950 0.855 0.685 0.448 0.203 0.038 0.006
6 1.000 1.000 1.000 0.999 0.991 0.965 0.894 0.745 0.497 0.187 0.057
7 1.000 1.000 1.000 1.000 0.999 0.996 0.983 0.942 0.832 0.570 0.337
8 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n=9 0 0.630 0.387 0.134 0.040 0.010 0.002 0.000 0.000 0.000 0.000 0.000
1 0.929 0.775 0.436 0.196 0.071 0.020 0.004 0.000 0.000 0.000 0.000


2 0.992 0.947 0.738 0.463 0.232 0.090 0.025 0.004 0.000 0.000 0.000
3 0.999 0.992 0.914 0.730 0.483 0.254 0.099 0.025 0.003 0.000 0.000
4 1.000 0.999 0.980 0.901 0.733 0.500 0.267 0.099 0.020 0.001 0.000
5 1.000 1.000 0.997 0.975 0.901 0.746 0.517 0.270 0.086 0.008 0.001
6 1.000 1.000 1.000 0.996 0.975 0.910 0.768 0.537 0.262 0.053 0.008
7 1.000 1.000 1.000 1.000 0.996 0.980 0.929 0.804 0.564 0.225 0.071
8 1.000 1.000 1.000 1.000 1.000 0.998 0.990 0.960 0.866 0.613 0.370
9 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 10 0 0.599 0.349 0.107 0.028 0.006 0.001 0.000 0.000 0.000 0.000 0.000
1 0.914 0.736 0.376 0.149 0.046 0.011 0.002 0.000 0.000 0.000 0.000
2 0.988 0.930 0.678 0.383 0.167 0.055 0.012 0.002 0.000 0.000 0.000
3 0.999 0.987 0.879 0.650 0.382 0.172 0.055 0.011 0.001 0.000 0.000
4 1.000 0.998 0.967 0.850 0.633 0.377 0.166 0.047 0.006 0.000 0.000
5 1.000 1.000 0.994 0.953 0.834 0.623 0.367 0.150 0.033 0.002 0.000
6 1.000 1.000 0.999 0.989 0.945 0.828 0.618 0.350 0.121 0.013 0.001
7 1.000 1.000 1.000 0.998 0.988 0.945 0.833 0.617 0.322 0.070 0.012
8 1.000 1.000 1.000 1.000 0.998 0.989 0.954 0.851 0.624 0.264 0.086
9 1.000 1.000 1.000 1.000 1.000 0.999 0.994 0.972 0.893 0.651 0.401
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 11 0 0.569 0.314 0.086 0.020 0.004 0.000 0.000 0.000 0.000 0.000 0.000
1 0.898 0.697 0.322 0.113 0.030 0.006 0.001 0.000 0.000 0.000 0.000
2 0.985 0.910 0.617 0.313 0.119 0.033 0.006 0.001 0.000 0.000 0.000
3 0.998 0.981 0.839 0.570 0.296 0.113 0.029 0.004 0.000 0.000 0.000
4 1.000 0.997 0.950 0.790 0.533 0.274 0.099 0.022 0.002 0.000 0.000
5 1.000 1.000 0.988 0.922 0.753 0.500 0.247 0.078 0.012 0.000 0.000
6 1.000 1.000 0.998 0.978 0.901 0.726 0.467 0.210 0.050 0.003 0.000
7 1.000 1.000 1.000 0.996 0.971 0.887 0.704 0.430 0.161 0.019 0.002
8 1.000 1.000 1.000 0.999 0.994 0.967 0.881 0.687 0.383 0.090 0.015
9 1.000 1.000 1.000 1.000 0.999 0.994 0.970 0.887 0.678 0.303 0.102
10 1.000 1.000 1.000 1.000 1.000 1.000 0.996 0.980 0.914 0.686 0.431
11 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 12 0 0.540 0.282 0.069 0.014 0.002 0.000 0.000 0.000 0.000 0.000 0.000


1 0.882 0.659 0.275 0.085 0.020 0.003 0.000 0.000 0.000 0.000 0.000
2 0.980 0.889 0.558 0.253 0.083 0.019 0.003 0.000 0.000 0.000 0.000
3 0.998 0.974 0.795 0.493 0.225 0.073 0.015 0.002 0.000 0.000 0.000
4 1.000 0.996 0.927 0.724 0.438 0.194 0.057 0.009 0.001 0.000 0.000
5 1.000 0.999 0.981 0.882 0.665 0.387 0.158 0.039 0.004 0.000 0.000
6 1.000 1.000 0.996 0.961 0.842 0.613 0.335 0.118 0.019 0.001 0.000
7 1.000 1.000 0.999 0.991 0.943 0.806 0.562 0.276 0.073 0.004 0.000
8 1.000 1.000 1.000 0.998 0.985 0.927 0.775 0.507 0.205 0.026 0.002
9 1.000 1.000 1.000 1.000 0.997 0.981 0.917 0.747 0.442 0.111 0.020
10 1.000 1.000 1.000 1.000 1.000 0.997 0.980 0.915 0.725 0.341 0.118
11 1.000 1.000 1.000 1.000 1.000 1.000 0.998 0.986 0.931 0.718 0.460
12 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 13 0 0.513 0.254 0.055 0.010 0.001 0.000 0.000 0.000 0.000 0.000 0.000
1 0.865 0.621 0.234 0.064 0.013 0.002 0.000 0.000 0.000 0.000 0.000
2 0.975 0.866 0.502 0.202 0.058 0.011 0.001 0.000 0.000 0.000 0.000
3 0.997 0.966 0.747 0.421 0.169 0.046 0.008 0.001 0.000 0.000 0.000
4 1.000 0.994 0.901 0.654 0.353 0.133 0.032 0.004 0.000 0.000 0.000
5 1.000 0.999 0.970 0.835 0.574 0.291 0.098 0.018 0.001 0.000 0.000
6 1.000 1.000 0.993 0.938 0.771 0.500 0.229 0.062 0.007 0.000 0.000
7 1.000 1.000 0.999 0.982 0.902 0.709 0.426 0.165 0.030 0.001 0.000
8 1.000 1.000 1.000 0.996 0.968 0.867 0.647 0.346 0.099 0.006 0.000
9 1.000 1.000 1.000 0.999 0.992 0.954 0.831 0.579 0.253 0.034 0.003
10 1.000 1.000 1.000 1.000 0.999 0.989 0.942 0.798 0.498 0.134 0.025
11 1.000 1.000 1.000 1.000 1.000 0.998 0.987 0.936 0.766 0.379 0.135
12 1.000 1.000 1.000 1.000 1.000 1.000 0.999 0.990 0.945 0.746 0.487
13 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 14 0 0.488 0.229 0.044 0.007 0.001 0.000 0.000 0.000 0.000 0.000 0.000
1 0.847 0.585 0.198 0.047 0.008 0.001 0.000 0.000 0.000 0.000 0.000
2 0.970 0.842 0.448 0.161 0.040 0.006 0.001 0.000 0.000 0.000 0.000
3 0.996 0.956 0.698 0.355 0.124 0.029 0.004 0.000 0.000 0.000 0.000
4 1.000 0.991 0.870 0.584 0.279 0.090 0.018 0.002 0.000 0.000 0.000
5 1.000 0.999 0.956 0.781 0.486 0.212 0.058 0.008 0.000 0.000 0.000


6 1.000 1.000 0.988 0.907 0.692 0.395 0.150 0.031 0.002 0.000 0.000
7 1.000 1.000 0.998 0.969 0.850 0.605 0.308 0.093 0.012 0.000 0.000
8 1.000 1.000 1.000 0.992 0.942 0.788 0.514 0.219 0.044 0.001 0.000
9 1.000 1.000 1.000 0.998 0.982 0.910 0.721 0.416 0.130 0.009 0.000
10 1.000 1.000 1.000 1.000 0.996 0.971 0.876 0.645 0.302 0.044 0.004
11 1.000 1.000 1.000 1.000 0.999 0.994 0.960 0.839 0.552 0.158 0.030
12 1.000 1.000 1.000 1.000 1.000 0.999 0.992 0.953 0.802 0.415 0.153
13 1.000 1.000 1.000 1.000 1.000 1.000 0.999 0.993 0.956 0.771 0.512
14 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 15 0 0.463 0.206 0.035 0.005 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1 0.829 0.549 0.167 0.035 0.005 0.000 0.000 0.000 0.000 0.000 0.000
2 0.964 0.816 0.398 0.127 0.027 0.004 0.000 0.000 0.000 0.000 0.000
3 0.995 0.944 0.648 0.297 0.091 0.018 0.002 0.000 0.000 0.000 0.000
4 0.999 0.987 0.836 0.515 0.217 0.059 0.009 0.001 0.000 0.000 0.000
5 1.000 0.998 0.939 0.722 0.403 0.151 0.034 0.004 0.000 0.000 0.000
6 1.000 1.000 0.982 0.869 0.610 0.304 0.095 0.015 0.001 0.000 0.000
7 1.000 1.000 0.996 0.950 0.787 0.500 0.213 0.050 0.004 0.000 0.000
8 1.000 1.000 0.999 0.985 0.905 0.696 0.390 0.131 0.018 0.000 0.000
9 1.000 1.000 1.000 0.996 0.966 0.849 0.597 0.278 0.061 0.002 0.000
10 1.000 1.000 1.000 0.999 0.991 0.941 0.783 0.485 0.164 0.013 0.001
11 1.000 1.000 1.000 1.000 0.998 0.982 0.909 0.703 0.352 0.056 0.005
12 1.000 1.000 1.000 1.000 1.000 0.996 0.973 0.873 0.602 0.184 0.036
13 1.000 1.000 1.000 1.000 1.000 1.000 0.995 0.965 0.833 0.451 0.171
14 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.995 0.965 0.794 0.537
15 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 16 0 0.440 0.185 0.028 0.003 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1 0.811 0.515 0.141 0.026 0.003 0.000 0.000 0.000 0.000 0.000 0.000
2 0.957 0.789 0.352 0.099 0.018 0.002 0.000 0.000 0.000 0.000 0.000
3 0.993 0.932 0.598 0.246 0.065 0.011 0.001 0.000 0.000 0.000 0.000
4 0.999 0.983 0.798 0.450 0.167 0.038 0.005 0.000 0.000 0.000 0.000
5 1.000 0.997 0.918 0.660 0.329 0.105 0.019 0.002 0.000 0.000 0.000
6 1.000 0.999 0.973 0.825 0.527 0.227 0.058 0.007 0.000 0.000 0.000


7 1.000 1.000 0.993 0.926 0.716 0.402 0.142 0.026 0.001 0.000 0.000
8 1.000 1.000 0.999 0.974 0.858 0.598 0.284 0.074 0.007 0.000 0.000
9 1.000 1.000 1.000 0.993 0.942 0.773 0.473 0.175 0.027 0.001 0.000
10 1.000 1.000 1.000 0.998 0.981 0.895 0.671 0.340 0.082 0.003 0.000
11 1.000 1.000 1.000 1.000 0.995 0.962 0.833 0.550 0.202 0.017 0.001
12 1.000 1.000 1.000 1.000 0.999 0.989 0.935 0.754 0.402 0.068 0.007
13 1.000 1.000 1.000 1.000 1.000 0.998 0.982 0.901 0.648 0.211 0.043
14 1.000 1.000 1.000 1.000 1.000 1.000 0.997 0.974 0.859 0.485 0.189
15 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.997 0.972 0.815 0.560
16 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 17 0 0.418 0.167 0.023 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1 0.792 0.482 0.118 0.019 0.002 0.000 0.000 0.000 0.000 0.000 0.000
2 0.950 0.762 0.310 0.077 0.012 0.001 0.000 0.000 0.000 0.000 0.000
3 0.991 0.917 0.549 0.202 0.046 0.006 0.000 0.000 0.000 0.000 0.000
4 0.999 0.978 0.758 0.389 0.126 0.025 0.003 0.000 0.000 0.000 0.000
5 1.000 0.995 0.894 0.597 0.264 0.072 0.011 0.001 0.000 0.000 0.000
6 1.000 0.999 0.962 0.775 0.448 0.166 0.035 0.003 0.000 0.000 0.000
7 1.000 1.000 0.989 0.895 0.641 0.315 0.092 0.013 0.000 0.000 0.000
8 1.000 1.000 0.997 0.960 0.801 0.500 0.199 0.040 0.003 0.000 0.000
9 1.000 1.000 1.000 0.987 0.908 0.685 0.359 0.105 0.011 0.000 0.000
10 1.000 1.000 1.000 0.997 0.965 0.834 0.552 0.225 0.038 0.001 0.000
11 1.000 1.000 1.000 0.999 0.989 0.928 0.736 0.403 0.106 0.005 0.000
12 1.000 1.000 1.000 1.000 0.997 0.975 0.874 0.611 0.242 0.022 0.001
13 1.000 1.000 1.000 1.000 1.000 0.994 0.954 0.798 0.451 0.083 0.009
14 1.000 1.000 1.000 1.000 1.000 0.999 0.988 0.923 0.690 0.238 0.050
15 1.000 1.000 1.000 1.000 1.000 1.000 0.998 0.981 0.882 0.518 0.208
16 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.998 0.977 0.833 0.582
17 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 18 0 0.397 0.150 0.018 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1 0.774 0.450 0.099 0.014 0.001 0.000 0.000 0.000 0.000 0.000 0.000
2 0.942 0.734 0.271 0.060 0.008 0.001 0.000 0.000 0.000 0.000 0.000
3 0.989 0.902 0.501 0.165 0.033 0.004 0.000 0.000 0.000 0.000 0.000


4 0.998 0.972 0.716 0.333 0.094 0.015 0.001 0.000 0.000 0.000 0.000
5 1.000 0.994 0.867 0.534 0.209 0.048 0.006 0.000 0.000 0.000 0.000
6 1.000 0.999 0.949 0.722 0.374 0.119 0.020 0.001 0.000 0.000 0.000
7 1.000 1.000 0.984 0.859 0.563 0.240 0.058 0.006 0.000 0.000 0.000
8 1.000 1.000 0.996 0.940 0.737 0.407 0.135 0.021 0.001 0.000 0.000
9 1.000 1.000 0.999 0.979 0.865 0.593 0.263 0.060 0.004 0.000 0.000
10 1.000 1.000 1.000 0.994 0.942 0.760 0.437 0.141 0.016 0.000 0.000
11 1.000 1.000 1.000 0.999 0.980 0.881 0.626 0.278 0.051 0.001 0.000
12 1.000 1.000 1.000 1.000 0.994 0.952 0.791 0.466 0.133 0.006 0.000
13 1.000 1.000 1.000 1.000 0.999 0.985 0.906 0.667 0.284 0.028 0.002
14 1.000 1.000 1.000 1.000 1.000 0.996 0.967 0.835 0.499 0.098 0.011
15 1.000 1.000 1.000 1.000 1.000 0.999 0.992 0.940 0.729 0.266 0.058
16 1.000 1.000 1.000 1.000 1.000 1.000 0.999 0.986 0.901 0.550 0.226
17 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.998 0.982 0.850 0.603
18 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 19 0 0.377 0.135 0.014 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1 0.755 0.420 0.083 0.010 0.001 0.000 0.000 0.000 0.000 0.000 0.000
2 0.933 0.705 0.237 0.046 0.005 0.000 0.000 0.000 0.000 0.000 0.000
3 0.987 0.885 0.455 0.133 0.023 0.002 0.000 0.000 0.000 0.000 0.000
4 0.998 0.965 0.673 0.282 0.070 0.010 0.001 0.000 0.000 0.000 0.000
5 1.000 0.991 0.837 0.474 0.163 0.032 0.003 0.000 0.000 0.000 0.000
6 1.000 0.998 0.932 0.666 0.308 0.084 0.012 0.001 0.000 0.000 0.000
7 1.000 1.000 0.977 0.818 0.488 0.180 0.035 0.003 0.000 0.000 0.000
8 1.000 1.000 0.993 0.916 0.667 0.324 0.088 0.011 0.000 0.000 0.000
9 1.000 1.000 0.998 0.967 0.814 0.500 0.186 0.033 0.002 0.000 0.000
10 1.000 1.000 1.000 0.989 0.912 0.676 0.333 0.084 0.007 0.000 0.000
11 1.000 1.000 1.000 0.997 0.965 0.820 0.512 0.182 0.023 0.000 0.000
12 1.000 1.000 1.000 0.999 0.988 0.916 0.692 0.334 0.068 0.002 0.000
13 1.000 1.000 1.000 1.000 0.997 0.968 0.837 0.526 0.163 0.009 0.000
14 1.000 1.000 1.000 1.000 0.999 0.990 0.930 0.718 0.327 0.035 0.002
15 1.000 1.000 1.000 1.000 1.000 0.998 0.977 0.867 0.545 0.115 0.013
16 1.000 1.000 1.000 1.000 1.000 1.000 0.995 0.954 0.763 0.295 0.067


17 1.000 1.000 1.000 1.000 1.000 1.000 0.999 0.990 0.917 0.580 0.245
18 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.999 0.986 0.865 0.623
19 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 20 0 0.358 0.122 0.012 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1 0.736 0.392 0.069 0.008 0.001 0.000 0.000 0.000 0.000 0.000 0.000
2 0.925 0.677 0.206 0.035 0.004 0.000 0.000 0.000 0.000 0.000 0.000
3 0.984 0.867 0.411 0.107 0.016 0.001 0.000 0.000 0.000 0.000 0.000
4 0.997 0.957 0.630 0.238 0.051 0.006 0.000 0.000 0.000 0.000 0.000
5 1.000 0.989 0.804 0.416 0.126 0.021 0.002 0.000 0.000 0.000 0.000
6 1.000 0.998 0.913 0.608 0.250 0.058 0.006 0.000 0.000 0.000 0.000
7 1.000 1.000 0.968 0.772 0.416 0.132 0.021 0.001 0.000 0.000 0.000
8 1.000 1.000 0.990 0.887 0.596 0.252 0.057 0.005 0.000 0.000 0.000
9 1.000 1.000 0.997 0.952 0.755 0.412 0.128 0.017 0.001 0.000 0.000
10 1.000 1.000 0.999 0.983 0.872 0.588 0.245 0.048 0.003 0.000 0.000
11 1.000 1.000 1.000 0.995 0.943 0.748 0.404 0.113 0.010 0.000 0.000
12 1.000 1.000 1.000 0.999 0.979 0.868 0.584 0.228 0.032 0.000 0.000
13 1.000 1.000 1.000 1.000 0.994 0.942 0.750 0.392 0.087 0.002 0.000
14 1.000 1.000 1.000 1.000 0.998 0.979 0.874 0.584 0.196 0.011 0.000
15 1.000 1.000 1.000 1.000 1.000 0.994 0.949 0.762 0.370 0.043 0.003
16 1.000 1.000 1.000 1.000 1.000 0.999 0.984 0.893 0.589 0.133 0.016
17 1.000 1.000 1.000 1.000 1.000 1.000 0.996 0.965 0.794 0.323 0.075
18 1.000 1.000 1.000 1.000 1.000 1.000 0.999 0.992 0.931 0.608 0.264
19 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.999 0.988 0.878 0.642
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
n = 25 0 0.277 0.072 0.004 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1 0.642 0.271 0.027 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000
2 0.873 0.537 0.098 0.009 0.000 0.000 0.000 0.000 0.000 0.000 0.000
3 0.966 0.764 0.234 0.033 0.002 0.000 0.000 0.000 0.000 0.000 0.000
4 0.993 0.902 0.421 0.090 0.009 0.000 0.000 0.000 0.000 0.000 0.000
5 0.999 0.967 0.617 0.193 0.029 0.002 0.000 0.000 0.000 0.000 0.000
6 1.000 0.991 0.780 0.341 0.074 0.007 0.000 0.000 0.000 0.000 0.000
7 1.000 0.998 0.891 0.512 0.154 0.022 0.001 0.000 0.000 0.000 0.000


8 1.000 1.000 0.953 0.677 0.274 0.054 0.004 0.000 0.000 0.000 0.000
9 1.000 1.000 0.983 0.811 0.425 0.115 0.013 0.000 0.000 0.000 0.000
10 1.000 1.000 0.994 0.902 0.586 0.212 0.034 0.002 0.000 0.000 0.000
11 1.000 1.000 0.998 0.956 0.732 0.345 0.078 0.006 0.000 0.000 0.000
12 1.000 1.000 1.000 0.983 0.846 0.500 0.154 0.017 0.000 0.000 0.000
13 1.000 1.000 1.000 0.994 0.922 0.655 0.268 0.044 0.002 0.000 0.000
14 1.000 1.000 1.000 0.998 0.966 0.788 0.414 0.098 0.006 0.000 0.000
15 1.000 1.000 1.000 1.000 0.987 0.885 0.575 0.189 0.017 0.000 0.000
16 1.000 1.000 1.000 1.000 0.996 0.946 0.726 0.323 0.047 0.000 0.000
17 1.000 1.000 1.000 1.000 0.999 0.978 0.846 0.488 0.109 0.002 0.000
18 1.000 1.000 1.000 1.000 1.000 0.993 0.926 0.659 0.220 0.009 0.000
19 1.000 1.000 1.000 1.000 1.000 0.998 0.971 0.807 0.383 0.033 0.001
20 1.000 1.000 1.000 1.000 1.000 1.000 0.991 0.910 0.579 0.098 0.007
21 1.000 1.000 1.000 1.000 1.000 1.000 0.998 0.967 0.766 0.236 0.034
22 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.991 0.902 0.463 0.127
23 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.998 0.973 0.729 0.358
24 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.996 0.928 0.723
25 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
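The cumulative probabilities tabulated above can be reproduced with a few lines of code. The following minimal Python sketch (not part of the original text) computes the binomial cumulative distribution and checks it against the n = 10, x = 5, p = 0.5 entry, 0.623:

```python
from math import comb

def binom_cdf(x, n, p):
    """Cumulative probability P(X <= x) for a binomial(n, p) random variable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# The table entry for n = 10, x = 5 at p = 0.5 is 0.623.
print(round(binom_cdf(5, 10, 0.5), 3))  # 0.623
```

Any entry in the table can be spot-checked the same way by changing n, x, and p.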

Table 6.3: Poisson Cumulative Distribution

The table below gives the probability that a Poisson random variable X with mean
λ is less than or equal to x. That is, the table gives

P(X ≤ x) = Σ_{r=0}^{x} e^(−λ) λ^r / r!

λ= 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.2 1.4 1.6 1.8
x= 0 0.9048 0.8187 0.7408 0.6703 0.6065 0.5488 0.4966 0.4493 0.4066 0.3679 0.3012 0.2466 0.2019 0.1653

1 0.9953 0.9825 0.9631 0.9384 0.9098 0.8781 0.8442 0.8088 0.7725 0.7358 0.6626 0.5918 0.5249 0.4628

2 0.9998 0.9989 0.9964 0.9921 0.9856 0.9769 0.9659 0.9526 0.9371 0.9197 0.8795 0.8335 0.7834 0.7306

3 1.0000 0.9999 0.9997 0.9992 0.9982 0.9966 0.9942 0.9909 0.9865 0.9810 0.9662 0.9463 0.9212 0.8913

4 1.0000 1.0000 1.0000 0.9999 0.9998 0.9996 0.9992 0.9986 0.9977 0.9963 0.9923 0.9857 0.9763 0.9636

5 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998 0.9997 0.9994 0.9985 0.9968 0.9940 0.9896


6 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9997 0.9994 0.9987 0.9974

7 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9997 0.9994

8 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999

9 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000

λ= 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.5 5.0 5.5

x=0 0.1353 0.1108 0.0907 0.0743 0.0608 0.0498 0.0408 0.0334 0.0273 0.0224 0.0183 0.0111 0.0067 0.0041

1 0.4060 0.3546 0.3084 0.2674 0.2311 0.1991 0.1712 0.1468 0.1257 0.1074 0.0916 0.0611 0.0404 0.0266

2 0.6767 0.6227 0.5697 0.5184 0.4695 0.4232 0.3799 0.3397 0.3027 0.2689 0.2381 0.1736 0.1247 0.0884

3 0.8571 0.8194 0.7787 0.7360 0.6919 0.6472 0.6025 0.5584 0.5152 0.4735 0.4335 0.3423 0.2650 0.2017

4 0.9473 0.9275 0.9041 0.8774 0.8477 0.8153 0.7806 0.7442 0.7064 0.6678 0.6288 0.5321 0.4405 0.3575

5 0.9834 0.9751 0.9643 0.9510 0.9349 0.9161 0.8946 0.8705 0.8441 0.8156 0.7851 0.7029 0.6160 0.5289

6 0.9955 0.9925 0.9884 0.9828 0.9756 0.9665 0.9554 0.9421 0.9267 0.9091 0.8893 0.8311 0.7622 0.6860

7 0.9989 0.9980 0.9967 0.9947 0.9919 0.9881 0.9832 0.9769 0.9692 0.9599 0.9489 0.9134 0.8666 0.8095

8 0.9998 0.9995 0.9991 0.9985 0.9976 0.9962 0.9943 0.9917 0.9883 0.9840 0.9786 0.9597 0.9319 0.8944

9 1.0000 0.9999 0.9998 0.9996 0.9993 0.9989 0.9982 0.9973 0.9960 0.9942 0.9919 0.9829 0.9682 0.9462

10 1.0000 1.0000 1.0000 0.9999 0.9998 0.9997 0.9995 0.9992 0.9987 0.9981 0.9972 0.9933 0.9863 0.9747

11 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999 0.9998 0.9996 0.9994 0.9991 0.9976 0.9945 0.9890

12 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999 0.9998 0.9997 0.9992 0.9980 0.9955

13 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9997 0.9993 0.9983

14 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998 0.9994

15 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998

16 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999

17 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000

λ= 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 11.0 10.0 12.0 14.0 15.0

x=0 0.0025 0.0015 0.0009 0.0006 0.0003 0.0002 0.0001 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

1 0.0174 0.0113 0.0073 0.0047 0.0030 0.0019 0.0012 0.0008 0.0005 0.0002 0.0005 0.0001 0.0000 0.0000

2 0.0620 0.0430 0.0296 0.0203 0.0138 0.0093 0.0062 0.0042 0.0028 0.0012 0.0028 0.0005 0.0001 0.0000

3 0.1512 0.1118 0.0818 0.0591 0.0424 0.0301 0.0212 0.0149 0.0103 0.0049 0.0103 0.0023 0.0005 0.0002

4 0.2851 0.2237 0.1730 0.1321 0.0996 0.0744 0.0550 0.0403 0.0293 0.0151 0.0293 0.0076 0.0018 0.0009

5 0.4457 0.3690 0.3007 0.2414 0.1912 0.1496 0.1157 0.0885 0.0671 0.0375 0.0671 0.0203 0.0055 0.0028

6 0.6063 0.5265 0.4497 0.3782 0.3134 0.2562 0.2068 0.1649 0.1301 0.0786 0.1301 0.0458 0.0142 0.0076

7 0.7440 0.6728 0.5987 0.5246 0.4530 0.3856 0.3239 0.2687 0.2202 0.1432 0.2202 0.0895 0.0316 0.0180

8 0.8472 0.7916 0.7291 0.6620 0.5925 0.5231 0.4557 0.3918 0.3328 0.2320 0.3328 0.1550 0.0621 0.0374

9 0.9161 0.8774 0.8305 0.7764 0.7166 0.6530 0.5874 0.5218 0.4579 0.3405 0.4579 0.2424 0.1094 0.0699

10 0.9574 0.9332 0.9015 0.8622 0.8159 0.7634 0.7060 0.6453 0.5830 0.4599 0.5830 0.3472 0.1757 0.1185

11 0.9799 0.9661 0.9467 0.9208 0.8881 0.8487 0.8030 0.7520 0.6968 0.5793 0.6968 0.4616 0.2600 0.1848

12 0.9912 0.9840 0.9730 0.9573 0.9362 0.9091 0.8758 0.8364 0.7916 0.6887 0.7916 0.5760 0.3585 0.2676

13 0.9964 0.9929 0.9872 0.9784 0.9658 0.9486 0.9261 0.8981 0.8645 0.7813 0.8645 0.6815 0.4644 0.3632


14 0.9986 0.9970 0.9943 0.9897 0.9827 0.9726 0.9585 0.9400 0.9165 0.8540 0.9165 0.7720 0.5704 0.4657

15 0.9995 0.9988 0.9976 0.9954 0.9918 0.9862 0.9780 0.9665 0.9513 0.9074 0.9513 0.8444 0.6694 0.5681

16 0.9998 0.9996 0.9990 0.9980 0.9963 0.9934 0.9889 0.9823 0.9730 0.9441 0.9730 0.8987 0.7559 0.6641

17 0.9999 0.9998 0.9996 0.9992 0.9984 0.9970 0.9947 0.9911 0.9857 0.9678 0.9857 0.9370 0.8272 0.7489

18 1.0000 0.9999 0.9999 0.9997 0.9993 0.9987 0.9976 0.9957 0.9928 0.9823 0.9928 0.9626 0.8826 0.8195

19 1.0000 1.0000 1.0000 0.9999 0.9997 0.9995 0.9989 0.9980 0.9965 0.9907 0.9965 0.9787 0.9235 0.8752

20 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998 0.9996 0.9991 0.9984 0.9953 0.9984 0.9884 0.9521 0.9170

21 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998 0.9996 0.9993 0.9977 0.9993 0.9939 0.9712 0.9469

22 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999 0.9997 0.9990 0.9997 0.9970 0.9833 0.9673

23 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999 0.9995 0.9999 0.9985 0.9907 0.9805

24 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 1.0000 0.9993 0.9950 0.9888

25 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 1.0000 0.9997 0.9974 0.9938

26 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9987 0.9967

27 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9994 0.9983

28 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9991

29 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9996

30 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998

31 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999

32 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
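The Poisson cumulative probabilities above follow directly from the formula for P(X ≤ x). A minimal Python sketch (added for illustration), checked against the λ = 2.0, x = 3 entry, 0.8571:

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """Cumulative probability P(X <= x) for a Poisson random variable with mean lam."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(x + 1))

# The table entry for lambda = 2.0, x = 3 is 0.8571.
print(round(poisson_cdf(3, 2.0), 4))  # 0.8571
```
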

References:

1. De Muth JE. Basic statistics and pharmaceutical statistical applications. CRC Press; 2014.

2. Bolton S, Bon C. Pharmaceutical statistics: Practical and clinical applications, revised and expanded. CRC Press; 2003.

3. Bhardwaj AN, Sharma K. Comparative study of various measures of dispersion. Journal of Advances in Mathematics. 2013;1(1).

4. Wessels P, Holz M, Erni F, Krummen K, Ogorka J. Statistical evaluation of stability data of pharmaceutical products for specification setting. Drug Development and Industrial Pharmacy. 1997;23(5):427-39.

5. Bingham NH, Fry JM. Regression: Linear models in statistics. Springer Science & Business Media; 2010.



Sampling

7. Sampling
A sample is meant to represent the population. The idea is that by studying a
properly selected sample, researchers can make valid inferences about the larger
population. Statistical methods are used to calculate how well the sample represents
the population and to estimate the margin of error and confidence levels.

7.1 Population and sample

The population refers to the entire set of individuals, items, or data points that are of
interest in a particular study. A population includes every possible member or unit
that fits the criteria being studied. Suppose a pharmaceutical company develops a new
drug intended to treat diabetes in adults aged 60-70 years. The population would
include all adults with diabetes in that age range who are eligible for the
drug.

Sampling refers to the process of selecting a subset from a larger population or batch
in order to make inferences about the entire population, batch, or process. Effective
sampling is a cornerstone of pharmaceutical research, as it ensures that results are
reliable, reproducible, and meaningful. In quality control, a sample could refer to a
subset of 6 tablets selected from a batch of 100,000 tablets for testing disintegration
time.

7.1.1. Differences between population and sample


The following key differences are observed between population and sample:

Characteristic       Population                            Sample
Definition           The entire group of individuals       A subset of the population
                     or items of interest.                 selected for study.
Size                 Very large or even infinite.          Smaller than the population.
Purpose              The group about which conclusions     Used to make inferences about
                     are drawn.                            the population.
Representativeness   Includes all members of the group     Must be truly representative
                     and is fully representative.          of the population.
Cost & Time          Costly and time-consuming.            More cost-effective and faster.
Accuracy             Provides exact information.           Possibility of sampling error.


7.2. Large and small size samples

The size of the sample is an essential factor in determining the reliability and validity
of conclusions drawn from the data. A large sample size refers to a study that involves
a large number of units from the population being studied; in most studies, "large"
means a sample in the range of hundreds to thousands, depending on the nature of the
population. In a clinical trial evaluating a new drug, a large sample size is chosen to
ensure the study has enough power to detect any differences between the drug and a
placebo. A small sample size refers to studies that involve fewer units, typically from
fewer than 30 up to a few hundred, varying with the type of study. Small sample sizes
are common in preformulation studies, early-phase clinical trials, pilot studies, and
initial research where the main goal is to obtain preliminary data.

Both large and small sample sizes have their place in pharmaceutical research. Large
sample sizes are crucial for ensuring robust, reliable, and generalizable results,
especially in pivotal clinical trials, while small sample sizes are typically used in
early-stage studies for safety, tolerability, or feasibility assessments. An appropriate
sample size is selected based on the research objectives, study phase, budget, and
ethical considerations.

7.2.1. Differences between large and small sample sizes:


The following key differences are observed between large and small samples:

Characteristic       Large Sample Size                     Small Sample Size
Statistical Power    High power.                           Low power.
Precision            More precise estimates with           Less precise estimates with
                     narrower confidence intervals.        wider confidence intervals.
Cost & Time          More expensive and                    Less expensive and faster.
                     time-consuming.
Representativeness   More likely to be representative      Less likely to be representative
                     of the population.                    of the population.
Sampling Error       Smaller.                              Larger.


Ethical              Involvement of larger numbers of      Smaller trials reduce exposure
Considerations       participants may raise ethical        but may not provide definitive
                     concerns.                             answers.
Use in Research      Often used in Phase III clinical      Often used in exploratory or
                     trials.                               Phase I or II clinical trials.

7.3. Key Factors to Consider in Pharmaceutical Sampling:

1. Statistical Power: The sample size needs to be large enough to detect


significant differences, but also feasible within budget and time constraints.
2. Risk of Contamination or Cross-Contamination: In pharmaceutical research,
samples must be taken with care to avoid contamination, especially in
manufacturing and clinical trials.
3. Sample Integrity: Proper storage, handling, and transport of samples are
critical to maintaining their integrity before analysis (e.g., temperature control
during drug stability testing).
4. Bias and Variability: Sampling must be done in a way that minimizes bias and
accounts for variability in the population or batch.
5. Regulatory Guidelines: Adherence to regulatory standards and pharmacopeia
is necessary for ensuring the research is credible and the products meet safety
standards.

7.4. Types of Sampling

Sampling techniques are broadly classified as probability and non-probability
sampling techniques. In probability sampling, each member of the population has a
known (non-zero) chance of being selected. This allows statistical inferences about
the population to be drawn from the sample. Probability sampling is ideal for
quantitative research that aims to make inferences about the entire population; it is
more reliable and generalizable but can be costly and time-consuming. In
non-probability sampling, not all members of the population have a known or equal
chance of being selected. This approach can lead to bias and makes it harder to
generalize the findings to the larger population. Non-probability sampling is useful
for qualitative research, exploratory studies, or when time and resources are limited.


Differences between probability and non-probability sampling techniques

Feature                  Probability Sampling              Non-Probability Sampling
Selection Process        Random selection.                 Non-random selection.
Bias                     Less bias.                        More bias.
Accuracy & Reliability   More accurate and reliable.       Less accurate.
Generalizability         Results can be generalized        Results cannot be generalized
                         to the population.                to the population.
Cost and Time            More time-consuming and costly.   Less time-consuming and cheaper.
Data Analysis            Easier to analyze with            Less rigorous analysis.
                         statistical methods.
Information              Requires detailed population      Requires subjective decisions.
                         information.
Application              More often quantitative.          More often qualitative.

Various probability and non-probability sampling techniques are listed in the
following chart.

Sampling Techniques

Probability Sampling         Non-Probability Sampling
Simple random sampling       Convenience sampling
Systematic sampling          Judgmental (purposive) sampling
Stratified sampling          Quota sampling
Cluster sampling             Snowball sampling
Multistage sampling          Self-selection (volunteer) sampling

7.4.1 Probability sampling: Different types of probability sampling commonly used


in pharmaceutical research are discussed here.

7.4.1.1 Simple Random Sampling (SRS): In Simple Random Sampling, each


individual in the population has an equal chance of being selected for the study. This
technique is unbiased and representative of the larger population, provided the
population is well defined. If this method is applied to clinical trials, it might require
access to a comprehensive list of patients, which could be challenging in large-scale
studies.

Example: A pharmaceutical company wants to test a new medication for diabetes. The
population for the study is a group of patients with diagnosed diabetes, aged 40-60.
Using SRS, the researchers would randomly select a group of participants from a list
of all patients who meet the inclusion and exclusion criteria. Patients could be
randomly chosen by assigning numbers to each eligible patient and selecting a sample
using random number generators or a lottery system.
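The lottery-style selection described in the example can be sketched in a few lines of Python. This is an illustration only; the register of 200 patient IDs and the sample size of 20 are invented, not taken from the study:

```python
import random

def simple_random_sample(eligible_ids, n, seed=None):
    """Draw n patients without replacement; each has an equal chance of selection."""
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return rng.sample(eligible_ids, n)

# Hypothetical register of 200 eligible patients; draw 20 of them.
eligible = list(range(1, 201))
chosen = simple_random_sample(eligible, 20, seed=2024)
print(sorted(chosen))
```

In practice the "list" would be the vetted register of eligible patients, and the seed (or a physical lottery) would be documented for auditability.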

7.4.1.2 Systematic Sampling: Systematic Sampling involves selecting every nth


individual from a list or population after choosing a random starting point. Systematic
sampling is easy to implement, especially with a well-ordered list of individuals or
items. It’s less time-consuming than simple random sampling, especially with large
populations. If there is a hidden pattern in the list, systematic sampling may introduce
bias.

Steps and Formula:

1. Determine the population size (N):


This is the total number of individuals or items in the population to be
studied.
2. Decide the sample size (n):
This is the number of individuals you want to sample.
3. Calculate the sampling interval (k):
The sampling interval k is the number of elements to skip between
selections. It is calculated using the formula:


k = N / n

Where:

o N = Total population size


o n = Desired sample size

Round the result to the nearest whole number if necessary.

4. Randomly Select the Starting Point:

Select a random number between 1 and k. This will be the starting point in the
population list. If the random number generated is r, then start by selecting the
element in the r-th position.

5. Select the Sample:

After selecting the first individual (starting point), every k-th individual is selected.
The subsequent elements are:

r, r+k,r+2k,r+3k,…

Example: A pharmaceutical company is conducting a clinical trial on a new


formulation on ten target patients. The researchers decide to recruit participants from
a database of patients (100) who have previously sought treatment at a specific
hospital.

Population size (N) =100

Sample size (n) =10

Sampling interval (k) = 100/10=10

Select a random number between 1 and 10. After randomly selecting a starting point
(e.g., patient #5 on the list), the researchers would select every 10th patient on the list
to be included in the study (e.g., #5, #15, #25, etc.).
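The five steps above can be sketched directly in Python; the numbers mirror the worked example (N = 100, n = 10, so k = 10), with the random start drawn as in step 4:

```python
import random

def systematic_sample(N, n, seed=None):
    """Pick a random start r in 1..k, then every k-th unit, where k = N // n."""
    k = N // n                             # sampling interval (step 3)
    r = random.Random(seed).randint(1, k)  # random starting point (step 4)
    return [r + i * k for i in range(n)]   # r, r+k, r+2k, ... (step 5)

sample = systematic_sample(100, 10, seed=0)
print(sample)  # ten patient positions, spaced 10 apart
```

If the random start happens to be 5, the selected positions are 5, 15, 25, ..., 95, exactly as in the example.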

7.4.1.3 Stratified Sampling: In Stratified Sampling, the population is divided into


distinct subgroups, or strata, based on a specific characteristic (e.g., age, gender,
disease stage). A random sample is then taken from each stratum. This ensures that all


subgroups are properly represented in the final sample. Stratified sampling improves
precision by ensuring that all relevant subgroups are represented in the sample, which
is especially important when those subgroups may respond differently to treatment.
This method requires knowledge of the strata and may involve additional planning
and coordination to implement.

Example: A clinical trial is being conducted to evaluate the effectiveness of a new


medication. The researchers want to ensure the sample includes adult and geriatric
patients, as the medication may have different effects on each group.
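A common way to draw such a sample is proportional allocation: each stratum contributes in proportion to its share of the population. The sketch below is illustrative only; the adult and geriatric counts are invented:

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    return {name: rng.sample(units, round(total_n * len(units) / N))
            for name, units in strata.items()}

# Hypothetical population: 300 adult and 100 geriatric patients; sample 40 in total.
strata = {"adult": list(range(300)), "geriatric": list(range(300, 400))}
picked = stratified_sample(strata, 40, seed=3)
print({k: len(v) for k, v in picked.items()})  # {'adult': 30, 'geriatric': 10}
```

Because allocation is proportional (40 × 300/400 = 30 and 40 × 100/400 = 10), both age groups are guaranteed representation in the final sample.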

7.4.1.4 Cluster Sampling: Cluster Sampling involves dividing the population into
clusters (typically based on geographic location or other groupings), and then
randomly selecting a subset of clusters. Within those selected clusters, all individuals
or a random sample of individuals are used for further study. This method is cost-
effective for large-scale studies, particularly when the population is dispersed across a
large area. It can result in less precise estimates, especially if the selected clusters
differ significantly from each other.

Example: A pharmaceutical company wants to test the effectiveness of a new vaccine


in preventing viral infection in rural areas. Since the population is geographically
spread out, it would be expensive and time-consuming to sample individuals from
across the entire country. The company first divides the country into clusters based on
geographical regions (e.g., rural towns or villages). Then, they randomly select a few
towns and conduct the trial within these selected clusters, vaccinating all individuals
in the chosen towns who meet the study’s criteria.
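A minimal Python sketch of the cluster step, assuming a hypothetical set of villages (clusters) with known residents; names and counts are made up for illustration:

```python
import random

def cluster_sample(clusters, n_clusters):
    """Randomly choose whole clusters; everyone in a chosen cluster is studied."""
    chosen = random.sample(sorted(clusters), n_clusters)  # pick cluster names at random
    return {name: clusters[name] for name in chosen}

random.seed(7)
# Hypothetical villages, each with 50 eligible residents
villages = {f"village_{i:02d}": [f"v{i:02d}_res_{j}" for j in range(50)]
            for i in range(20)}
selected = cluster_sample(villages, 3)
print(sorted(selected))   # names of the 3 selected villages
```

Note that randomness operates at the cluster level, not the individual level, which is what makes the method cheap but potentially less precise.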

7.4.1.5 Multistage Sampling: Multistage Sampling is a more complex form of cluster


sampling that involves sampling in multiple stages. The first stage involves selecting
clusters, and subsequent stages involve sampling within those clusters. This method is
useful for large, geographically spread-out populations and reduces logistical costs.
The complexity of the method may lead to a higher risk of sampling errors and
logistical challenges in coordinating across multiple stages.

Example: A pharmaceutical company is conducting a nationwide study to evaluate the


side effects of a new cancer treatment. The study needs to sample patients from


multiple hospitals across the country. The researchers first select a random sample of
hospitals (clusters). Then, within each hospital, they randomly select a subset of
cancer patients to participate in the study.

7.4.2 Non-Probability Sampling: Different types of non-probability sampling commonly used in pharmaceutical research are discussed here.

7.4.2.1: Convenience Sampling: Convenience sampling involves selecting


participants who are easiest to access or most readily available, rather than randomly
from the broader population. This technique is quick and inexpensive, ideal for
exploratory research or when there are logistical constraints and useful for small-scale
studies or qualitative research. The sample may not be representative of the larger
population, leading to limited generalizability of findings. The sample may be biased
toward people who are easy to recruit.

Example: A pharmaceutical company is conducting a pilot study to assess side effects


of a new drug. They may use convenience sampling by recruiting patients from a
particular hospital who are already receiving the new drug.

7.4.2.2: Judgmental (Purposive) Sampling: In judgmental (purposive) sampling, the


researcher selects participants based on specific characteristics or qualities that are
deemed important for the study. This technique is common in qualitative research,
where the goal is to select individuals who can provide in-depth, relevant information
rather than a representative sample. It is useful for obtaining insights from a specific
group with particular knowledge, experience, or characteristics, and it allows the
researcher to focus on people who are directly relevant to the study's objectives. The
selection of participants is based on the researcher’s judgment, which can lead to
biases and limited generalizability as the sample is not representative of the larger
population.

Example: In a drug development study for rare diseases, researchers might only
recruit patients who have been diagnosed with that rare disease, as these are the most
relevant to the study.

7.4.2.3: Quota Sampling: Quota sampling involves dividing the population into
subgroups (or quotas) based on certain characteristics (e.g., age, gender, disease


stage). Researchers then sample participants from these quotas non-randomly until the
desired number of participants for each subgroup is reached. Its advantages are that it
ensures specific subgroups are represented in the sample, that it is faster than
probability sampling methods when quotas are clearly defined, and that it allows
researchers to ensure certain characteristics (e.g., age, gender) are proportionally
represented. Its limitations are that selection within each subgroup is not random, so
the results may be biased and not generalizable, and that some subgroups may still be
under-represented or overlooked.

Example: In a study of a new antihyperlipidemic drug, researchers may want to


ensure that the sample includes males and females. Researchers could create quotas
for the gender and then select participants non-randomly from each group until the
required number of participants is reached.

7.4.2.4: Snowball Sampling: Snowball sampling is used when the population is hard
to reach or lacks a clear listing. It begins with a small number of participants who
meet the study criteria, and then those participants refer others who meet the criteria.
This referral process "snowballs," with each new participant providing contacts for
additional participants. This technique is useful for studying populations that are
difficult to access or identify, such as people with rare conditions or marginalized
groups. It may help find hard-to-reach participants who might not otherwise be
included in traditional studies. The sample may become homogenous, as participants
tend to refer others with similar characteristics. It may not be representative of the
general population, leading to poor generalizability. The process can perpetuate
existing social networks or groups, which may not be ideal for diverse perspectives.

Example: In researching a rare condition, researchers may start with a few patients
who have been diagnosed with the disease and then ask them to refer other patients
they know who also have the condition. This is especially useful for researching rare
diseases or drug side effects that may not be easily detectable in the general
population.

7.4.2.5: Self-Selection (Volunteer) Sampling: In self-selection sampling, participants


volunteer to be part of the study. This method is commonly used when the study is
advertised to the public, and individuals choose to participate based on their own


willingness. It is quick and cost-effective and may result in high participation rates if
the study is marketed well or provides incentives. However, self-selection bias limits
the generalizability of findings to the broader population.

Example: A clinical trial investigating a new over-the-counter medication might


recruit volunteers from the general population through online surveys or public
advertisements.

Non-probability sampling methods are often used in the following scenarios in


pharmaceutical research:

1. Exploratory or qualitative research: Early-stage research aimed at


understanding patient experiences, drug side effects, or attitudes toward
treatments, where the aim is not to generalize findings to a larger population.
2. Pilot Studies: Before launching a large-scale clinical trial, researchers often
conduct small pilot studies to explore the feasibility of a new drug or treatment
and refine the study protocols.
3. Rare Diseases: For diseases that affect a very small number of people, it's
difficult to form a probability sample, so non-probability sampling like
snowball sampling is used to identify a relevant sample.
4. Formative or preclinical research: Understanding specific outcomes, reactions,
or side effects from small groups of patients, or gathering expert opinions
from healthcare professionals and researchers.
5. Health economics and market research: Non-probability sampling is often
used to gather qualitative data about patient preferences, medication
adherence, or responses to health interventions in market research studies.

Probability sampling methods are often used in the following scenarios in


pharmaceutical research:

1. Sampling for Drug Manufacturing:


o Raw Materials: Ensure the raw materials meet specifications for critical
material attributes (CMAs), e.g., purity, potency, and quality, before
manufacturing begins.
o In-Process Sampling: During production, samples are taken to monitor
critical quality attributes (CQAs) and critical process parameters (CPPs).


o Final Product Testing: The final drug product is sampled to check its
stability, dosage uniformity, and other critical parameters, ensuring the
drug meets regulatory standards.
o Stability Studies: Samples from different batches are stored under
various conditions to assess stability of the drug over time, guiding
expiry date labelling.
2. Clinical Trial Sampling:
o Patient Recruitment: Ensures that a diverse set of patients are included,
which is important for understanding how the drug will work across
different populations.
o Blinding and Randomization: Random sampling is crucial in ensuring
that blinding (masking treatment allocation) and randomization is
carried out to avoid bias.
o Placebo-Controlled Trials: Samples are often divided into experimental
(drug) and control (placebo) groups for comparison.
3. Stability and Shelf-Life Testing:
o Sampling is performed at different time points (e.g., 1 month, 6
months, 12 months) to ensure the drug retains its potency and is safe
for consumption throughout its shelf life.
4. Regulatory Compliance:
o Sampling in pharmaceutical research is highly regulated. Sampling
plans must adhere to Good Manufacturing Practices (GMP) and Good
Clinical Practices (GCP) to ensure the sampling methods are
scientifically sound and reproducible.

7.5: Sample size


Sample size calculation is a critical step in pharmaceutical research and clinical
trials, as it ensures that the study has enough power to detect a significant effect, if
one exists, while minimizing the risk of Type I and Type II errors. The sample
size affects the reliability and generalizability of the results and is influenced by
factors such as the study design, effect size, variability, and significance level.

Key Concepts in Sample Size Calculation


Alpha (α): The significance level, typically set at 0.05 for a 95% confidence level. It
represents the probability of committing a Type I error (i.e., rejecting the null
hypothesis when it is true).

Beta (β): The probability of committing a Type II error (i.e., failing to reject the null
hypothesis when it is false). The power of the study is calculated as 1−β, and
researchers typically aim for a power of 80% or 90%.

Effect Size (d or δ): Effect size is the magnitude of the difference expected to be
observed between the groups. A larger effect size requires a smaller sample to detect,
whereas a smaller effect size requires a larger sample.

Standard Deviation (σ): Higher variability requires a larger sample size to detect a
given effect.

Population Size (N): The total number of individuals in the population from which
the sample will be drawn.

When estimating a single mean (e.g., the average effect of a drug on a continuous
variable), the sample size formula is simpler.

Formula:

The formula for sample size estimation to estimate the population mean with a
specified confidence level and margin of error is:

n = (Zα/2 × σ / E)²

Where:

 n = sample size.
 Zα/2 = Z-score for the desired confidence level.
 σ = Standard deviation of the population.
 E = Margin of error (how precise you want the estimate to be).


Example: Estimate the sample size required to estimate the mean blood pressure
reduction after administering a new drug, with a 95% confidence level, margin of
error of 2 mmHg, and assuming a standard deviation of 8 mmHg.

1. Zα/2 = 1.96 (for 95% confidence).
2. σ = 8 mmHg.
3. E = 2 mmHg.

Substitute into the formula:

n = (1.96 × 8 / 2)² = (15.68 / 2)² = (7.84)² = 61.47

Rounding up, approximately 62 participants are required.
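The computation above can be reproduced with a short Python sketch using only the standard library (the function name is ours; `NormalDist` supplies the exact z-value for any confidence level):

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, margin, confidence=0.95):
    """n = (z_{alpha/2} * sigma / E)^2, rounded up to a whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # ~1.96 for 95%
    return ceil((z * sigma / margin) ** 2)

# Worked example from the text: sigma = 8 mmHg, margin of error E = 2 mmHg
print(sample_size_mean(sigma=8, margin=2))   # 62
```

Halving the margin of error roughly quadruples the required sample, since E enters the formula squared.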

Software and Tools for Sample Size Calculation:

In pharmaceutical research, calculating the sample size manually can be complex,


especially for advanced designs. Therefore, researchers often use statistical software
tools like:

 G*Power
 PASS (Power Analysis and Sample Size)

7.6: Sampling Errors

Sampling errors occur when the sample selected does not perfectly represent the
entire population, leading to bias or inaccuracies in estimating parameters such as the
effect of a drug, the prevalence of a condition, or the safety profile of a treatment.
Understanding and minimizing sampling errors is crucial for ensuring that research
results are scientifically sound and applicable to the broader population.

Sampling errors generally fall into two broad categories: random errors and
systematic errors.

7.6.1. Random Sampling Error: Random sampling error occurs due to the inherent
variability in selecting a sample from a population. Even with proper random
sampling techniques, the sample may differ from the population by chance alone.


These errors typically reduce the precision of the estimates but do not introduce
systematic bias.

Random errors can be minimised by


 Increasing sample size: Larger sample sizes generally reduce random
sampling errors by averaging out extreme values and providing a more
accurate estimate of population parameters.
 Using proper random sampling methods: Ensuring that every individual in the
population has an equal chance of being selected reduces the impact of
randomness on the sample.

Example: In a clinical trial testing a new medication, random sampling error might
result in a slightly higher or lower proportion of patients experiencing side effects
in the trial compared to the general population, purely due to random variation in
the sample.
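A quick simulation illustrates the point: the scatter of sample estimates around the true value shrinks as the sample size grows. All numbers here are hypothetical (an assumed 30% side-effect rate):

```python
import random
from statistics import pstdev

random.seed(0)
TRUE_RATE = 0.30   # assumed true side-effect rate in the whole population

def sample_proportion(n):
    """Observed side-effect proportion in one random sample of size n."""
    return sum(random.random() < TRUE_RATE for _ in range(n)) / n

# Draw many repeated samples and measure how much the estimates scatter
spread_small = pstdev(sample_proportion(50) for _ in range(500))
spread_large = pstdev(sample_proportion(1000) for _ in range(500))
print(round(spread_small, 3), round(spread_large, 3))   # scatter shrinks as n grows
```

Neither set of estimates is biased; the larger samples are simply more tightly clustered around 0.30, which is exactly what "reducing random sampling error" means.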

7.6. 2. Systematic Sampling Error (Bias): Systematic sampling error occurs when
the method of selecting the sample introduces bias, causing the sample to
consistently overestimate or underestimate the true population characteristics.
This type of error is not due to random chance but due to flaws in the sampling
design or methodology. Systematic errors are more dangerous than random errors
because they bias the results in a consistent direction, which could lead to false
conclusions about drug safety or efficacy.

Types of Systematic Sampling Errors in Pharmaceutical Research:

1. Selection Bias:

This occurs when the sample is not representative of the population due to how
participants are selected. For example, if only patients from one type of healthcare
facility are recruited, the sample may not represent the broader population.

Example: A clinical trial that only recruits patients from large urban centers may
not reflect the experiences of patients in rural or underserved areas.


2. Non-Response Bias:

When a significant proportion of the selected sample refuses to participate or


cannot be reached, and the non-respondents have different characteristics than the
responders, it can lead to bias.

Example: In a survey about adverse drug reactions, if patients who experienced


severe side effects are less likely to participate, the findings may underreport the
risks of the drug.

3. Exclusion Bias:

This occurs when certain subgroups of the population are deliberately or


unintentionally excluded from the sample.

Example: If pregnant women are excluded from a drug trial due to safety concerns,
the results may not be applicable to pregnant individuals.

4. Recall Bias:

In retrospective studies, where participants are asked to recall past events (e.g.,
side effects of a drug), their memories may be inaccurate or selective, leading to
systematic errors.

Example: Patients in a study on long-term drug use might overreport or underreport


side effects based on how they felt at the time of the study.

Minimizing Systematic Sampling Error:

o Randomized controlled trials (RCTs): Randomization helps prevent selection bias


by assigning participants to treatment groups randomly.
o Stratified sampling: Ensures that specific subgroups (e.g., age, gender,
comorbidities) are appropriately represented in the sample, minimizing the risk of
exclusion bias.
o Ensure diversity: Aiming for diverse and representative samples can prevent biases
related to geography, socioeconomic status, or ethnicity.
o Increase response rate: Using strategies like follow-up reminders, incentives, and
ensuring confidentiality can help improve participation and reduce non-response
bias.


Minimizing Sampling Errors in Pharmaceutical Research

To reduce sampling errors and ensure valid and reliable results in pharmaceutical
research, the following steps can be taken:

1. Careful Study Design: Ensure the study design is robust, using appropriate
randomization, sampling methods, and inclusion/exclusion criteria.
2. Use of Proper Sampling Techniques:
o Random sampling: Ensures that every individual in the population has
an equal chance of being selected, reducing bias.
o Stratified sampling: Ensures that subgroups are adequately represented
in the sample.
3. Increase Sample Size: Larger samples reduce random error and improve the
precision of estimates, making the study more powerful.
4. Ensure a Comprehensive Sampling Frame: Ensure that the list or frame from
which participants are selected accurately represents the target population.
5. Minimize Non-Response: Encourage participation to reduce the risk of non-
response bias, which can distort findings.
6. Regular Monitoring and Auditing: Continuous oversight during the study can
help detect and address potential sources of error early.

References

1. Etikan I, Bala K. Combination of probability random sampling method with
non-probability random sampling method (sampling versus sampling
methods). Biometrics & Biostatistics International Journal. 2017;5(6):210-3.

2. Rahman MM. Sample size determination for survey research and non-
probability sampling techniques: A review and set of recommendations.
Journal of Entrepreneurship, Business and Economics. 2023;11(1):42-62.

3. Cochran WG. Sampling Techniques. John Wiley & Sons; 1977.

4. Etikan I, Bala K. Sampling and sampling methods. Biometrics & Biostatistics


International Journal. 2017;5(6):00149.

5. Das BK, Jha DN, Sahu SK, Yadav AK, Raman RK, Kartikeyan M. Concept of
sampling methodologies and their applications. In: Concept Building in
Fisheries Data Analysis. Singapore: Springer Nature Singapore; 2022. p. 17-40.

6. Yang K, Banamah A. Quota sampling as an alternative to probability
sampling? An experimental study. Sociological Research Online.
2014;19(1):56-66.

7. Futri IN, Risfandy T, Ibrahim MH. Quota sampling method in online


household surveys. MethodsX. 2022;9:101877.


8. Parker C, Scott S, Geddes A. Snowball sampling. SAGE research methods


foundations. 2019.

9. Etikan I, Alkassim R, Abubakar S. Comparison of snowball sampling and
sequential sampling technique. Biometrics and Biostatistics International
Journal. 2016;3(1):55.

10. Naderifar M, Goli H, Ghaljaie F. Snowball sampling: A purposeful method of


sampling in qualitative research. Strides in development of medical education.
2017;30;14(3).

11. Ruggles S. Sample designs and sampling errors. Historical Methods: A


Journal of Quantitative and Interdisciplinary History. 1995;28(1):40-6.

12. Chelton DB. Effects of sampling errors in statistical estimation. Deep Sea
Research Part A. Oceanographic Research Papers. 1983;30(10):1083-103.


8. Hypothesis Testing

Hypothesis testing is a statistical technique used to make inferences or derive


conclusions about a population by analysing sample data. It’s a core aspect of
inferential statistics and serves to evaluate the credibility of a statement or assumption
concerning a population parameter.

8.1 Key Concepts:

8.1.1. Null Hypothesis (H0): The null hypothesis asserts that there is no effect, no
variation, or no relationship between variables. It posits that any noticeable
differences in the sample data are due to random variation or chance. For instance, the
null hypothesis could claim that the effectiveness of a new drug is no different from
that of a placebo.

H0: μdrug = μplacebo

For the antihypertensive example, this statement expresses that the mean reduction
in blood pressure for patients using the drug (μdrug) is equal to the mean reduction
in blood pressure for patients using the placebo (μplacebo).

8.1.2. Alternative Hypothesis (H1 or Ha): This is the hypothesis that suggests a
potential effect, difference, or relationship. In the drug example, the alternative
hypothesis represents the idea that there is an effect, suggesting that the new drug
differs in effectiveness from the placebo.

H1: μdrug ≠ μplacebo

This means the average decrease in blood pressure for patients using the drug
(μdrug) is not equal to the mean reduction in blood pressure for patients using the
placebo (μplacebo).
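The choice between H0 and H1 is made from sample data. The following sketch uses simulated, entirely hypothetical blood-pressure reductions, and a normal (z) approximation in place of the exact t-distribution so that only the Python standard library is needed:

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(3)
# Simulated blood-pressure reductions (mmHg); means and SD are illustrative
drug    = [random.gauss(15, 8) for _ in range(100)]   # assumed true mean 15
placebo = [random.gauss(8, 8)  for _ in range(100)]   # assumed true mean 8

# Two-sample z statistic for H0: mu_drug = mu_placebo (normal approximation,
# reasonable with 100 observations per arm)
se = (stdev(drug) ** 2 / len(drug) + stdev(placebo) ** 2 / len(placebo)) ** 0.5
z = (mean(drug) - mean(placebo)) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(round(z, 2), round(p_value, 4))
# A p-value below alpha = 0.05 leads us to reject H0 in favour of H1
```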


1. Test Statistic: This refers to a standardized value derived from sample data
during a hypothesis test. It helps determine whether to reject the null
hypothesis. Examples of test statistics include the Z-score, T-score, and
chi-squared value, among others.
2. Significance Level (α): This is the probability threshold used to assess whether
the test outcome is statistically significant.

The significance level represents the likelihood of rejecting the null hypothesis when it
is actually true, also known as a Type I error. In other terms, it reflects the risk of
mistakenly concluding that there is an effect or difference when, in fact, there is none.

Common Values of α: The most common value used for α in pharmaceutical and
clinical research is 0.05, which means there is a 5% chance of committing a Type I
error. The results are considered statistically significant if the p-value is less than 0.05,
suggesting a less than 5% probability that the observed result is due to random
chance. In pharmaceutical hypothesis testing, the significance level is a critical
parameter for deciding whether the evidence from a clinical trial or experiment is
strong enough to reject the null hypothesis and claim that a medication has a
meaningful effect.

The significance level has the following implications in pharmaceutical hypothesis
testing:

 Control over Type I Errors: By establishing an α level (e.g., 0.05), researchers
control the probability of incorrectly concluding a treatment effect when there
is none.
 Clinical Trial Design: In clinical trials, choosing an appropriate α is essential
for balancing the risk of false positives and false negatives.
 Regulatory Requirements: Regulatory agencies often require statistical
significance at the 0.05 level (or sometimes stricter levels such as 0.01,
particularly when public safety is at risk) for new drug approval.
 The significance level α is associated with Type I errors (false positives), while
the test's power is connected to Type II errors (false negatives). There is a
trade-off:


 Decreasing α lowers the likelihood of Type I errors but raises the risk
of Type II errors (failing to identify a true effect).
 Increasing α makes it easier to identify a true effect (increasing power) but
also heightens the likelihood of a Type I error.

3. P-value: The p-value represents the probability of obtaining test results as
extreme as, or more extreme than, the observed results, assuming the null
hypothesis is true. A smaller p-value provides stronger evidence against the
null hypothesis.
4. Type I Error (False Positive): This happens when the null hypothesis is
rejected despite being true. The probability of committing a Type I error is
represented by α (the significance level).

A Type I error would occur if, based on the sample data, the company rejected the
null hypothesis (i.e., concluded the new drug is more effective than the placebo) when,
in fact, the drug has no effect. The company would then wrongly believe that the
drug is more effective than the placebo. In the pharmaceutical field, both Type I and
Type II errors can have severe consequences; for a Type I error these include:

 Prematurely releasing a drug to the market: If the company falsely concludes
that the drug is effective, they may proceed to market it, which could lead to
ineffective treatment for patients and potential harm.
 Wasted resources: The company may invest further resources in marketing,
production, and distribution of a drug that does not actually improve health
outcomes.

Minimizing Type I Error:

 To decrease the likelihood of Type I errors, researchers may set a lower
significance level (e.g., 0.01 instead of 0.05), which makes it harder to reject
the null hypothesis.
 Expanding the sample size can reduce the standard error, leading to more
precise estimates and lowering the risk of a Type I error.


5. Type II Error (False Negative): This occurs when the null hypothesis is not
rejected when it is actually false. The probability of making a Type II error is
denoted by β.

Let's take a pharmaceutical example in which a company is testing the effectiveness
of a new drug for lowering blood pressure. A Type II error would happen if the
company fails to reject the null hypothesis (i.e., concludes that the new drug is no
more effective than the placebo), even though the drug is actually more effective.

Several factors can increase the probability of a Type II error:

1. Small sample size: Smaller sample sizes lead to more variability and less
power to detect a true effect, raising the likelihood of a Type II error.
2. High variability in the data: If the observed data vary widely between
participants, it becomes harder to identify a meaningful difference.
3. Small effect size: When the true difference between the drug and placebo is
minimal, it is harder for the test to detect, especially with a small sample size.

Minimizing Type II Error:

To reduce the likelihood of Type II errors, researchers can:

 Increase the sample size: Larger sample sizes reduce the standard error,
improving the ability to identify small effects.
 Increase the significance level (α): Raising α (e.g., from 0.01 to 0.05) makes
it easier to reject the null hypothesis, though it also increases the risk of
Type I errors.
 Use more precise measurements: Reducing variability in the data (e.g.,
through better measurement techniques) enhances the test's power.
 Conduct a power analysis before the experiment to ensure that the study is
designed with enough power to detect the expected effect.

6. Power of a Test: The power of a statistical test is the probability that it


correctly rejects a false null hypothesis. It is defined as 1−β, where β is the
probability of making a Type II error.


In pharmaceutical hypothesis testing, power is a critical concept that helps assess the
study's ability to identify a genuine effect when one is present. The power of a test
is the probability that it will correctly reject the null hypothesis when a specific
alternative hypothesis is true. In simpler terms, it is the likelihood of identifying a
real treatment effect if it indeed exists.

The formula for calculating power is:

Power = 1 − Probability of a Type II Error (β)

Where:

 β (Beta) is the likelihood of making a Type II error, which occurs when the test
fails to reject the null hypothesis even though the alternative hypothesis is true
(i.e., failing to identify a genuine effect).
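Under a normal approximation, the power of a two-sided, two-sample comparison can be computed directly. In this sketch the function name and the illustrative inputs (79 subjects per arm, effect Δ = 0.1, variance σ² = 0.05) are our own assumptions:

```python
from statistics import NormalDist

def two_sample_power(n, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with n subjects per arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)        # critical value, e.g. 1.96
    z_effect = delta / (sigma * (2 / n) ** 0.5)          # standardized true effect
    return 1 - NormalDist().cdf(z_alpha - z_effect)      # power = 1 - beta

# Illustrative assumptions: delta = 0.1, variance = 0.05 (sigma = 0.05 ** 0.5)
print(round(two_sample_power(n=79, delta=0.1, sigma=0.05 ** 0.5), 2))   # 0.8
```

Increasing n, increasing Δ, or decreasing σ all raise the computed power, matching the factors discussed below.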

Importance of Power in Pharmaceutical Testing

Power is essential in clinical and pharmaceutical research for several reasons:

1. Avoiding False Negatives: A low power increases the likelihood of a Type II


error (failing to identify a true effect). In drug development, a Type II error
could mean that a potentially effective treatment is wrongly deemed
ineffective, resulting in missed opportunities for patient benefit.
2. Determining Sample Size: Power is closely related to sample size. A study
with low power may need a larger sample size to identify an effect. In clinical
trials, the required power typically drives decisions about how many patients
need to be enrolled to ensure reliable results.
3. Regulatory Considerations: Regulatory agencies often specify the desired
power level (usually 80% or 90%) for clinical trials. A trial with insufficient
power may not meet regulatory requirements for drug approval.

Factors Affecting the Power of a Test

The power of a hypothesis test is impacted by several factors, all of which must be
considered during the design of pharmaceutical studies:


1. Sample Size (n):

Larger sample sizes typically enhance the power of a test, because an increased
sample size reduces the standard error, making it easier to identify a true effect. The
more data collected, the more precise the estimate of the treatment effect, improving
the likelihood of rejecting the null hypothesis when it is false.

2. Effect Size (Δ):

Effect size refers to the magnitude of the difference between the null hypothesis value
and the true value of the parameter under the alternative hypothesis (e.g., the actual
treatment effect). Larger effects are easier to detect and thus increase the power of the
test. If a drug produces a large therapeutic effect, the power to detect this effect will
be higher than if the effect is small.

3. Significance Level (α):

The significance level (α) is the threshold at which the null hypothesis is
rejected. Power increases with a higher α (e.g., 0.10 instead of 0.05), because a
higher α makes it easier to reject the null hypothesis. Pharmaceutical studies
typically use α = 0.05 to maintain a balance between detecting true effects
and avoiding false positives.

4. Variance (σ²):

The variance of the outcome variable (e.g., the variability in patient
responses) also influences power. Higher variance reduces the precision of
estimates, making it more difficult to detect a true effect. Reducing variability
through better study design or more controlled conditions can increase
power.

5. Study Design:

The choice of study design (e.g., parallel-group, crossover, or matched-pairs
design) can also affect power. More efficient study designs typically require
fewer subjects to detect a given effect size, leading to increased power.


Power and Sample Size Calculations

Before conducting a pharmaceutical study, researchers usually perform sample size
calculations to determine the number of participants needed to achieve a desired
power (usually 80% or 90%). These calculations take into consideration the effect
size, variability, significance level (α), and desired power.

Example:

If a pharmaceutical company wants to test whether a new drug lowers blood pressure
more effectively than a placebo, they may use a power analysis to estimate the
number of patients required to detect a statistically significant difference with high
power, given expected effect sizes, variability, and α level.

Standard Power Thresholds in Pharmaceutical Research

In pharmaceutical hypothesis testing, achieving adequate power is important for
ensuring the reliability of conclusions drawn from studies. The typical power values
used in pharmaceutical research are:

 80% Power: A commonly accepted standard, meaning there is an 80%


probability of detecting a true effect if it exists (or a 20% chance of making a
Type II error).
 90% Power: Sometimes used for more critical studies, particularly when the
consequences of missing a true effect (i.e., a Type II error) are severe, such as
in life-saving treatments.

Example: Power in a Clinical Trial

Imagine a pharmaceutical company is conducting a clinical trial to determine if a new


drug reduces the risk of heart attack compared to a placebo. The following parameters
are given:

 Effect Size: The expected difference in the reduction of heart attack risk
between the drug and placebo is 10%.
 Variance: The variability in heart attack risk reduction within each group is
assumed to be 5%.


 Significance Level (α): 0.05 (standard threshold).


 Desired Power: 80% (to detect the effect with 80% confidence).

Using these parameters, the company would calculate the minimum sample size needed to achieve an 80% power to identify the 10% difference in heart attack risk reduction. If the required sample size is too large for practical reasons (e.g., cost, time), the company may adjust other parameters (e.g., effect size, α) to ensure that the study has sufficient power.

Formula for calculating sample size for a t-test with two independent samples:

n = 2(Zα/2 + Zβ)² σ² / Δ²

Where:

 Zα/2 = Z-value corresponding to the chosen significance level (e.g., for α = 0.05, Zα/2 = 1.96)
 Zβ = Z-value corresponding to the chosen power (e.g., for 80% power, Zβ = 0.84)
 σ² = Variance of the population (or an estimate of the variance) = 0.05
 Δ = Expected difference between the groups (effect size) = 0.1

n = 2(1.96 + 0.84)² × 0.05 / (0.1)²

= 2(2.8)² × 0.05 / 0.01

= 2 × 7.84 × 0.05 / 0.01

= 0.784 / 0.01

≈ 78.4, rounded up to 79 subjects per group
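This calculation can be scripted with the normal quantile function from Python's standard library (a sketch, not part of the original text; note that the (Zα/2 + Zβ) term must be squared):

```python
import math
from statistics import NormalDist

def sample_size_two_groups(alpha, power, variance, delta):
    """Per-group sample size for comparing two independent means:
    n = 2 * (Z_alpha/2 + Z_beta)^2 * sigma^2 / delta^2 (note the square)."""
    z_alpha2 = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # 0.84 for 80% power
    n = 2 * (z_alpha2 + z_beta) ** 2 * variance / delta ** 2
    return math.ceil(n)                             # round up to whole subjects

print(sample_size_two_groups(alpha=0.05, power=0.80, variance=0.05, delta=0.1))  # 79
```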

Software Tools:

There are several tools and software packages that can help with these calculations,
including:

 G*Power: free, widely used software for power analysis.


 R (pwr package): The R programming language has packages like "pwr" to
calculate power and sample size.
 PS Power and Sample Size Calculator: An online calculator for power and
sample size estimation.
 SAS, SPSS, and Stata: These statistical software packages also include built-in
functions for power analysis.

8.2. Steps in Hypothesis Testing:


1. State the Hypotheses:


o Null hypothesis (H0): The default assumption.
o Alternative hypothesis (H1 or Ha): The hypothesis that suggests an
effect or difference.
2. Choose the Significance Level (α):
o Commonly chosen as 0.05, but can be adjusted based on the field of
study.
3. Select the Appropriate Test:
o The selection of test is based on the type of data and the research
question. For instance:
 Z-test or T-test: Used for comparing means.
 Chi-square test: Applied for categorical data.
 ANOVA: For comparing multiple means.
4. Collect Data and Compute the Test Statistic:
o Collect the sample data and calculate the test statistic (e.g., Z-value, T-
value) according to the selected test.
5. Calculate the P-value:
o Using the test statistic and sample size, compute the p-value.
6. Make a Decision:
o If p≤α, reject the null hypothesis (H0).
o If p>α, fail to reject the null hypothesis (H0).
7. Draw a Conclusion:
o Interpret the results in the context of the research question. If the null
hypothesis is rejected, it suggests evidence for the alternative
hypothesis.
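The steps above can be sketched end to end in Python (scipy is an assumption here, and the tablet-potency data are hypothetical):

```python
from scipy import stats

# Hypothetical potencies (mg) of eight tablets; the target mean is 100 mg.
sample = [98.2, 99.1, 100.4, 97.8, 99.5, 98.9, 100.1, 98.4]

alpha = 0.05                                              # Step 2: significance level
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)  # Steps 3-5: test statistic and p-value

# Step 6: decision rule
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(round(t_stat, 3), round(p_value, 3), decision)
```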

8.3. Types of Tests:

1. One-tailed Test:

The alternative hypothesis specifies a direction (greater than or less than). It examines
the potential for an effect in a single direction.

A one-tailed t-test is employed to assess whether drug A has a significantly greater reduction in pain compared to drug B. If drug A does not show a significant improvement, you would not reject the null hypothesis, as you are only testing for greater effectiveness, not equivalence or lesser effectiveness. One-tailed tests are used when you are confident in the direction of the effect (e.g., a new drug should be better, not worse). One-tailed tests typically require smaller sample sizes to achieve the same power because they focus on only one direction of the distribution, increasing the likelihood of detecting an effect in that direction.

Example: A pharmaceutical company is testing a new vaccine and wants to determine whether it provides a higher rate of immunity than a placebo.

 Null Hypothesis (H₀): The new vaccine is no more effective than the placebo in
generating immunity (i.e., the immune response is the same or worse than the
placebo).
 Alternative Hypothesis (H₁): The new vaccine is more effective than the placebo
in generating immunity.

2. Two-tailed Test:
o The alternative hypothesis does not specify a direction (just a difference).
It tests for the possibility of an effect in either direction.

In pharmaceutical research, a two-tailed test is used when the hypothesis does not
predict the direction of the effect. It is appropriate when researchers are interested in
detecting differences in both directions, such as whether a new drug is either more or
less effective than the current treatment, or whether a drug has side effects that are
either worse or better than the alternative. A two-tailed test is typically used in cases
where the expectation is not directional or when both outcomes are of interest. Two-tailed tests allow for an unbiased approach to evaluating the data, as they are used when there is no preconceived idea about whether the new treatment will perform better or worse than the existing treatment.

Example: A new vaccine is tested against placebo to find out if it provides immunity
against a particular virus.
 Null Hypothesis (H₀): The new vaccine is as effective as the placebo in providing immunity.


 Alternative Hypothesis (H₁): The new vaccine is different from the placebo in terms of its effectiveness in providing immunity (it could be either more or less effective).



9. Parametric Tests
Statistical tests are used to analyze data and make inferences about populations based
on samples. In the field of statistics, parametric and nonparametric tests are two
different approaches used to analyze data and draw conclusions. Parametric and
nonparametric tests represent different approaches to statistical analysis. Parametric
tests make specific assumptions about the underlying population distribution, while
nonparametric tests do not rely on these assumptions. Understanding the differences
between these two types of tests is crucial for appropriate data analysis and drawing
accurate conclusions.

Parametric tests are statistical tests that make certain assumptions about the
underlying population from which the data are drawn. They are widely used in
statistical analysis due to their effectiveness and ability to provide rich, interpretable
results. Parametric tests are preferred if the data is better represented by mean. The
most common parametric tests are t-Test and Analysis of Variance (ANOVA). These
tests are mainly used for quantitative data and consist of the continuous variables.
Parametric tests generally have higher statistical power when the assumptions are met.
They can provide more precise estimates and narrower confidence intervals, making
them more sensitive to detecting small effects. It’s essential to check whether the data
meet the following necessary assumptions before applying these tests, as violations
can lead to incorrect conclusions.

9.1. Assumptions of Parametric Tests

1. Normality: The data should follow a normal distribution, especially for


smaller sample sizes. However, for larger sample sizes, parametric tests can
still perform well even if the data are not perfectly normal. Shapiro-Wilk
test or Kolmogorov-Smirnov test can be used to test if data follow a normal
distribution. Q-Q plots or histograms can be used for visual assessment of
normality.
2. Homogeneity of variance: The variance of the groups being compared
should be approximately equal. This assumption is critical for tests
comparing multiple groups, such as ANOVA. Levene’s test or Bartlett’s
test can be used to check for homogeneity of variance in ANOVA.


3. Interval or ratio scale data: Parametric tests generally require the data to be
measured on at least an interval scale, or ratio scale, where both the order
and the exact difference between values are meaningful.
4. Independence: The observations should be independent of each other,
meaning the data from one subject or group should not influence another.
 Outliers: The sample data should not contain any extreme outliers.
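The normality and homogeneity-of-variance checks mentioned above can be sketched in Python (scipy is assumed; the two groups of measurements are hypothetical):

```python
from scipy import stats

# Hypothetical measurements from two treatment groups.
group_a = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.7]
group_b = [6.0, 5.8, 6.3, 5.9, 6.1, 5.7, 6.2, 6.0]

# Normality: Shapiro-Wilk test for each group (H0: data are normal).
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Homogeneity of variance: Levene's test (H0: the variances are equal).
_, p_levene = stats.levene(group_a, group_b)

# A p-value above 0.05 means the assumption is not rejected.
print(p_norm_a > 0.05, p_norm_b > 0.05, p_levene > 0.05)
```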

The advantages and limitations of parametric tests

Advantages:

 More powerful than non-parametric tests when the assumptions are met, meaning they have a greater ability to detect a true effect.
 Provide more detailed information, such as confidence intervals and effect sizes.

Limitations:

 Sensitive to violations of assumptions: if the data is not normal or if variances are not equal, the test results can be misleading.
 Require larger sample sizes to reliably detect differences when assumptions are violated.

Choose between a z-test and a t-test by looking at the sample size and whether the population variance is known, as shown in fig 9.1:

 Population variance known and sample size > 30: Z-test
 Population variance known and sample size ≤ 30: t-test
 Population variance unknown: t-test


t-test

The t-test is based on Student's t-distribution, introduced by William Sealy Gosset under the pen name "Student". It is an inferential statistical test used to determine if there is a significant difference between the means of two groups. It is often used when data is normally distributed and the population variance is unknown. This test is used in hypothesis testing to assess whether the observed difference between the means of the two groups is statistically significant or just due to random variation.

It is similar to the standard normal distribution (Z-distribution), but it has heavier


tails.

Student’s t Distribution is used when:

 The sample size is 30 or less than 30.

 The population standard deviation (σ) is unknown.

 The population distribution is approximately normal, or at least unimodal and not heavily skewed.

The t distribution is presented in fig 9.1.

Fig 9.1: t distribution

Properties of the t-Distribution



 The variable in t-distribution ranges from -∞ to +∞ (-∞ < t < +∞).

 The t-distribution is symmetric about zero, like the normal distribution, because t enters the probability density function (pdf) only through even powers.

 For large values of ν (i.e., increased sample size n), the t-distribution tends to a standard normal distribution. The shape of the t-distribution therefore differs for different values of ν.

 The t-distribution is less peaked than the normal distribution at the center and has heavier tails. The peak height is attained at t = 0, as can be observed in fig 9.1.

 The mean of the distribution is equal to 0 for ν > 1 where ν = degrees of


freedom, otherwise undefined.

 The median and mode of the distribution is equal to 0.

 The variance is equal to ν / (ν − 2) for ν > 2, and ∞ for 1 < ν ≤ 2; otherwise it is undefined.
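Some of these properties can be verified numerically; the sketch below assumes scipy is available:

```python
from scipy.stats import norm, t

df = 5  # a small sample gives a heavy-tailed t-distribution

# Lower peak at the center than the standard normal distribution...
assert t.pdf(0, df) < norm.pdf(0)

# ...and heavier tails: more probability beyond |t| = 2.
assert t.sf(2, df) > norm.sf(2)

# For large degrees of freedom, t approaches the standard normal.
assert abs(t.ppf(0.975, 1000) - norm.ppf(0.975)) < 0.01
print("all properties verified")
```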

Limitations of using a t-distribution

 Sensitivity to departure from normality: The t-distribution assumes normality


in the underlying population. When data deviates significantly from a normal
distribution, the t-distribution may introduce inaccuracies in statistical
inferences.
 Limited applicability for large samples: As sample sizes increase, the t-
distribution converges to the normal distribution. Therefore, for sufficiently
large samples and known population standard deviation, the normal
distribution is more appropriate.
 Impact of outliers and small sample sizes: The t-distribution is sensitive to
outliers, and its tails can be influenced by small sample sizes.
 Requires Random Sampling: The assumptions underlying the t-distribution,
such as random sampling and independence of observations, need to be met
for valid results.


t- distribution applications

1. Testing of hypothesis of the population mean: t-distributions are commonly


used in hypothesis tests regarding the population mean. This involves
assessing whether a sample mean is significantly different from a
hypothesized population mean.

2. Testing for the hypothesis of the difference between two means: t-tests can be employed to examine if there is a significant difference between the means of two independent samples. This can be done under the assumption of equal variances or when variances are unequal. In scenarios where samples are not independent, such as paired or dependent samples, a paired t-test is used.

3. Testing for the hypothesis about the coefficient of correlation: t-distributions play a role in hypothesis testing related to correlation coefficients. This includes situations where the population correlation coefficient is assumed to be zero (ρ = 0) or when testing for a non-zero correlation coefficient (ρ ≠ 0).

Assumptions in T-test

 Independence: The observations within each group must be independent of


each other. This means that the value of one observation should not influence
the value of another observation.
 Normality: The data within each group should be approximately normally distributed, i.e., the distribution of the data within each group being compared should resemble a normal (bell-shaped) distribution. This assumption is crucial for small sample sizes (n < 30).

 Homogeneity of Variances (for independent samples t-test): The variances of


the two groups being compared should be equal. This assumption ensures that
the groups have a similar spread of values. Unequal variances can affect the
standard error of the difference between means and, consequently, the t-
statistic.


 Absence of Outliers: There should be no extreme outliers in the data as


outliers can disproportionately influence the results, especially when sample
sizes are small.

Types of t-tests

There are different types of t-tests depending on the research question and the design
of the study:

1. One-sample t-test

This test compares the mean of a sample to a known value (often the population
mean).

 Null hypothesis (H₀): The sample mean is equal to the population mean.
 Alternative hypothesis (H₁): The sample mean is different from the population
mean.

2. Independent samples t-test

This test compares the means of two independent groups.

 Null hypothesis (H₀): The means of the two groups are equal.
 Alternative hypothesis (H₁): The means of the two groups are not equal.

3. Paired samples t-test (dependent samples t-test)

This test is used when the two groups being compared are related, such as
measurements taken from the same subjects at different times (before and after
treatment).

 Null hypothesis (H₀): The mean difference between paired observations is


zero.
 Alternative hypothesis (H₁): The mean difference is not zero.


Comparison of different t-tests

 One-sample t-test: compares one sample mean to a known value or population mean; uses one sample from one group; assumes normality of the data in the sample.
 Independent samples t-test: compares the means of two independent groups; uses two independent groups; assumes normality of both groups and equal variances.
 Paired samples t-test: compares means from two related groups (e.g., before vs. after treatment); uses two related measurements on the same individuals; assumes normality of the differences and the paired nature of the data.

9.1.1. One-Sample t-Test

In pharmaceutical research, the one-sample t-test is commonly used to compare the


mean of a sample to a known or hypothesized value, often to assess whether a drug,
treatment, or process meets specific quality or efficacy standards.

Steps to perform a one-sample t-test:

Step 1: Define the Hypotheses

 Null hypothesis (H₀): The sample mean is equal to the hypothesized


population mean
 Alternative hypothesis (H₁): The sample mean is different from the
hypothesized population mean (two-tailed test) or greater/less than the
population mean (one-tailed test).

Step 2: Choose the Significance Level (α)

 Typically, in pharmaceutical research, a significance level of 0.05 is used.


This means that there is a 5% risk of rejecting the null hypothesis when it is
actually true.


Step 3: Calculate the t-Statistic

The formula for the t-statistic in a one-sample t-test is:

t = (Xˉ − μ) / (s / √n)

Where:

 Xˉ = Sample mean (e.g., the mean dosage of active ingredient in the tablets).
 μ = Population mean (e.g., the target dosage or potency).
 s = Sample standard deviation (e.g., variability in the dosage among the
tablets).
 n = Sample size (e.g., the number of tablets tested).

Step 4: Determine Degrees of Freedom (df)

The degrees of freedom for a one-sample t-test are calculated as: df = n − 1

Where n is the sample size.

Step 5: Compare the t-Statistic to the Critical Value

 Using a t-distribution table, find the critical t-value based on the selected
significance level (α) and degrees of freedom (df).
 If the absolute value of the calculated t-statistic is greater than the critical
value, reject the null hypothesis.
 If the absolute value of the t-statistic is smaller than the critical value, fail to
reject the null hypothesis.

Step 6: Conclusion

Based on the comparison:

 If H₀ is rejected: The sample mean is significantly different from the


hypothesized population mean.
 If H₀ is not rejected: There is no significant difference between the sample
mean and the hypothesized population mean.


Applications of one-sample t-test in pharmaceutical research:

1. Quality Control: Ensuring dosage form meets the required specifications.


2. Clinical Trials: Testing the efficacy of a drug compared to a hypothesized
treatment effect.
3. Bioequivalence Studies: Comparing generic drugs' bioavailability against
reference standards.

Example: A pharmaceutical company manufactures 100 mg tablets of a specific drug and intends to check whether the mean weight of a sample of 20 tablets is statistically different from 100 mg. Twenty tablets were randomly selected, and the sample mean weight was found to be 98 mg, with a sample standard deviation of 2 mg.

Step 1: Define the hypotheses

 Null hypothesis (H₀): The mean weight is 100 mg (μ=100).


 Alternative hypothesis (H₁): The mean weight is not 100 mg (μ≠100).

Step 2: Choose the significance level (α)

 Use a significance level of 0.05.

Step 3: Calculate the t-statistic

Using the formula for the t-statistic:

t = |Xˉ − μ| / (s / √n)

= |98 − 100| / (2/√20) = 2 / (2/4.472) = 2 / 0.447 ≈ 4.47

Step 4: Determine the degrees of freedom

The degrees of freedom (df) is n−1=20−1=19.

Step 5: Compare to the critical value

For α = 0.05 and df = 19, the critical t-value (for a two-tailed test) is approximately
2.093 (from the t-distribution table).


Step 6: Conclusion

Since the calculated t-statistic (4.47) is greater than the critical value (2.093), we reject the null hypothesis. This indicates that the mean weight of the sample is significantly different from the target dosage of 100 mg.
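As a cross-check, the same calculation can be scripted (scipy is assumed for the critical value lookup):

```python
import math
from scipy import stats

# Summary statistics from the tablet-weight example.
x_bar, mu, s, n = 98.0, 100.0, 2.0, 20

t_stat = (x_bar - mu) / (s / math.sqrt(n))     # about -4.47
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 1)      # two-tailed critical value, df = 19

reject = abs(t_stat) > t_crit
print(round(abs(t_stat), 2), round(t_crit, 3), reject)  # 4.47 2.093 True
```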

9.1.2. t-test for Two Independent Samples:

This test is used to compare the means of two independent groups to see if there is
evidence that the associated population means are significantly different from each
other. In pharmaceutical research, comparing the effects of two different treatments,
drugs, or therapies often involves determining whether there is a significant difference
between the means of two independent groups.

Key assumptions for the t-test for two independent samples:

1. Independence of samples: The two samples must be independent of each


other.
2. Normality: Each of the two populations should follow a normal distribution.
For large samples (typically n > 30), the central limit theorem can help
mitigate the need for perfect normality.
3. Equality of variances (optional): The t-test assumes that the two populations
have equal variances, although there is a variant of the test that does not
require this assumption. If variances are unequal, Welch’s t-test should be
used.

Hypotheses for the Test:

 Null hypothesis (H₀): There is no difference in the means of the two


populations (i.e., μ1=μ2).
 Alternative hypothesis (H₁): There is a difference in the means of the two
populations (i.e., μ1≠μ2or a specific directional difference (e.g., μ1>μ2).

Formula for the t-test:

The formula for the t-statistic for two independent samples is:

t = (Xˉ1 − Xˉ2) / (sp √(1/n1 + 1/n2))


sp = √[((n1−1)s1² + (n2−1)s2²) / (n1 + n2 − 2)]

Where:

 Xˉ1, Xˉ2 are the sample means of the two groups.
 s1², s2² are the sample variances of the two groups.
 n1, n2 are the sample sizes of the two groups.

The degrees of freedom (df) for the two-sample t-test can be calculated as:

df=n1+n2−2

Example: The body fat percentage data observed from different gender is furnished
below. Test the difference at 0.05 level of significance.

Gender Sample size Average Standard deviation


Women 10 22 5
Men 13 15 6

Null hypothesis is that the underlying population means are the same. The null hypothesis is written as H₀: μ1 = μ2

Alternative hypothesis is that the underlying population means are not the same. The alternative hypothesis is written as H₁: μ1 ≠ μ2

Calculation of test statistic

sp = √[((10−1)5² + (13−1)6²) / (10 + 13 − 2)]

= √[((9)25 + (12)36) / 21]

= √[(225 + 432) / 21]

= √(657 / 21)

= √31.2857

= 5.59


t = (Xˉ1 − Xˉ2) / (sp √(1/n1 + 1/n2))

= (22 − 15) / (5.59 √(1/10 + 1/13))

= 7 / (5.59 √(0.1 + 0.077))

= 7 / (5.59 √0.177)

= 7 / (5.59 × 0.42)

= 7 / 2.35

= 2.978

df=n1+n2−2

df=10+13−2

=21

The critical t-value with α = 0.05 and 21 degrees of freedom is 2.080 (Table 9.1).

We compare the value of our statistic (2.978) to the critical t-value. Since 2.978 > 2.080, we reject the null hypothesis that the mean body fat for men and women is equal, and conclude that we have evidence that mean body fat in the population differs between men and women.
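The same result can be reproduced from the summary statistics with scipy's `ttest_ind_from_stats` (scipy assumed available):

```python
from scipy import stats

# Pooled-variance (Student's) t-test from the body-fat summary statistics.
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=22, std1=5, nobs1=10,   # women
    mean2=15, std2=6, nobs2=13,   # men
    equal_var=True,
)
print(round(t_stat, 2), p_value < 0.05)  # 2.98 True
```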

The assumption of equal variances in the two groups being compared may not hold.
When this assumption is violated, the Welch’s t-test is often the preferred method for
comparing the means of two independent groups. Welch’s t-test is a variation of the
standard independent samples t-test that does not assume equal variances. The test statistic and degrees of freedom are computed with the following formulae:

t = (Xˉ1 − Xˉ2) / √(s1²/n1 + s2²/n2)

df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1−1) + (s2²/n2)²/(n2−1)]

Example: To compare the effect of Drug A and Drug B on blood pressure reduction, where the variances in response to the drugs are suspected to differ.


Drug Sample size Average Standard deviation


A 30 10 5
B 30 8 8

Null hypothesis is that the underlying population means are the same. The null hypothesis is written as H₀: μ1 = μ2

Alternative hypothesis is that the underlying population means are not the same. The alternative hypothesis is written as H₁: μ1 ≠ μ2

Calculation of test statistic

t = (Xˉ1 − Xˉ2) / √(s1²/n1 + s2²/n2)

= (10 − 8) / √(5²/30 + 8²/30)

= 2 / √(25/30 + 64/30)

= 2 / √(0.833 + 2.133)

= 2 / √2.966

= 2 / 1.72

= 1.16

df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1−1) + (s2²/n2)²/(n2−1)]

= (25/30 + 64/30)² / [(25/30)²/29 + (64/30)²/29]

= (2.966)² / [(0.833)²/29 + (2.133)²/29]

= 8.797 / [(0.694 + 4.550)/29]

= 8.797 / 0.181

≈ 48.7


The critical t-value for a significance level of 0.05 and approximately 49 degrees of freedom, using a t-distribution calculator, is 2.0096.

We compare the value of our statistic (1.16) to the critical t-value. Since 1.16 < 2.0096, we fail to reject the null hypothesis, and conclude that we do not have evidence that the blood pressure reduction differs between drug A and drug B.

9.1.3. Paired t-test:

The paired t-test is a statistical method used to compare the means of two related (or
dependent) groups. It is commonly used when the same subjects are measured under
two different conditions, or when measurements are taken at two different times. This
test is typically used to assess whether there is a significant difference between two
sets of paired data.

Paired t-test is used in the following conditions

 Before and After Studies: For example, measuring the blood pressure of
patients before and after treatment with a drug.
 Matched Pairs: When individuals are matched based on certain characteristics
and then compared on some outcome measure, like comparing the efficacy of
two treatments on patients with similar baseline conditions.
 Repeated Measures: For example, testing a group of patients’ blood sugar
levels at two different time points.

Paired t-test Hypotheses:

o Null hypothesis (H₀): There is no difference in the means of the paired


groups. Differences in mean=0.
o Alternative hypothesis (H₁): There is a significant difference in the
means of the paired groups, μdiff≠0 (two-tailed) or μ diff>0 (one-tailed
for positive difference) or μ diff<0 (one-tailed for negative difference).


Formula for the Paired t-test:


To conduct a paired t-test, the following steps are taken:
1. Calculate the difference between each pair of measurements (i.e., subtract one
observation from the other).
Let:
o Di=Xi1−Xi2, where Xi1is the first observation and Xi2is the second
observation in each pair.
2. Calculate the sample mean (Dˉ) and sample standard deviation of the
differences.
3. Calculate the t-statistic using the formula:
t = Dˉ / (sD/√n)
Where:

o Dˉ is the mean of the differences.
o sD is the standard deviation of the differences.
o n is the number of pairs.

4. The degrees of freedom for the paired t-test are given by:
df=n−1

Where n is the number of paired observations (i.e., the number of subjects or paired
measurements).
Example: The blood glucose levels observed from six randomly selected patients
before and after administration of a drug is furnished below. Test the significance of
difference in blood glucose levels due to drug
Subject Blood glucose level (mg/dl)
Before treatment After treatment
1 170 150
2 175 165
3 160 150
4 150 135
5 180 170
6 185 160


 Null hypothesis (H₀): There is no difference in the means of the blood glucose
levels. Differences in mean=0.
 Alternative hypothesis (H₁): There is a significant difference in the means of
the blood glucose levels, μdiff≠0 (two-tailed)

Calculate the t-statistic

Subject Blood glucose level (mg/dl)


Before treatment After treatment Difference
1 170 150 20
2 175 165 10
3 160 150 10
4 150 135 15
5 180 170 10
6 185 160 25
Sum 90
Mean 15

Standard deviation of the differences in blood glucose level = 6.32

t = Dˉ / (sD/√n)

= 15 / (6.32/√6)

= 15 / (6.32/2.449)

= 15 / 2.58

= 5.81

Degrees of freedom = n − 1 = 6 − 1 = 5

The critical t-value for a significance level of 0.05 and 5 degrees of freedom using a t-
distribution table is 2.571.

We compare the value of our statistic (5.81) to the critical t-value. Since 5.81 > 2.571, we reject the null hypothesis of no difference, and conclude that we have evidence that the blood glucose level changed following drug administration.
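The same test can be run directly on the raw paired data (scipy assumed):

```python
from scipy import stats

before = [170, 175, 160, 150, 180, 185]   # blood glucose (mg/dl)
after  = [150, 165, 150, 135, 170, 160]

# Paired t-test on the before/after measurements from the same subjects.
t_stat, p_value = stats.ttest_rel(before, after)
print(round(t_stat, 2), p_value < 0.05)  # 5.81 True
```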

9.2 ANOVA

ANOVA (Analysis of Variance) is a statistical method used to compare the means of


three or more groups to determine if at least one group mean is significantly different
from the others. It helps in analyzing the impact of one or more independent variables
on a dependent variable. The Null hypothesis (H₀) states that all group means are
equal (no difference between the groups) and alternative hypothesis (H₁) states that at
least one group mean is different from the others.

The following assumptions are involved

 Independence of observations.
 Each group should follow a normal distribution.
 The variance within each group should be approximately equal.

The F-distribution is a continuous probability distribution that is used primarily in the context of ANOVA. It is the distribution of the ratio of two independent chi-squared variables, each divided by its respective degrees of freedom. It has the following characteristics:

 Shape: The F-distribution is positively skewed. As the degrees of freedom


increase, the distribution becomes more symmetrical.
 Domain: The F-distribution only takes positive values, starting at 0 and
extending towards infinity.
 Degrees of Freedom (df): The F-distribution is defined by two sets of degrees
of freedom:
o df1 (numerator degrees of freedom): Corresponds to the between-
group variance (related to the number of groups being compared).
o df2 (denominator degrees of freedom): Corresponds to the within-
group variance (related to the total number of data points and the
number of groups).


 Critical Region: The area under the curve to the right of a specific F-value (the
critical value) represents the rejection region, where the null hypothesis would
be rejected.

F-distribution tables typically provide critical values of F for various degrees of freedom for the numerator (df₁) and the denominator (df₂) at different significance levels (α, such as 0.05, 0.01). These tables allow one to quickly determine whether an observed F-statistic is significant without computing a p-value directly.
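Such a table lookup can also be done programmatically (a sketch assuming scipy; df₁ = 2 and df₂ = 15 match the one-way ANOVA example later in this chapter):

```python
from scipy.stats import f

# Critical F for alpha = 0.05 with df1 = 2 (numerator, between groups)
# and df2 = 15 (denominator, within groups).
f_crit = f.ppf(1 - 0.05, 2, 15)
print(round(f_crit, 2))  # 3.68; an observed F above this is significant
```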

Types of ANOVA

 One-Way ANOVA
One-way ANOVA is used to compare the means of three or more groups
based on a single independent variable. It shows if there is a significant
difference among the group means.
o Formula:

F = (SSB/df between) / (SSW/df within) = MSB / MSW

 Example: A researcher tests three different antidiabetic drugs on blood


glucose. The independent variable is the antidiabetic drug, and the dependent
variable is the blood glucose.
 Two-Way ANOVA
Two-way ANOVA is used when there are two independent variables, allowing
researchers to explore individual and interactive effects.
o Formula:
 The formula involves calculating three F-ratios: one for each
independent factor and one for their interaction.
o Example: A researcher examines how different tablet production
techniques and formulation ingredients impact tablet performance.
 Repeated Measures ANOVA
This type is used when the same subjects are tested under different conditions
over time, controlling individual variability.


o Example: A psychologist tests stress levels in patients before, during, and after a treatment. Repeated measures ANOVA identifies changes in stress over time.
 MANOVA (Multivariate Analysis of Variance)
MANOVA is an extension of ANOVA that handles multiple dependent
variables, analyzing group differences across several outcomes.
o Example: A company evaluates the effect of pH on both solubility and stability.

Advantages and Limitations of ANOVA

Advantages:

 ANOVA allows comparing more than two groups in a single test.


 By testing all groups together, it minimizes the likelihood of finding false
significance.

Limitations:

 Assumes normal distribution, homogeneity of variances, and independent


observations.
 ANOVA shows if a difference exists but doesn’t specify which groups differ
significantly without additional post-hoc tests.

One-Way ANOVA
A one-way ANOVA (Analysis of Variance) is used to compare the means of three or
more groups to determine if there is a statistically significant difference between
them.

Example: A clinical trial is conducted with three groups of patients receiving different
dosages of the antihypertensive drug (50, 100 and 150 mg). Blood pressure is
measured before and after taking the drug for a specific period. The change in blood
pressure (reduction) for each patient is recorded. Does the dosage of a certain
pharmaceutical drug affect the reduction in blood pressure in patients?


Blood pressure reduction per subject

Subject    50 mg dose    100 mg dose    150 mg dose
1          10            18             22
2          12            17             21
3          14            16             23
4          13            18             24
5          14            17             25
6          15            19             23
Sum        78            105            138

Grand total = 321

Solution: The data contains one dependent variable (blood pressure reduction) and
one independent variable (dose) with three levels. So the data can be treated
statistically with one way ANOVA.

Null hypothesis: The difference in mean blood pressure reduction is statistically


not significant.

Alternative hypothesis: The difference in mean blood pressure reduction observed


at least from one group is statistically significant.

Level of significance: 0.05

Calculation of test statistic

Correction term (C.T) = (ΣX)²/N = (321)²/18 = 5724.5

ΣX² = 100+144+196+169+196+225+324+289+256+324+289+361+484+441+529+576+625+529 = 6057

Total Sum of Squares = TSS = ΣX² - C.T = 6057 - 5724.5 = 332.5

Between Treatment Sum of Squares = BSS = Σ(Ti²/ni) - C.T

where Ti is the total of the data in the i-th treatment group and ni its number of observations

= (78)²/6 + (105)²/6 + (138)²/6 - 5724.5

= (6084 + 11025 + 19044)/6 - 5724.5

= 6025.5 - 5724.5 = 301

Within Treatment Sum of Squares = WSS = TSS - BSS = 332.5 - 301 = 31.5

ANOVA Table

Source            Sum of squares    df            Mean squares    F
Between groups    301               (C-1) = 2     150.5           71.67
Within groups     31.5              (N-C) = 15    2.1
Total             332.5             (N-1) = 17

Degrees of freedom = (C-1), (N-C) = 2, 15

Where C= Number of columns/ groups

N= Total number of observations

Critical value: The table value for degrees of freedom (2, 15) at the 0.05 level of
significance is 3.68 (Table 10.2).

Inference: The calculated value is greater than the table value and hence null
hypothesis is rejected.

Conclusion: The dose of a pharmaceutical drug affects the reduction in blood


pressure in patients
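As a cross-check, the worked example above can be reproduced in a few lines of Python. This is an illustrative sketch (not part of the original text), assuming SciPy is available; the F value matches the hand calculation.

```python
# One-way ANOVA on the blood-pressure-reduction data from the worked example.
from scipy.stats import f_oneway

dose_50 = [10, 12, 14, 13, 14, 15]
dose_100 = [18, 17, 16, 18, 17, 19]
dose_150 = [22, 21, 23, 24, 25, 23]

f_stat, p_value = f_oneway(dose_50, dose_100, dose_150)
print(round(f_stat, 2))   # 71.67, matching the hand calculation
print(p_value < 0.05)     # True, so the null hypothesis is rejected
```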

Two way ANOVA

A two-way ANOVA is used to determine whether or not there is a statistically


significant difference between the means of three or more independent groups that
have been split on two variables. Two-way ANOVA is used to know how two
factors affect a response variable and whether or not there is an interaction effect
between the two factors on the response variable.

For example, suppose a researcher wants to explore how pH and surfactant affect
drug solubility. Experiments are then conducted under different pH and surfactant
conditions, and the drug solubility is recorded.


In this case, we have the following variables:

 Response variable: solubility


 Factors: pH and surfactant

And we would like to answer the following questions:

 Does pH affect solubility?


 Does surfactant affect solubility?
 Is there an interaction effect between pH and surfactant on solubility? (e.g. the
effect that surfactant has on solubility depends on the pH)

For the results of a two-way ANOVA to be valid, the following assumptions


should be met:

 Normality – The response variable is approximately normally distributed for


each group.
 Equal Variances – The variances for each group should be roughly equal.
 Independence – The observations in each group are independent of each other
and the observations within groups were obtained by a random sample.

Example: The potency results of 4 batches of tablets packed in three different
kinds of packing material are presented in the following table. Test the influence
of batch and packing material on potency.

Batch Packing material


A B C
1 98 102 92
2 95 107 102
3 104 106 108
4 93 98 94
Solution: The data contain one dependent variable (potency) and two
independent variables (batch and packing material). The variable is continuous
and assumed to follow a normal distribution, so the data can be analyzed
with a two-way ANOVA.


Null hypothesis: 1.The difference in mean potency observed from different


packing materials is statistically not significant.

2.The difference in mean potency observed from different batches is statistically


not significant.

Alternative hypothesis: 1. The difference in mean potency observed from different


packing materials is statistically significant.

2. The difference in mean potency observed from different batches is statistically


significant.

Level of significance 0.05

Calculation of test statistic

Batch Packing material


A B C Total Mean
1 98 102 92 292 97.33
2 95 107 102 304 101.33
3 104 106 108 318 106
4 93 98 94 285 95
Total 390 413 396
Mean 97.5 103.25 99
Correction term (C.T) = (ΣX)²/N = (1199)²/12 = 119800.08

ΣX² = 98² + 95² + 104² + 93² + … + 94²

= 9604 + 9025 + 10816 + 8649 + 10404 + 11449 + 11236 + 9604 + 8464 + 10404 + 11664 + 8836 = 120155

Total Sum of Squares = TSS = ΣX² - C.T = 120155 - 119800.08 = 354.92

Sum of squares of columns (between packing materials) = Σ(Ci²/ni) - C.T

where Ci is the total of the data in the i-th column

= (390² + 413² + 396²)/4 - 119800.08

= (152100 + 170569 + 156816)/4 - 119800.08

= 119871.25 - 119800.08 = 71.17


Sum of squares of rows (between batches) = Σ(Ri²/ni) - C.T

where Ri is the total of the data in the i-th row

= (292² + 304² + 318² + 285²)/3 - 119800.08

= (85264 + 92416 + 101124 + 81225)/3 - 119800.08

= 120009.67 - 119800.08 = 209.587

Sum of squares of error = Total Sum of Squares - Sum of squares of columns - Sum of squares of rows

= 354.92 - 71.17 - 209.587 = 74.16

ANOVA Table

Source                                Sum of squares    df               Mean squares    F
Between columns (packing material)    71.17             (C-1) = 2        35.58           F(2,6) = 2.878
Between rows (batches)                209.587           (R-1) = 3        69.86           F(3,6) = 5.65
Error                                 74.16             (C-1)(R-1) = 6   12.36
Critical value: 1. The table value for degrees of freedom (2, 6) at the 0.05 level of
significance is 5.14.

2. The table value for degrees of freedom (3, 6) at the 0.05 level of significance is
4.76.

Inference: Statement 1 of the null hypothesis is accepted because the calculated
value is less than the table value. Statement 2 of the null hypothesis is rejected
because the calculated value is greater than the table value.

Conclusion: The potency of the formulation is influenced by batch but does not
depend on the packing material.
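The same sums of squares can be computed programmatically. The following NumPy sketch (an illustration, not part of the original text) reproduces the two-way ANOVA table for the potency data:

```python
# Two-way ANOVA without replication for the potency data
# (rows = batches 1-4, columns = packing materials A, B, C).
import numpy as np

potency = np.array([[98, 102, 92],
                    [95, 107, 102],
                    [104, 106, 108],
                    [93, 98, 94]], dtype=float)

n_rows, n_cols = potency.shape             # 4 batches, 3 packing materials
ct = potency.sum() ** 2 / potency.size     # correction term
tss = (potency ** 2).sum() - ct            # total sum of squares
ss_cols = (potency.sum(axis=0) ** 2 / n_rows).sum() - ct   # packing material
ss_rows = (potency.sum(axis=1) ** 2 / n_cols).sum() - ct   # batches
ss_err = tss - ss_cols - ss_rows

df_cols, df_rows = n_cols - 1, n_rows - 1
df_err = df_cols * df_rows
f_cols = (ss_cols / df_cols) / (ss_err / df_err)   # F(2,6) for packing material
f_rows = (ss_rows / df_rows) / (ss_err / df_err)   # F(3,6) for batches
print(round(f_cols, 2), round(f_rows, 2))          # 2.88 5.65, as in the table
```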




Non-Parametric Tests

10.1. Introduction to Non-Parametric Tests

A parametric statistical test is a type of test that relies on specific assumptions about
the parameters of the population from which the sample is drawn. These assumptions
include:

 Observations must be independent of one another.


 The data must come from populations that follow a normal distribution.
 The populations must have equal variances.
 The variables must be measured on at least an interval scale.

In contrast, a non-parametric statistical test does not make assumptions about the
parameters of the population from which the sample is derived.

 It does not require the stringent measurement levels necessary for parametric
tests.
 Many non-parametric tests are suitable for ordinal data, and some can be used
with nominal data.
 These tests provide P-values but do not report confidence intervals.
 Non-parametric tests are more flexible and robust since they make fewer
assumptions, though they tend to have less power when the data would be
more appropriately analyzed using parametric tests.
 It is not required for the sample to follow a Gaussian distribution.

Most statistical tests are based on the assumption that the sample follows a known
distribution, such as normal or binomial. If the sample does not conform to these
distributions, the results may deviate from the true values. In such cases, particularly
with non-Gaussian distributions, non-parametric methods are employed. The
commonly used methods for testing the means of quantitative variables generally
assume that the data come from a normal distribution. However, in practice, it is rare
for a distribution to be perfectly normal. Fortunately, tests like the one-sample and
two-sample t-tests, and analysis of variance (ANOVA), are quite resilient to moderate
violations of normality, particularly when the sample sizes are sufficiently large.
Parametric tests such as the Z, t, and F tests focus on drawing inferences about the


population mean(s), while non-parametric tests are typically used to make inferences
about the population median(s).

Non-parametric tests assume that the distribution of the data is continuous,


symmetric, and independent. In these tests, each observation in the dataset is ranked.
While the specific distribution of the data is not assumed, it must come from a
continuous distribution. Each distribution is represented by a density curve, allowing
observations to take any value within a given range.

The statistics used in non-parametric tests often focus on basic elements of the sample
data, such as the signs of measurements, order relationships, or category frequencies.
These tests are not affected by changes in the scale, whether the scale is stretched or
compressed, and as a result, the null distribution for a non-parametric statistic can be
determined without considering the shape of the underlying population distribution.
This makes non-parametric methods ideal when the distribution of the parent
population is uncertain. In fields like social and behavioural sciences, where
observations may be difficult or impossible to quantify on numerical scales, non-
parametric tests are particularly suitable.

These methods typically involve ranking the data, which reduces the precision of the
information (since we convert raw data into relative rankings). Most non-parametric
tests require the data to be ranked on an ordinal scale, where the smallest observation
is assigned a rank of 1, the second smallest a rank of 2, and so on until the largest
observation receives the rank n. By doing this, we may obscure true differences in the
data, making it harder to detect significant differences. In essence, non-parametric
tests require larger differences to be considered statistically significant.

Non-parametric tests generally assume equal variance within groups and can be
applied to nominal variables. These tests are particularly useful when dealing with
small sample sizes or when the assumptions of normality and homoscedasticity (equal
variances) cannot be met. Unlike parametric tests, which are not influenced by sample
size, non-parametric tests vary in their applicability based on sample size. Though
non-parametric tests are easier to compute, they are typically less powerful, meaning
there is a greater risk of committing a Type II error (failing to detect a true effect).


Often called distribution-free tests, non-parametric methods do not assume any


specific population distribution. Their simplicity and efficiency make them
advantageous in certain contexts over traditional parametric tests.

Non-parametric tests are typically employed under the following conditions:

 When the data is measured on nominal or ordinal scales.


 When the data contains outliers that cannot be removed.
 When the test is intended to focus on the median rather than the mean.

10.2. Example of a Non-Parametric Test:

Consider an R&D department aiming to develop an orodispersible tablet for a bitter-


tasting drug. They use a 5-point ordinal scale to collect taste feedback from 20
participants, with scores ranging from 1 to 5: 1 for very bitter, 2 for bitter, 3 for
tasteless, 4 for slightly sweet, and 5 for very sweet. The data collected would be
analyzed using a non-parametric test to assess the distribution of responses and draw
conclusions about the overall taste preference.

[Bar chart: Distribution of observed taste scores in the total sample, showing the number of participants (0-9) for each category: Very Bitter, Bitter, Tasteless, Slightly Sweet, Very Sweet]

1. The data does not seem to follow a normal distribution, as a larger number of
participants reported an improvement in taste.
2. A clinical study was conducted to assess the prevalence of a microbial
infection, where various cultures were collected for microbial detection. The
outcome, a continuous variable, cannot be measured below or above a certain


threshold. In some cases, the culture reports may indicate the absence of
microbes, while others may show that microbes are undetectable. This kind of
data suggests that the observations do not follow a normal distribution.
3. Statistical analysis was conducted on the number of days various patients
spent in a cardiac hospital. Different treatments, such as bypass surgery, stent
therapy, and angioplasty, are commonly administered in these hospitals. The
number of days required for discharge varies for each patient, depending on
their health condition and the type of treatment received. This data may
potentially contain outliers.
4. Statistical analysis of data gathered from a behavioral study at a psychiatric
hospital, where patients’ behavior is quantified using scores and ranked
accordingly. These scores typically do not follow a normal distribution, as
most of the observed scores are likely to be skewed in one direction.

10.3. Ranking the Data

When dealing with an ordinal, interval, or continuous variable, the values are ranked
from lowest to highest. Non-parametric tests primarily focus on the ranks of the data
rather than the raw values. In cases where there are tied ranks (i.e., identical
responses), each tied observation is assigned the average rank. When performing non-
parametric tests, it is important to check that the sum of the ranks is equal to n(n+1)/2,
where n represents the number of observations. An example of how ranks are
assigned to data is illustrated in the following example.

Ranking Data
Data 22 28 26 25 27
Rank 1 5 3 2 4

In the case of ties, the average of the rank values is assigned to each tied observation
as follows.

Ranking Data

Data 25 28 30 24 26 25 26 28 29 26

Rank 2.5 7.5 10 1 5 2.5 5 7.5 9 5


In this example there were two 25s (ranks 2 and 3) with an average rank of 2.5,
three 26s (ranks 4, 5 and 6) with an average rank of 5, and two 28s (ranks 7 and 8)
with an average rank of 7.5.

When comparing sets of data from different groups or different treatment levels
(levels of the independent variable), ranking involves all of the observations
regardless of the discrete level in which the observation occurs:

Group A (n=6)      Group B (n=7)      Group C (n=7)      Total (N=20)

Data   Rank        Data   Rank        Data   Rank
16     5           20     14          19     11.5
18     9           19     11.5        24     19
17     6.5         15     3.5         23     18
12     1           17     6.5         20     14
14     2           18     9           18     9
15     3.5         22     17          20     14
                   21     16          25     20

The accuracy of the ranking process can be verified in two ways. First, the
highest rank assigned should match the total number of observations, denoted as N.
Second, the sum of all the ranks should be equal to N(N+1)/2, where N represents the
total number of observations.
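Both verification checks can be carried out mechanically. The snippet below is an illustrative sketch (assuming SciPy is available) using the tied data from the earlier ranking table:

```python
# Average ranks for ties, plus the two verification checks described above.
from scipy.stats import rankdata

data = [25, 28, 30, 24, 26, 25, 26, 28, 29, 26]
ranks = rankdata(data)   # ties receive the average of the ranks they span
print(list(ranks))       # [2.5, 7.5, 10.0, 1.0, 5.0, 2.5, 5.0, 7.5, 9.0, 5.0]

n = len(data)
print(max(ranks) == n)                 # True: highest rank matches N (no tie at the top here)
print(sum(ranks) == n * (n + 1) / 2)   # True: rank sum equals N(N+1)/2 = 55
```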

10.4. General Procedure for Non-Parametric Tests

1. Formulation of Hypotheses: Begin by setting up the null and alternative


hypotheses, specifying the level of significance, and determining whether the
test will be one-tailed or two-tailed.
2. Selection of the Appropriate Statistical Test: Choose the correct non-
parametric test to effectively analyze the data.
3. Establishing a Decision Rule: Develop a decision rule to determine whether
to accept or reject the hypothesis.


4. Calculation of the Test Statistic: Clearly outline the mathematical formulae


and the method to calculate the test statistic.
5. Conclusion: Conclude the statistical inference and clearly present the criteria
used to form the conclusion.

Subsequent Steps to Follow:

1. Observation of Nominal Variables: The first step in non-parametric tests is


to observe certain variables involved in the test, such as age, weight, height,
gender, color, taste, or oedema.
2. Generation of Ordinal Data: In cases of ordinal data, values are arranged on
a scale based on a predetermined or fixed threshold. These values are then
assigned ranks based on severity or magnitude.

Example: In an evaluation of rubber closures, extracts are injected into the


abdominal region of rabbits, and oedema at the injection site is observed. The
nominal data here is the presence or absence of oedema, while the ordinal data
involves categorizing oedema based on severity, such as no oedema, slight
oedema, moderate oedema, or severe oedema, and then scoring accordingly.

3. Ranking Based on Observed Values: Once the data is arranged, ranks are
assigned based on numerical values. For instance, an animal with higher
oedema (measured by diameter) will receive a higher rank, while an animal
with no oedema will receive the lowest rank. These ranks are used for further
analysis in non-parametric tests.

10.5. Advantages of Non-Parametric Tests:

1. Simpler calculations
2. Quick and easy to execute
3. Easy to interpret
4. More efficient than parametric tests when the data is not normally
distributed
5. Applicable to all scales of measurement
6. Fewer assumptions required


7. No need to involve population parameters


8. Results can be as precise as parametric methods

10.6. Disadvantages of Non-Parametric Tests:

1. Less power compared to parametric tests. Power refers to the ability of the
test to detect differences when they exist.
2. Does not utilize the full information provided by the sample.
3. Interpretation can be more challenging.
4. Requires larger sample sizes to achieve comparable results with parametric
tests when both are applicable.
5. May result in wasted information if parametric tests could have been used.
6. Can be difficult to compute by hand for large sample sizes.
7. Statistical tables for non-parametric tests are not always readily available.

10.7. Various Non-Parametric Tests:

1. Sign Test
2. Wilcoxon Signed Rank Test
3. Mann-Whitney Test
4. Kruskal-Wallis Test
5. Friedman Test
6. Multiple Comparison Test
7. Quade Test
8. Rank Correlation
9. Run Test of Randomness
10. Median Test
11. Kolmogorov-Smirnov Test

Non-Parametric Alternatives to Parametric Tests:

Below is a list of non-parametric tests that serve as alternatives to commonly used


parametric tests.


Parametric tests Nonparametric alternative


One sample Z-test, one sample t-test One sample sign test and one sample Wilcoxon
signed rank test
Independent sample t-test Mann-Whitney test
One way ANOVA Kruskal- Wallis test
Two way ANOVA Friedman test
Correlation coefficient Spearman rank correlation

Mann-Whitney U-Test:

The Mann-Whitney U-Test is the nonparametric equivalent of the two-sample t-test


assuming equal variances. This test is primarily used when the data fails to meet the
normality assumption or when there is significant uncertainty about the distribution of
the data. It is commonly applied to compare the outcomes between two independent
groups that are not normally distributed and when the sample size is small. The test is
also referred to as the Mann-Whitney Wilcoxon test or the Wilcoxon rank-sum test.
The goal of this test is to determine whether the two samples originate from the same
population. It is typically conducted as a two-tailed test. The procedure involves
combining the data from the two groups into a single dataset, ranking all observations
from the lowest (1) to the highest (n1+n2), and keeping track of which sample each
observation originally came from.

Assumptions: The following assumptions apply to this test:

 The samples are independent.


 The variable is continuous.
 The variances of the two groups are assumed to be equal.
 The distributions are identical but not necessarily normal.

Procedure:

1. Combine the observations from both groups and rank them from the smallest
to the largest, ignoring group labels.
2. Assign equal ranks to identical observations.
3. Reassign the observations back to their respective treatment groups.


4. Replace the original observations with their corresponding ranks.


5. Sum the ranks for each group.
6. Calculate the test statistic, U, using the formula provided.

U1 = n1n2 + n1(n1 + 1)/2 - R1

U2 = n1n2 + n2(n2 + 1)/2 - R2

where n1 and n2 are the sizes of groups 1 and 2 respectively, and R1 and R2 are the sums of the ranks for groups 1 and 2.

7. Determine critical value of U from table

8. Formulate decision and conclusion


Example: The number of episodes of migraine observed in two groups of migraine
patients, each comprising 6 patients, over a period of one month after treatment
with a new anti-migraine drug against a placebo is presented in the following table.
Test the significance of the difference in migraine episodes using an appropriate
statistical test.

No. of migraine episodes observed from the 6


Treatment
patients

Placebo 7 8 9 10 12 8

New drug 1 3 2 2 4 1

Solution: In the given example, the outcome is a count, the sample size is small (n1 =
n2 = 6), and the data does not follow a normal distribution, as the number of migraine
episodes observed in the placebo group is significantly higher than that in the drug-
treated group. The data is collected from two independent groups (placebo vs. drug
treatment), making the Mann-Whitney U-Test the appropriate statistical method to
analyze the data.


Null Hypothesis: The median number of migraine episodes is the same in both
groups. H0: The median number of migraine episodes in the placebo group is equal to
the median number of episodes in the drug-treated group.

Alternative Hypothesis: The median number of migraine episodes is not the same
between the two groups. H1: The median number of migraine episodes in the placebo
group is not equal to the median number in the drug-treated group.

Level of Significance: 0.05

The ranks should be assigned from the smallest to largest by combining the data from
both groups, as shown in the following example.

Number of migraine episodes and assigned ranks (ranking performed on the combined sample, smallest to largest):

Placebo:    7, 8, 9, 10, 12, 8   →  ranks 7, 8.5, 10, 11, 12, 8.5    (rank sum = 57)
New drug:   1, 3, 2, 2, 4, 1     →  ranks 1.5, 5, 3.5, 3.5, 6, 1.5   (rank sum = 21)

The sum of the ranks should always equal n(n+1)/2 = 12(12+1)/2 = 78, which matches
57 + 21 = 78.

Calculation of U statistic: The test statistic for the Mann-Whitney test is denoted U, and the smaller of the two values is used.

U1 = 6 × 6 + 6(6 + 1)/2 - 57 = 0

U2 = 6 × 6 + 6(6 + 1)/2 - 21 = 36


Decision Rule: The critical value for U can be obtained from the critical value table
for U, given the sample sizes (n1 = 6 and n2 = 6) and a two-tailed test with a
significance level of 0.05. According to Table 10.1, the critical value is 5.

Conclusion: Since the observed U value is smaller than the critical value, the null
hypothesis is rejected. This indicates that the median number of migraine episodes
observed between the two groups is not the same.
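For reference, the same example can be verified with SciPy's implementation. This is an illustrative sketch; note that SciPy reports U for the first sample, while the hand calculation keeps the smaller of U1 and U2:

```python
# Mann-Whitney U-test on the migraine data. Sketch assuming SciPy is available.
from scipy.stats import mannwhitneyu

placebo = [7, 8, 9, 10, 12, 8]
new_drug = [1, 3, 2, 2, 4, 1]

u_stat, p_value = mannwhitneyu(placebo, new_drug, alternative='two-sided')
u_smaller = len(placebo) * len(new_drug) - u_stat   # the U compared with the table
print(u_stat, u_smaller)   # 36.0 0.0, matching U2 and U1 in the hand calculation
print(p_value < 0.05)      # True, so the null hypothesis is rejected
```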

Mann-Whitney U-Test for Large Sample Sizes:

For sample sizes equal to or greater than 20, significance is determined using the
normal distribution with the following formula.

Z = |T - N1(N1 + N2 + 1)/2| / √(N1N2(N1 + N2 + 1)/12)

Where:

 T represents the sum of ranks for the smaller sample.


 N1 is the size of the smaller sample.
 N2 is the size of the larger sample.

Example: The systolic blood pressure measurements from two groups, each
consisting of 20 subjects from rural and urban areas, are provided. The question is
whether it can be claimed that there is a significant difference in systolic blood
pressure between individuals from rural and urban areas.

Subject    Systolic Blood Pressure (mm Hg)
           Rural area    Urban area
1 126 135
2 142 125
3 124 122
4 156 186
5 147 139
6 138 206
7 127 184


8 152 152
9 149 142
10 138 153
11 136 145
12 118 135
13 147 125
14 137 110
15 139 132
16 144 138
17 137 109
18 162 120
19 128 162
20 109 158

Solution: The given data represent two independent groups, each of sample size 20,
and the distribution is not known; hence the data can be treated statistically using
the Mann-Whitney U-Test for large samples.

Null hypothesis:H0: The median systolic blood pressure observed from rural people is
similar to the median systolic blood pressure observed from the urban people.

Alternative hypothesis: H1: The median systolic blood pressure observed from rural
people is different from the median systolic blood pressure observed from the urban
people.

Level of significance: 0.05

Calculation of Z statistic: The z value for the given data is calculated as follows.

Subject    Systolic Blood Pressure (mm Hg)    Assigned rank
           Rural area    Urban area           Rural area    Urban area
1 126 135 10 14.5
2 142 125 24.5 8.5
3 124 122 7 6
4 156 186 34 39


5 147 139 28.5 22.5


6 138 206 20 40
7 127 184 11 38
8 152 152 31.5 31.5
9 149 142 30 24.5
10 138 153 20 33
11 136 145 16 27
12 118 135 4 14.5
13 147 125 28.5 8.5
14 137 110 17.5 3
15 139 132 22.5 13
16 144 138 26 20
17 137 109 17.5 1.5
18 162 120 36.5 5
19 128 162 12 36.5
20 109 158 1.5 35
Rank sum 398.5 421.5

The lower rank sum = T = 398.5; N1 = 20, N2 = 20

Z = |398.5 - 20(20 + 20 + 1)/2| / √(20 × 20(20 + 20 + 1)/12) = 0.311

Conclusion: The calculated Z value is smaller than the critical Z value from the table,
meaning the null hypothesis is not rejected. Therefore, we conclude that the median
systolic blood pressure of individuals from rural areas is similar to the median systolic
blood pressure of individuals from urban areas.
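The Z calculation above can be written as a small helper function. This is an illustrative sketch of the formula only (no correction for ties):

```python
# Large-sample normal approximation for the Mann-Whitney test.
import math

def mann_whitney_z(t, n1, n2):
    """Z statistic for rank sum t of the sample of size n1 (no tie correction)."""
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs(t - expected) / sd

z = mann_whitney_z(398.5, 20, 20)
print(round(z, 3))   # 0.311, below the critical Z of 1.96, so H0 is not rejected
```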

Kruskal-Wallis Test (One-way ANOVA by Ranks)

The Kruskal-Wallis test is primarily used when there is one nominal variable and one
measurement variable, which would typically be analyzed using one-way ANOVA,
but the measurement variable does not meet the normality assumption required for
one-way ANOVA. As a non-parametric test, the Kruskal-Wallis test does not assume
that the data follows a distribution described by just two parameters—mean and
standard deviation. Instead, it operates on ranked data. To perform this test, the
observed values are converted into ranks, with the smallest value receiving a rank of


1, the next smallest a rank of 2, and so on. This process of ranking the data results in a
loss of the original information, which may reduce the power of the test compared to
one-way ANOVA. One-way ANOVA assumes equal variation within groups
(homoscedasticity), whereas the Kruskal-Wallis test does not assume normality, but it
does require that the distributions across groups are similar. Groups with differing
standard deviations would have different distributions, which is why Kruskal-Wallis
is not suitable in such cases.

The Kruskal-Wallis test is ideal when the data involves one nominal variable and one
ranked variable. When this condition is met, a one-way ANOVA cannot be applied,
and the Kruskal-Wallis test is the appropriate alternative.

Requirements to Apply the Kruskal-Wallis Test:

 One independent variable with two or more levels (independent groups).


While the test can be used for two levels, it is more commonly used when
there are three or more levels. For two levels, the Mann-Whitney U test is
typically used.
 Dependent variables should be on an ordinal, ratio, or interval scale.
 Observations must be independent, meaning there should be no relationship
between members within each group or between the groups themselves.
 All groups should have the same distribution shape.

The Kruskal-Wallis (H) test is used to assess whether independent samples come from
the same population or not. In other words, it tests if the samples come from
populations with identical distributions. Although the Kruskal-Wallis test is
sometimes described as testing if the mean ranks of the groups are the same, it is more
accurate to state that it tests whether the medians of the groups are equal, provided the
shape of the distributions is identical across groups. Unlike one-way ANOVA, which
tests if samples are drawn from a Gaussian distribution, the Kruskal-Wallis test is
employed when samples are not normally distributed or when the data consists of
ranks. It is particularly useful for comparing more than two groups. The H value is
calculated using the following formula


H = χ²(k-1) = [12 / (N(N + 1))] Σ (Ri²/ni) - 3(N + 1)

where:

 N = total number of observations across all groups
 Ri = sum of the ranks for the i-th group
 ni = number of observations in the i-th group
 k = number of groups

The computed value is then compared with the critical value from the Kruskal-Wallis
test table (Table 10.8). If the calculated value is smaller than the table value, the null
hypothesis is accepted.

When each sample has at least five observations, the sampling distribution closely
approximates a chi-square distribution with k - 1 degrees of freedom.

In one-way ANOVA, the goal is to test whether the population means are equal.
However, in the Kruskal-Wallis test, we examine whether the population medians
are equal. The method involves ranking all the observations across all groups,
followed by applying a one-way ANOVA to these ranks instead of the original data
values. This test is typically used when there are more than two treatments. While it is
preferable for the sample sizes to be equal, it is not mandatory. The Kruskal-Wallis
test is essentially an extension of the rank sum test for more than two treatments. If
the average of at least two treatments differs, significant differences are detected. The
procedure involves combining all the data and ranking each observation, then
summing the ranks for each group. This statistic is approximately distributed as a
chi-square distribution with k - 1 degrees of freedom. The chi-square
approximation holds true when the sample size in each group is greater than five.
Among various non-parametric tests, the Kruskal-Wallis one-way ANOVA by ranks
is considered the best method for analyzing data from more than two groups.

Example:
Tablets of drug X were produced using three distinct techniques: wet granulation, dry
granulation, and direct compression. The disintegration times observed for tablets
made using these methods are presented below.

Bapatla College of Pharmacy Page 216



Direct compression: 8, 7, 6, 7, 9, 10

Wet granulation: 11, 12, 13, 12, 13, 10

Dry granulation: 15, 14, 16, 15, 17, 18

Based on the results, can we infer that the median disintegration time for tablets
produced using three different techniques is the same?

Solution:

• Null hypothesis (H0): The median disintegration time for the samples is identical across all groups.
• Alternative hypothesis (H1): The median disintegration time for the samples differs across the groups.
• Level of significance (α): 0.05

Decision Rule:

The null hypothesis should be rejected if the calculated value of H is greater than
5.991, which is the critical value for a 0.05 significance level with 2 degrees of
freedom.

The ranks allotted to the tablets prepared with the direct compression technique based
on observed disintegration time: R1 = 4, 2.5, 1, 2.5, 5, 6.5.

Sum = 4 + 2.5 + 1 + 2.5 + 5 + 6.5 = 21.5

The ranks allotted to the tablets prepared with wet granulation technique based on
observed disintegration time R2 = 8, 9.5, 11.5, 9.5, 11.5, 6.5.

Sum = 8 + 9.5 + 11.5 + 9.5 + 11.5 + 6.5 = 56.5

The ranks allotted to the tablets prepared with dry granulation technique based on
observed disintegration time R3 = 14.5, 13, 16, 14.5, 17, 18

Sum = 93

H = [12 / (18 × 19)] × [(21.5)²/6 + (56.5)²/6 + (93)²/6] – 3 × 19

  = [12 / (18 × 19)] × [(462.25 + 3192.25 + 8649)/6] – 3 × 19

  = [12 / (18 × 19)] × [12303.5/6] – 3 × 19

  = 71.95 – 57

  = 14.95

Decision

H = 14.95> 5.991

The null hypothesis is rejected.
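The hand calculation above can be checked in Python (a sketch, assuming SciPy is installed). The first part reproduces the uncorrected H; `scipy.stats.kruskal` reports a slightly larger value because it additionally applies a tie correction.

```python
import numpy as np
from scipy.stats import rankdata, kruskal

direct = [8, 7, 6, 7, 9, 10]
wet    = [11, 12, 13, 12, 13, 10]
dry    = [15, 14, 16, 15, 17, 18]

groups = [direct, wet, dry]
pooled = np.concatenate(groups)
N = len(pooled)                                   # 18 observations in total
ranks = rankdata(pooled)
splits = np.split(ranks, np.cumsum([len(g) for g in groups])[:-1])

# H = 12/(N(N+1)) * sum(Ri^2 / ni) - 3(N+1)  (no tie correction)
H = 12 / (N * (N + 1)) * sum(s.sum() ** 2 / len(s) for s in splits) - 3 * (N + 1)
print(round(H, 2))                                # 14.95

# scipy applies a tie correction, so its H is slightly larger here.
H_scipy, p = kruskal(direct, wet, dry)
print(round(H_scipy, 2), round(p, 4))
```

Either way the statistic exceeds the critical value 5.991, so the conclusion is unchanged.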

Example

The blood glucose levels observed three hours post-administration in three groups (a
control and two different formulations of an antidiabetic drug) are given below.
Determine the significance of the difference between formulations with the Kruskal-Wallis
test at the 0.05 level of significance.

Blood Glucose level

Subject No   Control   Formulation 1   Formulation 2
1            130       90              86
2            136       92              88
3            132       93              87
4            134       91              88
5            133       90              90
6            135       89              89
Solution:

Null hypothesis: H0: The sample median blood glucose level is identical

Alternative hypothesis: H1: The sample median blood glucose level is not identical

Level of significance = 0.05


Criterion

Reject the null hypothesis if H > 5.991, the critical chi-square value at the 0.05 level
for 2 degrees of freedom.

Calculation of test statistic:

Subject No   Control   Rank   Formulation 1   Rank   Formulation 2   Rank
1            130       13     90              8      86              1
2            136       18     92              11     88              3.5
3            132       14     93              12     87              2
4            134       16     91              10     88              3.5
5            133       15     90              8      90              8
6            135       17     89              5.5    89              5.5
Sum                    R1 = 93                R2 = 54.5              R3 = 23.5
H = χ²(k–1) = [12 / (N(N+1))] × Σ(Ri²/ni) – 3(N+1)

  = [12 / (18 × 19)] × [R1²/6 + R2²/6 + R3²/6] – 3(19)

  = [12 / (18 × 19)] × [((93)² + (54.5)² + (23.5)²)/6] – 3(19)

  = [12 / (18 × 19)] × [12171.5/6] – 3(19)

  = 71.18 – 57

  = 14.18
For the chi-square distribution with 2 degrees of freedom, the critical value at the 5%
level is 5.99. Since the calculated H (14.18) exceeds this value, the result is
significant and the null hypothesis is rejected.
If statistically significant differences are detected, it is necessary to identify which
treatments are contributing to the differences. To perform pairwise comparisons, the
number of observations in each treatment group must be equal. The difference in the
sum of ranks between two different groups should be calculated. For example, you
would calculate the difference between the control group and one of the formulations,
or alternatively, between two formulations. If the observed differences in ranks
exceed the critical value for the Chi-square distribution, then significant differences
are present between the control and the other two formulations. In some cases, the
difference between the control and formulation 1 may be significant, while in other
cases, the difference between the control and formulation 2 may be significant.
Another possibility is that the difference between the two formulations is significant.

Correction:
To enhance the Chi-square value, a correction can be applied. This adjustment
increases the degree of significance if the null hypothesis is rejected and may reveal
statistically significant differences that were not apparent in the original Chi-square
test.

Corrected χ² = χ² / [1 – Σ(ti³ – ti) / (N³ – N)]

ti = number of tied observations in the i-th set of ties

N = total number of observations

Applying the correction to the above problem: the values 88 and 89 are each tied twice
(2³ – 2 = 6 each) and 90 is tied three times (3³ – 3 = 24), so Σ(ti³ – ti) = 6 + 6 + 24
= 36, and N³ – N = 18³ – 18 = 5814.

Corrected χ² = 14.178362 / [1 – (6 + 6 + 24)/5814]

             = 14.178362 / (1 – 0.00619195)

             = 14.178362 / 0.993808

             = 14.27
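The tie correction can be verified in Python (a sketch, assuming SciPy is installed). `scipy.stats.kruskal` builds this correction in, so its statistic should match the corrected value computed by hand.

```python
from collections import Counter

import numpy as np
from scipy.stats import rankdata, kruskal

control = [130, 136, 132, 134, 133, 135]
form1   = [90, 92, 93, 91, 90, 89]
form2   = [86, 88, 87, 88, 90, 89]

groups = [control, form1, form2]
pooled = np.concatenate(groups)
N = len(pooled)
ranks = rankdata(pooled)
splits = np.split(ranks, np.cumsum([len(g) for g in groups])[:-1])

# Uncorrected H, as computed by hand above.
H = 12 / (N * (N + 1)) * sum(s.sum() ** 2 / len(s) for s in splits) - 3 * (N + 1)

# Tie correction: 88 and 89 are tied twice each, 90 three times.
ties = [t for t in Counter(pooled).values() if t > 1]
correction = 1 - sum(t**3 - t for t in ties) / (N**3 - N)
H_corrected = H / correction
print(round(H_corrected, 2))                      # 14.27

H_scipy, p = kruskal(control, form1, form2)       # scipy corrects for ties
print(round(H_scipy, 2))                          # 14.27
```

Note that the correction always increases H, so it can only make a borderline result more significant, never less.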

Exercise
The market value of each equity share of five pharmaceutical companies in various
months is given below. Using the Kruskal-Wallis test, check whether the average market
value of the shares differs among the five pharmaceutical companies.


A B C D E

124 130 89 136 158

135 135 140 75 73

85 72 141 73 96

141 86 96 149 141

136 119 72 151 120

82 128 122 158 99

Friedman Test (Two-Way Analysis of Variance)

The Friedman test is a non-parametric method used to assess differences in treatments


across several attempts or conditions. Unlike parametric tests, it does not assume that
the data follows a specific distribution, such as the normal distribution. This makes
the Friedman test a suitable alternative to the ANOVA test when the data distribution
is unknown.

Essentially, the Friedman test is an extension of the sign test and is typically applied
when there are multiple treatments. If only two treatments are involved, the Friedman
test and the sign test yield identical results. This test is particularly useful when the
assumptions required for parametric analysis of variance are not met, or when the
scale of measurement is weak, such as in cases where the data is ordinal. It is also
used when the data can be arranged in a two-way ANOVA design, making it
appropriate for situations where ranking is possible and data is derived from more
than two groups.

For the Friedman test to be applicable, the following conditions must be met:

• The data should be ordinal (such as a Likert scale) or continuous.
• The data should come from a single group, measured on at least three different occasions.
• The sample should be generated using a random sampling method.


• Blocks (groups of observations) must be independent, meaning no influence exists between the pairs.
• Observations within blocks should be ranked without any ties.

In this test, each treatment is ranked within its block (column), and the test statistic is
calculated using a specific formula.

FM = χ²(c–1) = [12 / (rc(c+1))] × ΣRi² – 3r(c+1)

The table values can be obtained from Table 10.2.

If the sample size is sufficiently large, the chi-square distribution can be used to
approximate the test of significance. The chi-square statistic is

χ²(c–1) = [12 / (rc(c+1))] × ΣRi² – 3r(c+1)

d.f = C–1

r = number of rows

c = number of columns

Ri = sum of the ranks in the i-th group

Ranks should be assigned within each block (such as subject, formulation, method,
etc.). In the case of tied ranks, the average rank should be given to the tied
observations.

For larger sample sizes, if the number of treatments (k) exceeds 5, or if the sample
size (n) is greater than 13, the chi-square critical value table should be used to assess
significance.

Example: The disintegration times from six different formulations, each containing
distinct disintegrants, are provided below. The objective is to determine whether the
variation in disintegration times due to the different disintegrants is statistically
significant.


Disintegration time (minutes)

S.No   Disintegrant A   Disintegrant B   Disintegrant C
1      8                12               15
2      7                11               16
3      6                10               17
4      7                11               15
5      5                10               18
6      6                9                17
Solution:

In the above problem, the lowest disintegration time in formulation 1 is observed with
disintegrant A (8 minutes), so it is assigned rank 1. The next value, from the
formulation containing disintegrant B (12 minutes), is assigned rank 2, and the
formulation containing disintegrant C (15 minutes) is assigned rank 3. The same
procedure is followed for the other formulations. The assigned ranks are shown in the
following table.

Disintegration time (minutes)

Tablet formulation   Disintegrant A   Rank   Disintegrant B   Rank   Disintegrant C   Rank
1                    8                1      12               2      15               3
2                    7                1      11               2      16               3
3                    6                1      10               2      17               3
4                    7                1      11               2      15               3
5                    5                1      10               2      18               3
6                    6                1      9                2      17               3
Ri                                    6                       12                      18

FM = [12 / (6 × 3 × (3+1))] × (6² + 12² + 18²) – 3 × 6 × (3+1) = 84 – 72 = 12

The tabled χ² value at the 0.05 probability level with d.f = c–1 = 2 is 5.99.

Since χ² calculated (12) > χ² table (5.99), there is a significant difference between
the disintegrants.
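The Friedman statistic computed above can be checked with `scipy.stats.friedmanchisquare`, which takes one sequence per treatment (a sketch, assuming SciPy is installed):

```python
from scipy.stats import friedmanchisquare

# Disintegration times for the six tablet formulations (blocks),
# one list per disintegrant (treatment).
dis_a = [8, 7, 6, 7, 5, 6]
dis_b = [12, 11, 10, 11, 10, 9]
dis_c = [15, 16, 17, 15, 18, 17]

stat, p = friedmanchisquare(dis_a, dis_b, dis_c)
print(round(stat, 2), round(p, 4))                # 12.0 0.0025
```

With no ties inside any block, SciPy's value matches the hand calculation exactly; p < 0.05 confirms the rejection of the null hypothesis.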


Exercise

The following are the runs scored by 11 batsmen of various cricket teams in three
seasons. Test the null hypothesis that the batsmen constituting the population from
which the sample was drawn perform equally well in all three seasons, against the
alternative hypothesis that they perform better in at least one season.

Batsman no A B C

1 124 130 136

2 135 135 141

3 122 141 141

4 141 120 149

5 136 119 151

6 158 128 158

7 155 121 161

8 161 119 165

9 166 116 169

10 172 114 173

11 177 112 178

Modified Friedman Test

To obtain an approximate F distribution with c–1 and (c–1)(r–1) degrees of freedom, and
to obtain a better approximation than the chi-square distribution for the Friedman
test, this modification is applied. In this case, the statistic T2 is calculated with
the following formula.

T2 = (r – 1) × [B2 – rc(c+1)²/4] / (A2 – B2)

A2 = Σxi², where xi are the individual ranks. If there are no ties, A2 is calculated
with the following equation.

A2 = cr(c+1)(2c+1)/6;  B2 = (1/r) Σ(Ci)²


c = number of columns and r = number of rows.

Ci = sum of the ranks in column i

Subjecting the above problem to the modified Friedman test:

A2 = cr(c+1)(2c+1)/6 = 3 × 6 × (3+1) × (2×3+1)/6 = 3 × 4 × 7 = 84

B2 = (1/r) Σ(Ci)² = (1/6)[(6)² + (12)² + (18)²] = (1/6)[504] = 84

T2 = (6 – 1) × [84 – 6 × 3 × (3+1)²/4] / (84 – 84) = 5 × (84 – 72)/0 = ∞

Because every block ranks the three disintegrants in the same order, A2 = B2, the
denominator is zero, and T2 is infinite, which is clearly significant.
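The A2, B2, and T2 quantities can be computed directly from the within-block ranks, as sketched below for the disintegrant example (variable names are illustrative; SciPy is assumed for the ranking step). When the blocks rank the treatments identically, A2 equals B2 and T2 diverges:

```python
import math

import numpy as np
from scipy.stats import rankdata

# Disintegration times: 6 blocks (rows) x 3 treatments (columns).
data = np.array([[8, 12, 15],
                 [7, 11, 16],
                 [6, 10, 17],
                 [7, 11, 15],
                 [5, 10, 18],
                 [6, 9, 17]])
ranks = np.apply_along_axis(rankdata, 1, data)    # rank within each row
r, c = ranks.shape

A2 = (ranks ** 2).sum()                           # sum of squared ranks
col_sums = ranks.sum(axis=0)                      # rank sums per treatment
B2 = (col_sums ** 2).sum() / r

num = (r - 1) * (B2 - r * c * (c + 1) ** 2 / 4)
den = A2 - B2
T2 = math.inf if den == 0 else num / den          # guard the zero denominator
print(A2, B2, T2)                                 # 84.0 84.0 inf
```

With less perfectly ordered data, A2 exceeds B2 and T2 is finite, to be compared against the F distribution with c–1 and (c–1)(r–1) degrees of freedom.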

10.4.3. Multiple Comparisons for the Modified Friedman Test


In the previous case, since the differences among the disintegrants were found to be
significant, the pair of disintegrants whose difference is significant can be
identified with the following formula.

|Cj – Ci| > t × √[2r(A2 – B2) / ((r – 1)(c – 1))]

t = tabled t value with (r – 1)(c – 1) d.f. at the specified α level (usually 0.05)

For the above problem, the tabled t-value for (6 – 1)(3 – 1) = 10 degrees of freedom at
the 5% level of significance for a two-tailed test is 2.23.

|Cj – Ci| > 2.23 × √[2 × 6 × (84 – 84) / ((6 – 1)(3 – 1))] = 0

Any difference in rank sums greater than 0 is therefore significant at the 0.05 level.
For the above problem, the rank sums observed for disintegrants A & B, A & C, and B & C
all differ significantly, indicating that the differences in disintegration time
observed from the tablets containing the three disintegrants are significant.

Dunn's Multiple Comparison Test

Both the Kruskal-Wallis and Friedman tests are non-parametric methods used to
compare three or more groups (non-Gaussian distributions). When analyzing such data, a
post-test is often required for more precise comparisons between the groups. Dunn’s
test is one such post-test that performs pairwise comparisons across groups. It
evaluates the difference in the sum of ranks between two columns against the expected
average difference, and computes a 't' value for
each pair of columns. These 't' values help determine whether the differences are
statistically significant.

In Dunn’s test, the mean rank differences between the groups are compared pair by
pair. For each comparison, a standard deviation is calculated. The absolute difference
in ranks for each pair is then divided by its standard deviation to produce a 't' statistic.
This statistic is compared against the standard normal z distribution. If the resulting
statistic at α/2 (0.05) is sufficiently large, the null hypothesis, which posits no
difference between the ranks, is rejected. Multiple comparisons are conducted only if
a significant overall result is found in the main test at the usual α (0.05) level. This
approach helps mitigate errors that might arise from conducting multiple comparisons
in non-parametric tests.

Tij = Dij / σij = |R̄i – R̄j| / √[ (N(N+1)/12) × (1/ni + 1/nj) ]

Dij = mean rank difference

σij = standard deviation of the difference

|R̄i – R̄j| = absolute value of the difference in mean ranks between group i and group j

N = total sample size

ni = sample size for group i

nj = sample size for group j
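The statistic can be sketched as a small function; the helper name `dunn_z` is hypothetical, SciPy is assumed for ranking, and the demonstration uses simple made-up data rather than the chapter's example.

```python
import math

import numpy as np
from scipy.stats import rankdata

def dunn_z(groups, i, j):
    """Dunn's statistic |mean rank i - mean rank j| / sigma_ij,
    with N taken as the total number of observations."""
    pooled = np.concatenate(groups)
    N = len(pooled)
    ranks = rankdata(pooled)
    splits = np.split(ranks, np.cumsum([len(g) for g in groups])[:-1])
    mean_i, mean_j = splits[i].mean(), splits[j].mean()
    sigma = math.sqrt(N * (N + 1) / 12 * (1 / len(groups[i]) + 1 / len(groups[j])))
    return abs(mean_i - mean_j) / sigma

# Three non-overlapping hypothetical groups: mean ranks are 2, 5 and 8.
g = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(dunn_z(g, 0, 1), 4))                  # 1.3416
print(round(dunn_z(g, 0, 2), 4))                  # 2.6833
```

Each statistic is then compared with the appropriate critical value from the standard normal distribution, adjusted for the number of pairwise comparisons.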

Example: Tablets of drug X were prepared by three different techniques: wet
granulation, dry granulation, and direct compression. The disintegration times observed
from the tablets prepared by these methods are given below.

Direct compression: 8, 7, 6, 7, 9, 10

Wet granulation: 11, 12, 13, 12, 13, 10


Dry granulation: 15, 14, 16, 15, 17, 18

Identifying Statistical Significance in Disintegration Times

Based on the previous analysis, we can determine whether there are significant
differences in the disintegration times of tablets produced using different techniques.
If significant differences are found, it is important to identify which specific technique
is primarily responsible for these variations.

Solution:
This problem has been addressed previously, and the results demonstrated that the
disintegration times of tablets prepared using different techniques show statistically
significant differences (as determined using the Kruskal-Wallis one-way ANOVA).
The observed differences can be attributed to the following factors:

1. The disintegration times of tablets made with direct compression and wet
granulation methods are significantly different.
2. The disintegration times of tablets made with direct compression and dry
granulation methods are significantly different.
3. The disintegration times of tablets made with wet granulation and dry
granulation methods are significantly different.

To pinpoint the specific reasons behind these differences, the data should be further
analyzed using Dunn’s multiple comparison test.

This test will be applied to the first and second observations to identify the
contributing factors.

The ranks allotted to the tablets prepared with the direct compression technique based
on observed disintegration time: R1 = 4, 2.5, 1, 2.5, 5, 6.5.

Sum = 4 + 2.5 + 1 + 2.5 + 5 + 6.5 = 21.5

Mean rank=21.5/6=3.58

The ranks allotted to the tablets prepared with wet granulation technique based on
observed disintegration time R2 = 8, 9.5, 11.5, 9.5, 11.5, 6.5.


Sum = 8 + 9.5 + 11.5 + 9.5 + 11.5 + 6.5 = 56.5

Mean rank=56.5/6=9.42

T12 = D12 / σ12 = |R̄1 – R̄2| / √[ (12(12+1)/12) × (1/6 + 1/6) ]

    = |3.58 – 9.42| / 2.08 = 2.81

The calculated value is greater than the tabled Z value; hence the null hypothesis is
rejected, and it is concluded that the difference in disintegration time between
tablets prepared by the direct compression and wet granulation techniques is
statistically significant.

Applying Dunn’s test for the first and third observations

The ranks allotted to the tablets prepared with the direct compression technique based
on observed disintegration time: R1 = 4, 2.5, 1, 2.5, 5, 6.5.

Sum = 4 + 2.5 + 1 + 2.5 + 5 + 6.5 = 21.5

Mean rank=21.5/6=3.58

The ranks allotted to the tablets prepared with dry granulation technique based on
observed disintegration time R3 = 14.5, 13, 16, 14.5, 17, 18

Sum = 93

Mean rank=93/6=15.5

T13 = D13 / σ13 = |R̄1 – R̄3| / √[ (12(12+1)/12) × (1/6 + 1/6) ]

    = |3.58 – 15.5| / 2.08 = 5.73

The calculated value is greater than the tabled Z value; hence the null hypothesis is
rejected, and it is concluded that the difference in disintegration time between
tablets prepared by the direct compression and dry granulation techniques is
statistically significant.


Applying Dunn’s test for the second and third observations

The ranks allotted to the tablets prepared with wet granulation technique based on
observed disintegration time R2 = 8, 9.5, 11.5, 9.5, 11.5, 6.5.

Sum = 8 + 9.5 + 11.5 + 9.5 + 11.5 + 6.5 = 56.5

Mean rank=56.5/6=9.42

The ranks allotted to the tablets prepared with dry granulation technique based on observed
disintegration time R3 = 14.5, 13, 16, 14.5, 17, 18

Sum = 93

Mean rank=93/6=15.5

T23 = D23 / σ23 = |R̄2 – R̄3| / √[ (12(12+1)/12) × (1/6 + 1/6) ]

    = |9.42 – 15.5| / 2.08 = 2.92

The calculated value is greater than the tabled Z value; hence the null hypothesis is
rejected, and it is concluded that the difference in disintegration time between
tablets prepared by the wet granulation and dry granulation techniques is statistically
significant.

Finally, it is concluded that all three techniques differ in their performance, and
each pairwise comparison is statistically significant.
Table 10.1: Critical values for Mann-Whitney U-Test (alpha = 0.05, two-tailed)

Table 10.2

References

1. Nahm FS. Nonparametric statistical tests for the continuous data: the basic concept and the practical use. Korean Journal of Anesthesiology. 2016;69(1):8-14.

2. De Muth JE. Basic Statistics and Pharmaceutical Statistical Applications. CRC Press; 2014.

3. Goutelle S, Woillard JB, Neely M, Yamada W, Bourguignon L. Nonparametric methods in population pharmacokinetics. The Journal of Clinical Pharmacology. 2022;62(2):142-57.

4. Chan Y, Walmsley RP. Learning and understanding the Kruskal-Wallis one-way analysis-of-variance-by-ranks test for differences among three or more independent groups. Physical Therapy. 1997;77(12):1755-61.

5. Theodorsson-Norheim E. Kruskal-Wallis test: BASIC computer program to perform nonparametric one-way analysis of variance and multiple comparisons on ranks of several independent samples. Computer Methods and Programs in Biomedicine. 1986;23(1):57-62.

6. Johnson RW. Alternate forms of the one-way ANOVA F and Kruskal-Wallis test statistics. Journal of Statistics and Data Science Education. 2022;30(1):82-5.

7. Ostertagova E, Ostertag O, Kováč J. Methodology and application of the Kruskal-Wallis test. Applied Mechanics and Materials. 2014;611:115-20.

8. Theodorsson-Norheim E. Friedman and Quade tests: BASIC computer program to perform nonparametric two-way analysis of variance and multiple comparisons on ranks of several related samples. Computers in Biology and Medicine. 1987;17(2):85-99.

9. Vincy Tam WC, Burley JB, Rowe DB, Machemer T. Comparison of five green roof treatments in Flint, Michigan with Friedman's two-way analysis of variance by ranks. Journal of Architecture and Construction. 2020;3(1):23-36.

10. Dunn OJ. Multiple comparisons among means. Journal of the American Statistical Association. 1961;56(293):52-64.



About Author(s)

Prof. (Dr.) T.E. Gopala Krishna Murthy is serving as a Professor and Principal at Bapatla
College of Pharmacy, Bapatla, Andhra Pradesh, India. He has 30 years of experience in
teaching, research, and administration. He completed his B.Pharm from Gulbarga University,
M.Pharm from Birla Institute of Technology, and Ph.D. from J.N.T. University. He has
published 271 research and review papers in international and national journals, of which 116
papers are published in Scopus-indexed journals. He serves as a reviewer and editorial board
member for reputed journals. He has delivered numerous guest lectures at various national-
level events. Prof. Murthy has published 6 books in pharmacy with reputed publishers. He
holds 9 granted patents and 6 published patents. He has received 2 research grants from
AICTE. He has guided 25 Ph.D. and 105 M.Pharm students. He has received the Meritorious
Teacher Award from JNTUK three times, the Best Principal Award from JNTUK, the Best
Researcher Award from JNTUK, and the Best Teacher Award from APTI AP State Branch. He
has also received fellowships from the Association of Biotechnology & Pharmacy and the
AP Academy of Sciences, and is a recipient of the Sir C. V. Raman Award from the
Science City of Andhra Pradesh. He is currently acting as a CEC Member of the Indian
Pharmaceutical Association, Mumbai, and as the Regional Coordinator for the AP Academy
of Sciences, Guntur Region. He also visited the Tashkent Institute of Pharmaceutical
Sciences, Uzbekistan, to deliver guest lectures. He is serving as a member of the Board
of Studies for Pharmacy Courses at JNTUK.
He is a life member of professional bodies such as IPA and APTI.

Dr. Rajyalakshmi Kadiyam, working as Associate Professor in Bapatla College of
Pharmacy, has 19 years of teaching experience since 2006. She completed her PG in
Pharmaceutics from Sri Padmavathi Mahila University, Tirupathi, and received her
doctorate from JNTU Hyderabad. She has guided 20 M.Pharm students and 70 B.Pharm
students and has presented 15 research papers in various national and international
journals.

Mrs. Ch. Sushma is currently serving as an Assistant Professor at Bapatla College of
Pharmacy with 17 years of teaching experience. She holds a postgraduate degree in
Pharmaceutical Analysis and Quality Assurance from Jawaharlal Nehru Technological
University, Kakinada. Mrs. Sushma has guided 5 M.Pharm and 10 B.Pharm students.

Mr. B. Sudheer Chowdary, working as Associate Professor in Bapatla College of Pharmacy,
completed his M.Pharmacy in Pharmacology from VELS University. With a strong research
background, he has published 18 national and international papers and contributed a
book chapter to Pharmaceutical Dosage Form Technology. He is an active member of the
Indian Pharmacological Society and holds life memberships in both APTI and the Society
for Ethnopharmacology. Additionally, he has expertise in using statistical software and
research tools such as docking, enhancing the quality of his research.
