Introduction to Biostatistics & Basic Concepts
Gökmen ZARARSIZ, PhD
Dinçer GÖKSÜLÜK, PhD
Erciyes University, Faculty of Medicine, Department of Biostatistics
March 01, 2021
G. Zararsız & D. Goksuluk Introduction to Biostatistics March 01, 2021 1 / 46
Copyright 2019 ©. All Rights Reserved. May not be copied, scanned, or
duplicated, in whole or in part.
Table of Contents
1 Statistics
2 Biostatistics
Statistics vs. Biostatistics
Biostatistics: Research planning to making decisions
Why Biostatistics?
Research Topics
3 Basic Concepts
Population and Sample
Parameter and Statistic
Sampling and Estimation
Accuracy, Precision and Bias
Observation and Variable
Data, Types of Data & Software
4 Sources
Books
Journals
5 References
Statistics
Foundation of Statistics
The foundations of modern statistics were laid in the 17th century.
First source of statistics: political science (or political arithmetic)
Second source of statistics: the probability theory
The correspondence between Blaise Pascal (1623-1662) and Pierre de
Fermat (1601-1665)
Ars Conjectandi written by Jacques Bernoulli (1654-1705)
Combination of the daily with probability theory: Abraham de Moivre
(1667-1754)
Famous astronomers and mathematicians: Pierre Simon Laplace
(1749-1827) and Karl Friedrich Gauss (1777-1855)
Combination of the theory and practical methods of statistics by
Adolphe Quetelet (1796-1874)
The father of biostatistics and eugenics: Francis Galton (1822-1911)
Application of statistical methods to biology: Karl Pearson (1857-1936)
A genius who almost single-handedly created the foundations for
modern statistical science: Ronald A. Fisher (1890-1962)
Definition of Statistics
Fisher, 1950
“... may be regarded as mathematics applied to observational data. ...
may be regarded (i) as the study of populations, (ii) as the study of
variation, (iii) as the study of methods of the reduction of data.”
Mood, 1950
“the technology of the scientific method.”
von Mises, 1957
“to make inference on the probability of events from their observed
frequencies.”
Kendall & Stuart, 1963
“the branch of the scientific method which deals with the data obtained by
counting or measuring the properties of populations of natural
phenomena.”
Mainland, 1963
“the science and art of dealing with variation in such a way as to obtain
reliable results.”
Savage, 1968
“uncertainty and behavior.”
Kruskal, 1968
“is concerned with the inferential process, in particular with the planning
and analysis of experiments or surveys, with the nature of observational
errors and sources of variability that obscure underlying patterns, and with
the efficient summarizing of sets of data.”
Sokal & Rohlf, 1969
“the scientific study of numerical data based on natural phenomena”
Daniel, 2005
“a field of study concerned with (1) the collection, organization,
summarization, and analysis of data; and (2) the drawing of inferences
about a body of data when only a part of the data is observed”
Mathematical and Applied Statistics
Mathematical statistics: Application of mathematics to statistics using
probability theory, linear algebra, differential equations, etc.
Applied statistics: Application of mathematical statistics to specific
areas including biology, economics, engineering, psychology, etc.
Descriptive and Inferential Statistics
Descriptive statistics (exploratory data analysis): Organizing,
summarizing, and displaying data
Inferential statistics (confirmatory data analysis): Using sample data to
draw conclusions about a population
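As a minimal sketch of this distinction, the Python snippet below first summarizes a small sample (descriptive) and then uses it to build an approximate 95% confidence interval for the population mean (inferential). The hemoglobin values are made-up illustrative numbers, and the normal-approximation interval is an assumption chosen for brevity; a t-based interval would usually be preferred for a sample this small.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical hemoglobin levels (g/dL) from a sample of 10 patients
sample = [13.1, 14.2, 12.8, 15.0, 13.6, 14.1, 12.9, 13.8, 14.5, 13.4]

# Descriptive statistics: organize and summarize the observed data
n = len(sample)
x_bar = mean(sample)
s = stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: use the sample to say something about the population
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
half_width = z * s / n ** 0.5
ci = (x_bar - half_width, x_bar + half_width)

print(f"n = {n}, mean = {x_bar:.2f}, sd = {s:.2f}")
print(f"approximate 95% CI for the population mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```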
Biostatistics
Statistics vs. Biostatistics
Biostatistics is the study of statistics as applied to biological areas.
Similarly,
Scientific branch    Applied area
Sociometrics         Social sciences
Psychometrics        Psychological sciences
Econometrics         Economics
Technometrics        Physical, chemical, and engineering sciences
Anthropometrics      Measurement of human individuals
Bibliometrics        Written publications, such as books or articles
Scientometrics       Quantitative features and characteristics of science
Informetrics         Information sciences
Cliometrics          Historical sciences
Table: The fields of applied statistics
Biostatistics: Research planning to making decisions
Biostatistics deals with the development and application of the most
appropriate methods for:
Research planning, including the design of experiments, clinical trials,
and surveys
Formulation of statistical hypotheses and determination of the appropriate
methodology, including sampling and sample size calculation
Data analysis
Presentation, interpretation, and reporting of the results
Making decisions on the basis of such analysis
Why Biostatistics?
Parents who have children with genetic anomalies suspect that any
further children will also have anomalies and want to decide whether
or not to have more children.
A manufacturer developed an in vitro diagnostic procedure to replace
the microplate procedure and wants to know whether there is a
systematic measurement error between the procedures.
An oncologist wants to choose the best therapy (e.g.
chemotherapy, radiotherapy, etc.) for a breast cancer patient.
A transplantation company is trying to determine the mean survival
time after bone marrow transplantation in leukemia patients.
A pharmaceutical company is trying to identify the candidate
metabolomics biomarkers of lung cancer.
Research topics in Biostatistics
Clinical trials
Diagnostic tests and ROC analysis
Multivariate analysis
Survival analysis
Machine learning
Neural networks and deep learning
Bioinformatics
Multiple testing and multiple comparisons
Statistical modeling of high-dimensional data
Biomarker discovery
Personalized medicine
Statistical programming
...
Population and Sample
Population: A collection of people or objects that share common
observable characteristics
Sample: A subset of the population, ideally selected at random
Sampling: The process of selecting samples from the population
Parameter and Statistic
Parameter: A measure describing a characteristic of a population
Statistic: The corresponding estimate computed from a sample
Measure                   Population parameter   Sample statistic
Number of observations    N                      n
Mean                      µ                      x̄
Median                    η                      M
Proportion                p                      p̂
Standard deviation        σ                      s
Variance                  σ²                     s²
Skewness                  ν                      g₁
Kurtosis                  τ                      g₂
Correlation coefficient   ρ                      r
Regression coefficient    β                      b
Table: Commonly used parameters and statistics
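The relationship between a parameter and its statistic can be illustrated with a short Python sketch. It simulates a hypothetical population (the blood pressure values, the seed, and the sizes N = 10,000 and n = 100 are all assumptions made for illustration), draws a simple random sample, and compares µ and σ with x̄ and s.

```python
import random
from statistics import mean, pstdev, stdev

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: diastolic blood pressure (mmHg) of N = 10,000 people
population = [rng.gauss(80, 10) for _ in range(10_000)]
N = len(population)
mu = mean(population)       # population mean (parameter)
sigma = pstdev(population)  # population standard deviation (parameter)

# Simple random sample of n = 100 individuals, drawn without replacement
sample = rng.sample(population, 100)
n = len(sample)
x_bar = mean(sample)  # sample mean (statistic), estimates mu
s = stdev(sample)     # sample standard deviation (statistic), estimates sigma

print(f"N = {N}, mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"n = {n}, x_bar = {x_bar:.2f}, s = {s:.2f}")
```

In practice the population values are not available, which is why the sample statistics are used as estimates of the parameters.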
Why Sampling?
Reduced cost
Reduced time
Practical in most situations
Sometimes it is impossible to study the whole population (e.g. marine
biology)
Estimation and Bias
Estimation: Using the sample statistic in place of the population
parameter
Bias: A systematic deviation between the population parameter and the
sample statistic
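A classic illustration of bias is the variance estimator that divides by n instead of n - 1. The simulation below is a sketch under assumed settings (standard normal population, samples of size 5, 20,000 repetitions, fixed seed); it shows that the n-denominator version is systematically too small on average, while the n - 1 version is centred on the true variance.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed for reproducibility
n, reps = 5, 20_000     # small samples, many repetitions
true_var = 1.0          # variance of the simulated N(0, 1) population

biased, unbiased = [], []
for _ in range(reps):
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = mean(x)
    ss = sum((v - m) ** 2 for v in x)  # sum of squared deviations
    biased.append(ss / n)              # divides by n: too small on average
    unbiased.append(ss / (n - 1))      # divides by n - 1: unbiased for the variance

print(f"true variance        : {true_var:.3f}")
print(f"mean of ss / n       : {mean(biased):.3f} (expected value {(n - 1) / n:.3f})")
print(f"mean of ss / (n - 1) : {mean(unbiased):.3f}")
```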
Accuracy and Precision
Accuracy and precision describe different aspects of measurement quality:
Accuracy: Closeness of a measurement to its true value
Precision: Closeness of repeated measurements to each other
Bias and Imprecision
Bias: Calculated from the difference between the measured value and the reference (true) value
Imprecision: Calculated from the variation of repeated measurements (e.g. standard deviation)
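The two quantities can be computed from repeated measurements of a sample with a known value. The sketch below uses invented glucose readings for two hypothetical devices; the helper names `bias` and `imprecision` are illustrative, and the coefficient of variation is only one of several possible imprecision measures.

```python
from statistics import mean, stdev

# Hypothetical example: a reference sample with a known glucose value of
# 100 mg/dL is measured five times on each of two devices
true_value = 100.0
device_a = [99.0, 101.0, 100.5, 99.5, 100.0]    # centred on the truth
device_b = [104.0, 106.0, 105.5, 104.5, 105.0]  # same spread, shifted upwards

def bias(measurements, reference):
    """Mean measurement minus the reference value (systematic error)."""
    return mean(measurements) - reference

def imprecision(measurements):
    """Coefficient of variation in percent (random error)."""
    return 100 * stdev(measurements) / mean(measurements)

for name, values in [("A", device_a), ("B", device_b)]:
    print(f"device {name}: bias = {bias(values, true_value):+.2f}, "
          f"CV = {imprecision(values):.2f}%")
```

Device B is as precise as device A but less accurate, because all of its readings are shifted by the same amount.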
Sources of Bias
Selection bias
Patients are selected according to the researchers’ own arbitrary criteria.
Evaluation bias
The physician knows which treatment the patient received and is therefore
unable to evaluate objectively the effectiveness or reliability of the
treatment he or she has tested.
Publication bias
Researchers and scientific editors more often prefer to publish
studies in which popular findings are obtained.
Recall bias
It is caused by differences in how accurately or completely the study
participants remember past events or experiences.
Observation
Definition
Observation: The value of something of interest which is measured or
counted during a study (or, a case of the data being collected)
Variable
Definition
Variable: An observed or measured characteristic that takes different
values for different observations (or, a characteristic of the observation
recorded in the data).
Random Variable
Discrete and Continuous Random Variables
Random variable: A variable whose value is determined by chance (it cannot
be exactly predicted in advance).
Random variables are denoted by uppercase letters (X, Y). Observed
numerical values of random variables are denoted by lowercase letters
(x, y).
Discrete random variables: Can take on a countable number of distinct
values, such as X = {x ∈ ℤ : 1 ≤ x ≤ 100} = {1, 2, 3, ..., 100}
e.g. number of visits to a doctor in a year, leukocyte count
Continuous random variables: Can take on any value in some interval
of real numbers, such as Y = {y ∈ ℝ}
e.g. diastolic blood pressure (mmHg), body mass index (kg/m²)
Discrete variables are counted, while continuous variables are measured.
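The counted versus measured distinction can be seen in a small simulation. This is only a sketch: the binomial-style model for doctor visits, the normal model for blood pressure, and all numeric settings are assumptions made for illustration.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility

# Discrete: number of doctor visits in a year, counted as whole numbers.
# Modelled here, purely for illustration, as 12 monthly yes/no events.
visits = [sum(rng.random() < 0.25 for _ in range(12)) for _ in range(8)]

# Continuous: diastolic blood pressure (mmHg), measured on a continuum.
pressure = [rng.gauss(80, 10) for _ in range(8)]

print("visits  :", visits)                           # integers between 0 and 12
print("pressure:", [round(p, 1) for p in pressure])  # real-valued measurements
```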
Types of Variables
Independent, Dependent and Controlled Variables
Dependent (outcome, response) variable (Y): The variable being
tested and measured in a scientific experiment
Independent (predictor, explanatory) variable (X): The variable that
is changed or controlled in a scientific experiment to test its effect on the
dependent variable
Controlled variable: A variable that is held constant during the experiment
Y = f(X), e.g. Y = β₀ + β₁X
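For the linear case Y = β₀ + β₁X, the coefficients can be estimated from data by ordinary least squares. The sketch below is one standard way to do this; the function name `fit_line` and the dose and blood pressure values are hypothetical, chosen only to show an independent variable (dose) and a dependent variable (reduction).

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope, estimates beta_1
    b0 = y_bar - b1 * x_bar  # intercept, estimates beta_0
    return b0, b1

# Hypothetical data: drug dose (mg) as X, reduction in blood pressure (mmHg) as Y
dose = [10, 20, 30, 40, 50]
reduction = [4.1, 8.3, 11.8, 16.2, 19.9]

b0, b1 = fit_line(dose, reduction)
print(f"estimated line: Y = {b0:.2f} + {b1:.3f} X")
```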
Data
Data, Data Analysis, Database
Data: The raw material of statistics. Numbers obtained from any record,
descriptive accounts, or symbolic representation of an attribute, event, or
process (singular form: datum)
Data analysis: The process of compiling and analysing data to make
inferences and support decision making
Database: An organized collection of data, generally stored and accessed
electronically from a computer system
Data, Information, Knowledge and Wisdom
Data
Data represents a fact or statement of event without relation to other
things.
Information
Information embodies the understanding of a relationship of some sort,
possibly cause and effect.
Knowledge
Knowledge represents a pattern that connects and generally provides a
high level of predictability as to what is described or what will happen next.
Wisdom
Wisdom embodies more of an understanding of fundamental principles
embodied within the knowledge that are essentially the basis for the
knowledge being what it is. Wisdom is essentially systemic.
Data
It is raining.
Information
The temperature dropped 15 degrees and then it started raining.
Knowledge
If the humidity is very high and the temperature drops substantially, the
atmosphere is often unlikely to be able to hold the moisture, so it rains.
Wisdom
It rains because it rains. And this encompasses an understanding of all the
interactions that happen between raining, evaporation, air currents,
temperature gradients, changes, and raining.
Sources of Biomedical Data
Electronic health records
Clinical trials
Descriptive surveys
Medical research
Patient-generated health data
Laboratory results
Examination
Inpatient health monitoring
Imaging data
Genetics data
Experimental data
Text data
...
Types of Data
The methods for describing and analyzing data depend upon the type of
data: qualitative and quantitative.
Qualitative data: Individuals are placed into categories, according to a
quality, that do not have numerical values, e.g. gender (male/female)
Quantitative data: Numerical data that have a natural order and can be
continuous or discrete, e.g. hemoglobin level (g/dL)
Scale of Data
Nominal (qualitative)
The numbers are simply indicators of a category.
e.g. Smoking status (0: No, 1: Yes)
Ordinal (qualitative)
The numbers represent an ordering or ranking of the observations.
e.g. Obesity (1: Underweight, 2: Normal, 3: Overweight, 4: Obese)
Interval (quantitative)
Measured on a continuous scale in equal units (no true zero value)
Ratio (quantitative)
Measured on a continuous scale in equal units (has a true zero value)
Scale of Data
Nominal (qualitative)
e.g. Gender (1: Male, 2: Female), Blood type (1: O, 2: A, 3: B, 4: AB),
Leukemia subtypes (1: ALL, 2: AML, 3: CML, 4: Other), etc.
Ordinal (qualitative)
e.g. Education level (1: <High school, 2: High school, 3: Bachelor's, 4:
Master's, 5: Doctorate), Grade of breast cancer (1: Grade-1, 2: Grade-2,
3: Grade-3), etc.
Interval (quantitative)
e.g. Temperature (°C), Level of happiness (1-10), etc.
Ratio (quantitative)
e.g. Height (cm), AST (U/L), Gene expression, etc.
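The scale of a variable determines which summaries are meaningful. The Python sketch below pairs each scale with a summary that suits it; the patient records are invented for illustration, and the choice of one summary per scale is a simplification rather than a complete rule.

```python
from statistics import mean, median, mode

# Hypothetical records for five patients, coded as in the examples above
smoking = [0, 1, 0, 0, 1]                        # nominal: 0 = No, 1 = Yes
obesity = [2, 3, 4, 2, 1]                        # ordinal: 1 = Underweight ... 4 = Obese
temperature_c = [36.6, 37.2, 38.1, 36.9, 37.0]   # interval: no true zero
height_cm = [162.0, 175.5, 180.0, 158.5, 170.0]  # ratio: true zero

# Nominal codes are labels only, so the mode (most frequent category) is meaningful
print("most common smoking status:", mode(smoking))

# Ordinal codes can be ranked, so the median is meaningful as well
print("median obesity class:", median(obesity))

# Interval data support differences and means, but not ratios
print("mean temperature (°C):", round(mean(temperature_c), 2))

# Ratio data additionally support statements such as "x times as tall"
print("tallest / shortest:", round(max(height_cm) / min(height_cm), 2))
```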
Data Matrix
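n × p dimensional table, which includes the data of p variables belonging to
n observations
\[
A = \begin{bmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1p} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2p} \\
x_{31} & x_{32} & x_{33} & \cdots & x_{3p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & x_{n3} & \cdots & x_{np}
\end{bmatrix}
\]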
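In software, a data matrix is typically stored with one row per observation and one column per variable. The following sketch uses a plain Python list of lists; the variable names and values are hypothetical, and the helper `x(i, j)` is introduced only to mirror the 1-based subscripts of the matrix notation.

```python
# Hypothetical data matrix with n = 4 observations (rows) and p = 3 variables (columns)
variables = ["age_years", "height_cm", "hemoglobin_g_dl"]
A = [
    [34, 172.0, 13.9],
    [51, 165.5, 12.4],
    [28, 180.2, 15.1],
    [45, 158.7, 13.0],
]

n = len(A)     # number of observations
p = len(A[0])  # number of variables

def x(i, j):
    """Value of variable j for observation i, using 1-based indices."""
    return A[i - 1][j - 1]

print(f"A is {n} x {p}")
print("x_23 =", x(2, 3))  # hemoglobin of the second observation

# A column collects all n values of one variable
heights = [row[variables.index("height_cm")] for row in A]
print("height column:", heights)
```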
Data Analysis Software
The computer environment to describe, analyze and visualize the data.
Why R?
R is free
R is open source
R is a programming language
R includes advanced graphical libraries
R is a flexible statistical analysis tool
R has a large and active community
R has unlimited capabilities
Why TURCOSA?
TURCOSA is user-friendly: usable with minimal statistical knowledge
TURCOSA works in the cloud: data analysis on PC, tablet and smartphone
TURCOSA is a project-based tool: multiple datasets and multiple users in the same project
TURCOSA provides interactive reporting: interactive tables and graphs
TURCOSA supports multiple languages: currently English and Turkish
Subscription payment model: monthly and yearly payment
Biostatistics Sources
Books
Journals
Communities and Events
International Biometric Society
http://www.biometricsociety.org
10th Conference of the EMR IBS | https://www.emr2018.com
International Society for Clinical Biostatistics
http://www.iscb.info
ISCB ASC 2018 | https://iscbasc2018.com
Turkish Association of Biostatistics
http://biyoistatistikdernegi.org.tr
XX. National and III. International Biostatistics Congress |
http://www.biyoistatistikkongresi.org
References
Alpar, R. [2016]. Uygulamalı İstatistik ve Geçerlik Güvenirlik, 4th ed. Detay Yayıncılık, Ankara.
Alpar, R. [2017]. Uygulamalı Çok Değişkenli İstatistiksel Yöntemler, 5th ed. Detay Yayıncılık, Ankara.
Chernick, M. R., and Friis, R. H. [2003]. Introductory Biostatistics for the Health Sciences: Modern Applications Including Bootstrap, 1st ed. Wiley Interscience, New Jersey.
Crawley, M. J. [2004]. The R Book, 1st ed. Wiley, England.
Elston, R. C., and Johnson, W. D. [2008]. Basic Biostatistics for Geneticists and Epidemiologists: A Practical Approach, 1st ed. Wiley, UK.
Fisher, R. A. [1950]. Statistical Methods for Research Workers, 11th ed. Hafner, New York.
Fisher, L. D., van Belle, G., Heagerty, P. J., and Lumley, T. [2004]. Biostatistics: A Methodology for the Health Sciences, 2nd ed. Wiley Interscience, New Jersey.
Forthofer, R. N., Lee, E. S., and Hernandez, M. [2007]. Biostatistics: A Guide to Design, Analysis, and Discovery, 2nd ed. Elsevier, London.
Kendall, M. G., and Stuart, A. [1963]. The Advanced Theory of Statistics, Vol. 1, 2nd ed. Charles Griffin, London.
Kruskal, W. [1968]. In International Encyclopedia of the Social Sciences, D. L. Sills (ed). Macmillan, New York.
Logan, M. [2010]. Biostatistical Design and Analysis Using R: A Practical Guide, 1st ed. Wiley Blackwell, UK.
Mainland, D. [1963]. Elementary Medical Statistics, 2nd ed. Saunders, Philadelphia.
Mood, A. M. [1950]. Introduction to the Theory of Statistics. McGraw-Hill, New York.
Rosner, B. [2011]. Fundamentals of Biostatistics, 7th ed. Brooks/Cole, Cengage Learning, Boston.
Savage, I. R. [1968]. Statistics: Uncertainty and Behavior. Houghton Mifflin, Boston.
Sokal, R. R., and Rohlf, F. J. [1969]. Introduction to Biostatistics. W. H. Freeman and Co., Ltd., New York.
von Mises, R. [1957]. Probability, Statistics and Truth, 2nd ed. Macmillan, New York.
Wassertheil-Smoller, S. [2004]. Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals, 3rd ed. Springer-Verlag, New York.
Daniel, W. W. [2005]. Biostatistics: A Foundation for Analysis in the Health Sciences, 9th ed. Wiley, New York.