
Introduction to Biostatistics & Basic Concepts

Gökmen ZARARSIZ, PhD.


Dinçer GÖKSÜLÜK, PhD.

Erciyes University, Faculty of Medicine, Department of Biostatistics


[email protected]
[email protected]

March 01, 2021

G. Zararsız & D. Goksuluk Introduction to Biostatistics March 01, 2021 1 / 46


Copyright 2019 ©. All Rights Reserved. May not be copied, scanned, or
duplicated, in whole or in part.



Table of Contents
1 Statistics
2 Biostatistics
Statistics vs. Biostatistics
Biostatistics: Research planning to making decisions
Why Biostatistics?
Research Topics
3 Basic Concepts
Population and Sample
Parameter and Statistic
Sampling and Estimation
Accuracy, Precision and Bias
Observation and Variable
Data, Types of Data & Software
4 Sources
Books
Journals
5 References
Statistics
Foundation of Statistics

The foundations of modern statistics were laid in the 17th century.


First source of statistics: political science (or political arithmetic)
Second source of statistics: probability theory
The correspondence between Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665)
Ars Conjectandi, written by Jacques Bernoulli (1654-1705)
Applying probability theory to everyday problems: Abraham de Moivre (1667-1754)
Famous astronomers and mathematicians: Pierre-Simon Laplace (1749-1827) and Carl Friedrich Gauss (1777-1855)
Combining the theory and practical methods of statistics: Adolphe Quetelet (1796-1874)
The father of biostatistics and eugenics: Francis Galton (1822-1911)
Applying statistical methods to biology: Karl Pearson (1857-1936)
A genius who almost single-handedly created the foundations of modern statistical science: Ronald A. Fisher (1890-1962)
Statistics
Definition of Statistics

Fisher, 1950
“... may be regarded as mathematics applied to observational data. ...
may be regarded (i) as the study of populations, (ii) as the study of
variation, (iii) as the study of methods of the reduction of data.”

Mood, 1950
“the technology of the scientific method.”

von Mises, 1957
“to make inference on the probability of events from their observed
frequencies.”




Kendall & Stuart, 1963
“the branch of the scientific method which deals with the data obtained by
counting or measuring the properties of populations of natural
phenomena.”

Mainland, 1963
“the science and art of dealing with variation in such a way as to obtain
reliable results.”

Savage, 1968
“uncertainty and behavior.”




Kruskal, 1968
“is concerned with the inferential process, in particular with the planning
and analysis of experiments or surveys, with the nature of observational
errors and sources of variability that obscure underlying patterns, and with
the efficient summarizing of sets of data.”

Sokal & Rohlf, 1969
“the scientific study of numerical data based on natural phenomena”

Wayne, 2005
“a field of study concerned with (1) the collection, organization,
summarization, and analysis of data; and (2) the drawing of inferences
about a body of data when only a part of the data is observed”





Statistics
Mathematical and Applied Statistics

Mathematical statistics: The application of mathematics to statistics, using probability theory, linear algebra, differential equations, etc.

Applied statistics: The application of mathematical statistics to specific areas, including biology, economics, engineering, psychology, etc.



Statistics
Descriptive and Inferential Statistics

Descriptive statistics (exploratory data analysis): Organizing, summarizing, and displaying data

Inferential statistics (confirmatory data analysis): Using sample data to draw conclusions about a population
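As a minimal illustration of the difference, the Python sketch below first summarizes a small sample (descriptive statistics) and then uses it to estimate the population mean with a 95% confidence interval (inferential statistics). The hemoglobin values are hypothetical, the interval assumes approximately normal data, and the t critical value 2.262 (9 degrees of freedom) is hard-coded from a t table because Python's standard library has no t distribution.

```python
from statistics import mean, stdev

# Hypothetical sample: hemoglobin levels (g/dL) of 10 patients
hb = [13.2, 14.1, 12.8, 15.0, 13.7, 14.4, 12.9, 13.5, 14.8, 13.9]

# Descriptive statistics: organize and summarize the sample itself
n = len(hb)
x_bar = mean(hb)
s = stdev(hb)  # sample standard deviation (divisor n - 1)
print(f"n = {n}, mean = {x_bar:.2f} g/dL, SD = {s:.2f} g/dL")

# Inferential statistics: 95% confidence interval for the population mean,
# assuming approximately normal data. 2.262 is the t critical value for
# n - 1 = 9 degrees of freedom, taken from a t table.
T_CRIT = 2.262
half_width = T_CRIT * s / n ** 0.5
lower, upper = x_bar - half_width, x_bar + half_width
print(f"95% CI for the population mean: {lower:.2f} to {upper:.2f} g/dL")
```

The first two printed numbers describe only these 10 patients; the interval is a statement about the population they were drawn from.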



Biostatistics
Statistics vs. Biostatistics

Biostatistics is the study of statistics as applied to the biological sciences.

Similarly, other branches of applied statistics are named after the areas to which they are applied:

Scientific branch     Applied area
Sociometrics          Social sciences
Psychometrics         Psychological sciences
Econometrics          Economics
Technometrics         Physical, chemical, and engineering sciences
Anthropometrics       Measurement of human individuals
Bibliometrics         Written publications, such as books or articles
Scientometrics        Quantitative features and characteristics of science
Informetrics          Information sciences
Cliometrics           Historical sciences
Table: The fields of applied statistics



Biostatistics
Biostatistics: Research planning to making decisions

Biostatistics deals with the development and application of the most appropriate methods for:
Research planning, including the design of experiments, clinical trials and surveys
Formulation of statistical hypotheses and determination of the appropriate methodology, including sampling and sample size calculation
Data analysis
Presentation, interpretation and reporting of the results
Making decisions on the basis of such analysis
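The sample size calculation step can be sketched with the common formula for estimating a population mean to within a margin of error E: n = (z · σ / E)², rounded up, where z is the standard normal critical value for the chosen confidence level (about 1.96 for 95%). This is a minimal sketch, not a general-purpose tool: it assumes the population standard deviation σ is known (e.g., from a pilot study) and relies on the normal approximation. The function name `sample_size_for_mean` and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n whose confidence interval for the mean has half-width <= margin.

    Hypothetical helper: assumes a known population SD `sigma` and a normal
    approximation for the sampling distribution of the mean.
    """
    if sigma <= 0 or margin <= 0 or not 0 < confidence < 1:
        raise ValueError("sigma and margin must be positive, confidence in (0, 1)")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil((z * sigma / margin) ** 2)


# Example: SD of 10 mmHg, desired margin of error of 2 mmHg, 95% confidence
print(sample_size_for_mean(sigma=10, margin=2))  # 97
```

Halving the margin of error roughly quadruples the required sample size, which is why the precision target should be fixed at the planning stage.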



Biostatistics
Why Biostatistics?

Parents whose child has a genetic anomaly suspect that a future child may also be affected and want to decide whether or not to have another child.
A manufacturer has developed an in vitro diagnostic procedure to replace the microplate procedure and wants to know whether there is a systematic measurement error between the two procedures.
An oncologist wants to choose the best therapy (e.g. chemotherapy, radiotherapy, etc.) for a breast cancer patient.
A transplantation company is trying to determine the mean survival time after bone marrow transplantation in leukemia patients.
A pharmaceutical company is trying to identify candidate metabolomic biomarkers of lung cancer.



Biostatistics
Research topics in Biostatistics

Clinical trials
Diagnostic tests and ROC analysis
Multivariate analysis
Survival analysis
Machine learning
Neural networks and deep learning
Bioinformatics
Multiple testing and multiple comparisons
Statistical modeling of high-dimensional data
Biomarker discovery
Personalized medicine
Statistical programming
...


Population and Sample
Population: A collection of people or objects that share common
observable characteristics

Sample: A random subset of the population

Sampling: The process of selecting samples from the population
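Simple random sampling, in which every member of the population has the same chance of being selected, can be sketched with Python's standard `random` module. The patient IDs, the population size of 500 and the sample size of 50 are hypothetical; `random.Random.sample` draws without replacement, and the fixed seed only makes the example reproducible.

```python
import random

# Hypothetical sampling frame: IDs of the 500 patients in a registry (the population)
population = [f"P{i:03d}" for i in range(1, 501)]

rng = random.Random(2021)              # fixed seed only to make the draw reproducible
sample = rng.sample(population, k=50)  # simple random sample, without replacement

print(f"Sampled {len(sample)} of {len(population)} patients, e.g. {sample[:5]}")
```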



Parameter and Statistic

Parameter: A measure describing a variable in the population

Statistic: The corresponding measure computed from a sample, used as an estimate of the parameter



Parameter and Statistic

Measure                    Population parameter    Sample statistic
Number of observations     N                       n
Mean                       µ                       x̄
Median                     η                       M
Proportion                 p                       p̂
Standard deviation         σ                       s
Variance                   σ²                      s²
Skewness                   ν                       g1
Kurtosis                   τ                       g2
Correlation coefficient    ρ                       r
Regression coefficient     β                       b
Table: Commonly used parameters and statistics
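A minimal sketch of how the sample statistics in the table can be computed with Python's standard library, using small hypothetical data. Skewness g1 and kurtosis g2 are computed here as moment coefficients (g2 as excess kurtosis, i.e., minus 3); statistical software often reports bias-adjusted versions, so values from other tools may differ slightly. r is the Pearson correlation coefficient and b the least-squares slope; the helper `central_moment` is illustrative.

```python
import math
import statistics as st

# Hypothetical data for 6 patients
age = [35, 42, 50, 58, 63, 70]           # x: age (years)
sbp = [118, 121, 130, 135, 142, 150]     # y: systolic blood pressure (mmHg)
smoker = [0, 1, 1, 0, 1, 0]              # 1 = smoker, 0 = non-smoker


def central_moment(xs, k):
    """k-th central moment with divisor n."""
    m = st.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


n = len(sbp)                             # n
mean_sbp = st.fmean(sbp)                 # x̄
median_sbp = st.median(sbp)              # M
p_hat = sum(smoker) / n                  # p̂: proportion of smokers
s = st.stdev(sbp)                        # s (divisor n - 1)
s2 = st.variance(sbp)                    # s²
m2, m3, m4 = (central_moment(sbp, k) for k in (2, 3, 4))
g1 = m3 / m2 ** 1.5                      # skewness (moment coefficient)
g2 = m4 / m2 ** 2 - 3                    # excess kurtosis (moment coefficient)

# Pearson correlation r and least-squares slope b of SBP on age
mean_age = st.fmean(age)
sxy = sum((a - mean_age) * (y - mean_sbp) for a, y in zip(age, sbp))
sxx = sum((a - mean_age) ** 2 for a in age)
syy = sum((y - mean_sbp) ** 2 for y in sbp)
r = sxy / math.sqrt(sxx * syy)
b = sxy / sxx

print(f"n={n} mean={mean_sbp:.1f} median={median_sbp} p_hat={p_hat:.2f}")
print(f"s={s:.2f} s2={s2:.2f} g1={g1:.3f} g2={g2:.3f} r={r:.3f} b={b:.3f}")
```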



Why Sampling?
Reduced cost
Reduced time
Practical in most situations
Sometimes it is impossible to study the whole population (e.g. marine biology)



Estimation and Bias

Estimation: Using the sample statistic in place of the population parameter

Bias: A systematic deviation of the sample statistic from the population parameter (i.e., averaged over repeated samples, the statistic does not equal the parameter)
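A classic example of bias is the sample variance. Averaged over many samples, the estimator that divides the sum of squared deviations by n systematically underestimates σ² by the factor (n - 1)/n, whereas dividing by n - 1 (giving s²) removes this bias. The simulation below is a sketch under assumed settings (a hypothetical normal population with σ² = 4 and samples of size n = 5); `statistics.pvariance` and `statistics.variance` use the divisors n and n - 1, respectively.

```python
import random
import statistics as st

rng = random.Random(2021)      # fixed seed for reproducibility
MU, SIGMA = 10.0, 2.0          # hypothetical population: Normal(10, 2), so sigma^2 = 4
n, reps = 5, 10_000            # small samples make the bias easy to see

divisor_n, divisor_n_minus_1 = [], []
for _ in range(reps):
    xs = [rng.gauss(MU, SIGMA) for _ in range(n)]
    divisor_n.append(st.pvariance(xs))          # sum of squared deviations / n
    divisor_n_minus_1.append(st.variance(xs))   # sum of squared deviations / (n - 1) = s^2

print(f"true variance:              {SIGMA ** 2:.3f}")
# expected to be close to 4 * (n - 1) / n = 3.2 (biased downward)
print(f"mean of divisor-n estimate: {st.fmean(divisor_n):.3f}")
# expected to be close to 4 (unbiased)
print(f"mean of s^2:                {st.fmean(divisor_n_minus_1):.3f}")
```

Any single sample can over- or underestimate σ²; bias is a property of the estimator's average behavior across repeated samples.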



Accuracy and Precision
Accuracy and precision both describe the quality of measurements, but in different ways:

Accuracy: Closeness of a measurement to its true value

Precision: Closeness of repeated measurements to each other



Bias and Imprecision

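Bias: Calculated from the difference between a measurement and a reference value (or between two measurement methods)

Imprecision: Calculated from the variation of the data (e.g., the standard deviation of repeated measurements)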
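As a hedged illustration from laboratory practice, bias and imprecision can be estimated from repeated measurements of a control material with a known reference value: bias as the difference between the mean measurement and the reference value (reflecting accuracy), and imprecision as the standard deviation or coefficient of variation (CV) of the repeated measurements (reflecting precision). The glucose values and the reference value of 100 mg/dL are hypothetical.

```python
import statistics as st

# Hypothetical: 8 repeated glucose measurements (mg/dL) of one control sample
REFERENCE = 100.0  # assumed true (reference) value of the control, mg/dL
measurements = [103.1, 102.4, 104.0, 101.8, 103.5, 102.9, 103.7, 102.6]

mean_value = st.fmean(measurements)
bias = mean_value - REFERENCE          # systematic error (accuracy)
bias_pct = 100 * bias / REFERENCE      # bias as % of the reference value
sd = st.stdev(measurements)            # random error (imprecision)
cv_pct = 100 * sd / mean_value         # coefficient of variation, %

print(f"mean = {mean_value:.2f} mg/dL")
print(f"bias = {bias:+.2f} mg/dL ({bias_pct:+.1f}%)")
print(f"SD = {sd:.2f} mg/dL, CV = {cv_pct:.2f}%")
```

With these numbers the measurements are tightly clustered (CV below 1%) but consistently about 3 mg/dL above the reference value: precise, but not accurate.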



Sources of Bias

Selection bias
Patients are selected according to the researchers’ own arbitrary criteria.

Evaluation bias
The physician knows which treatment the patient received and therefore cannot objectively evaluate the effectiveness or reliability of the treatment he/she has tested.

Publication bias
Researchers and journal editors tend to publish studies with popular (e.g., positive) findings more often than other studies.

Recall bias
Caused by differences in how accurately or completely the research participants remember past events or experiences.
Observation
Definition

Observation: The value of something of interest that is measured or counted during a study (or a single case of the data being collected)



Variable
Definition

Variable: An observed or measured characteristic that takes different values for different observations (or a characteristic of the observation recorded in the data).



Random Variable
Discrete and Continuous Random Variables

A variable whose value is determined by chance (it cannot be predicted exactly in advance).
Random variables are denoted by uppercase letters (X, Y). Observed numerical values of random variables are denoted by lowercase letters (x, y).

Discrete random variables: Can take on a countable number of distinct values, such as $X \in \{1, 2, 3, \ldots, 100\}$
e.g. number of visits to a doctor in a year, leukocyte count

Continuous random variables: Can take on any value in an interval of real numbers, $Y \in \mathbb{R}$
e.g. diastolic blood pressure (mmHg), body mass index (kg/m²)

Discrete variables are counted, while continuous variables are measured.
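The distinction can be illustrated by simulating one discrete and one continuous random variable with Python's `random` module. Both models are hypothetical simplifications chosen only for illustration: the yearly doctor-visit count is modeled as the number of months (out of 12) with a visit, each with an assumed probability of 0.3, and diastolic blood pressure is drawn from an assumed normal distribution with mean 80 mmHg and SD 10 mmHg. The functions play the role of X and Y; the lists hold observed values x and y.

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility


def doctor_visits_per_year(p_visit_per_month=0.3):
    """Discrete random variable X (hypothetical model): months with a visit, 0 to 12."""
    return sum(rng.random() < p_visit_per_month for _ in range(12))


def diastolic_bp_mmhg(mean=80.0, sd=10.0):
    """Continuous random variable Y (hypothetical model): diastolic BP in mmHg."""
    return rng.gauss(mean, sd)


x_values = [doctor_visits_per_year() for _ in range(5)]  # observed x: counted (integers)
y_values = [diastolic_bp_mmhg() for _ in range(5)]       # observed y: measured (real numbers)

print("x:", x_values)
print("y:", [round(y, 1) for y in y_values])
```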


Types of Variables
Independent, Dependent and Controlled Variables

Dependent (outcome, response) variable (Y): The variable being tested and measured in a scientific experiment

Independent (predictor, explanatory) variable (X): The variable that is changed or manipulated in a scientific experiment to test its effect on the dependent variable

Controlled variable: A variable that is kept constant throughout the experiment

$Y = f(X)$, e.g. $Y = \beta_0 + \beta_1 X$
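For the linear example Y = β₀ + β₁X, the coefficients β₀ and β₁ are population parameters that are typically estimated from sample data by least squares, giving the sample statistics b₀ and b₁. The sketch below assumes a hypothetical dose-response experiment (drug dose as the independent variable, reduction in blood pressure as the dependent variable); the function name `fit_simple_linear_regression` is illustrative.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (b0, b1) of beta0 and beta1 in Y = beta0 + beta1 * X."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope: expected change in Y per unit change in X
    b0 = y_bar - b1 * x_bar   # intercept: predicted Y when X = 0
    return b0, b1


# Hypothetical experiment: drug dose (mg) is the independent variable X and the
# reduction in systolic blood pressure (mmHg) is the dependent variable Y; other
# factors (e.g., time of measurement) would be held constant as controlled variables.
dose = [0, 10, 20, 30, 40]
reduction = [1.0, 4.9, 9.1, 13.0, 17.0]

b0, b1 = fit_simple_linear_regression(dose, reduction)
print(f"Y = {b0:.2f} + {b1:.3f} X")
```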





Data
Data, Data Analysis, Database

Data: The raw material of statistics. Numbers obtained from any record, descriptive account, or symbolic representation of an attribute, event, or process (singular form: datum)

Data analysis: The process of compiling and analyzing data to make inferences and support decision-making

Database: An organized collection of data, generally stored and accessed electronically from a computer system



Data
Data, Information, Knowledge and Wisdom



Data
Data represents a fact or statement of event without relation to other
things.

Information
Information embodies the understanding of a relationship of some sort,
possibly cause and effect.

Knowledge
Knowledge represents a pattern that connects and generally provides a
high level of predictability as to what is described or what will happen next.

Wisdom
Wisdom embodies more of an understanding of fundamental principles
embodied within the knowledge that are essentially the basis for the
knowledge being what it is. Wisdom is essentially systemic.

Data
It is raining.

Information
The temperature dropped 15 degrees and then it started raining.

Knowledge
If the humidity is very high and the temperature drops substantially, the
atmosphere is often unable to hold the moisture, so it rains.

Wisdom
It rains because it rains. And this encompasses an understanding of all the
interactions that happen between raining, evaporation, air currents,
temperature gradients, changes, and raining.



Data
Source of Biomedical Data

Electronic health records
Clinical trials
Descriptive surveys
Medical research
Patient-generated health data
Laboratory results
Examination
Inpatient health monitoring
Imaging data
Genetic data
Experimental data
Text data
...
Data
Types of Data

The methods for describing and analyzing data depend upon the type of
data: qualitative and quantitative.

Qualitative data: Individuals are placed into categories according to a quality that has no numerical value, e.g. gender (male/female)

Quantitative data: Numerical data that have a natural order and can be continuous or discrete, e.g. hemoglobin level (g/dL)



Data
Scale of Data

Nominal Qualitative
The numbers are simply indicators of a category.
e.g. Smoking status (0: No, 1: Yes)

Ordinal Qualitative
The numbers represent an ordering or ranking of the observations
e.g. Obesity (1: Underweight, 2: Normal, 3: Overweight, 4: Obese)

Interval Quantitative
Measured on a continuous scale with equal units (no true zero)

Ratio Quantitative
Measured on a continuous scale with equal units (has a true zero)




Nominal Qualitative
e.g. Gender (1: Male, 2: Female), Blood type (1: O, 2: A, 3: B, 4: AB),
Leukemia subtypes (1: ALL, 2: AML, 3: CML, 4: Other), etc.

Ordinal Qualitative
e.g. Education level (1: <High school, 2: High school, 3: Bachelor's, 4:
Master's, 5: Doctorate), Grade of breast cancer (1: Grade-1, 2: Grade-2,
3: Grade-3), etc.

Interval Quantitative
e.g. Temperature °C, Level of happiness (1-10), etc.

Ratio Quantitative
e.g. Height (cm), AST (U/L), Gene expression, etc.
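The scale of a variable determines which operations on its values are meaningful. The sketch below, with hypothetical values based on the examples above, illustrates this: nominal codes can only be counted (e.g., the mode), ordinal codes can also be ranked (e.g., the median category), interval values support differences but not ratios (20 °C is not "twice as warm" as 10 °C), and ratio values support ratios. Converting Celsius to kelvin, a scale with a true zero, shows the actual ratio of the two temperatures.

```python
from collections import Counter
from statistics import median

# Nominal: codes are labels only, so counting is meaningful but arithmetic is not
blood_types = ["O", "A", "A", "B", "AB", "O", "A"]
mode_blood_type = Counter(blood_types).most_common(1)[0][0]

# Ordinal: categories have an order, so rank-based summaries such as the median make sense
GRADE_ORDER = {"Grade-1": 1, "Grade-2": 2, "Grade-3": 3}
grades = ["Grade-2", "Grade-1", "Grade-3", "Grade-2", "Grade-2"]
median_grade = median([GRADE_ORDER[g] for g in grades])

# Interval: differences are meaningful, ratios are not (0 °C is not "no temperature")
t1_c, t2_c = 10.0, 20.0
difference_c = t2_c - t1_c                          # 10 degrees warmer: meaningful
ratio_kelvin = (t2_c + 273.15) / (t1_c + 273.15)    # actual ratio, on a scale with a true zero

# Ratio: true zero, so ratios are meaningful
height_a_cm, height_b_cm = 150.0, 180.0
ratio_height = height_b_cm / height_a_cm            # person B is 1.2 times as tall

print(mode_blood_type, median_grade, difference_c, round(ratio_kelvin, 3), ratio_height)
```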
Data
Data Matrix

An n × p table containing the data of p variables for n observations:

$$
A = \begin{pmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1p} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2p} \\
x_{31} & x_{32} & x_{33} & \cdots & x_{3p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & x_{n3} & \cdots & x_{np}
\end{pmatrix}
$$
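In code, an n × p data matrix is commonly stored as a list of rows, one row per observation and one column per variable. The patients and variables below are hypothetical; note that Python indexing starts at 0, so the element x₂₃ (2nd observation, 3rd variable) is `A[1][2]`.

```python
# Hypothetical data matrix: n = 4 patients (rows), p = 3 variables (columns)
variables = ["age", "height_cm", "ast_u_per_l"]
A = [
    [34, 172.0, 21.0],
    [58, 165.5, 35.0],
    [41, 180.2, 18.0],
    [67, 158.0, 40.0],
]

n, p = len(A), len(variables)
assert all(len(row) == p for row in A), "every observation needs p values"

x_23 = A[1][2]                          # x_23: AST of the 2nd patient
heights = [row[1] for row in A]         # a column: all observations of one variable
patient_3 = dict(zip(variables, A[2]))  # a row: all variables of one observation

print(f"{n} x {p} matrix; x_23 = {x_23}; heights = {heights}")
print(patient_3)
```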



Data
Data Analysis Software

A computing environment for describing, analyzing and visualizing data.



Why R?

R is free
R is open source
R is a programming language
R includes advanced graphical libraries
R is a flexible statistical analysis tool
R has a large and active community
R is highly extensible through add-on packages



Why TURCOSA?

TURCOSA is user-friendly
Can be used with minimal statistical knowledge
TURCOSA works in the cloud
Data analysis on PC, tablet and smartphone
TURCOSA is a project-based tool
Multiple datasets and multiple users in the same project
TURCOSA provides interactive reporting
Interactive tables and graphs
TURCOSA supports multiple languages
Currently English and Turkish
Subscription payment model
Monthly and yearly payment



Biostatistics Sources
Books



Biostatistics Sources
Journals



Biostatistics Sources
Communities and Events

International Biometric Society
http://www.biometricsociety.org
10th Conference of the EMR IBS | https://www.emr2018.com
International Society for Clinical Biostatistics
http://www.iscb.info
ISCB ASC 2018 | https://iscbasc2018.com
Turkish Association of Biostatistics
http://biyoistatistikdernegi.org.tr
XX. National and III. International Biostatistics Congress |
http://www.biyoistatistikkongresi.org



References I

Alpar, R. [2016]. Uygulamalı İstatistik ve Geçerlik Güvenirlik [Applied Statistics and Validity-Reliability], 4th ed. Detay Yayıncılık, Ankara.

Alpar, R. [2017]. Uygulamalı Çok Değişkenli İstatistiksel Yöntemler [Applied Multivariate Statistical Methods], 5th ed. Detay Yayıncılık, Ankara.

Chernick, M. R., and Friis, R. H. [2003]. Introductory Biostatistics for the Health Sciences: Modern Applications Including Bootstrap, 1st ed. Wiley-Interscience, New Jersey.

Crawley, M. J. [2004]. The R Book, 1st ed. Wiley, England.

Elston, R. C., and Johnson, W. D. [2008]. Basic Biostatistics for Geneticists and Epidemiologists: A Practical Approach, 1st ed. Wiley, UK.

Fisher, R. A. [1950]. Statistical Methods for Research Workers, 11th ed. Hafner, New York.

Fisher, L. D., van Belle, G., Heagerty, P. J., and Lumley, T. [2004]. Biostatistics: A Methodology for the Health Sciences, 2nd ed. Wiley-Interscience, New Jersey.



References II

Forthofer, R. N., Lee, E. S., and Hernandez, M. [2007]. Biostatistics: A Guide to Design, Analysis, and Discovery, 2nd ed. Elsevier, London.

Kendall, M. G., and Stuart, A. [1963]. The Advanced Theory of Statistics, Vol. 1, 2nd ed. Charles Griffin, London.

Kruskal, W. [1968]. In International Encyclopedia of the Social Sciences, D. L. Sills (ed). Macmillan, New York.

Logan, M. [2010]. Biostatistical Design and Analysis Using R: A Practical Guide, 1st ed. Wiley-Blackwell, UK.

Mainland, D. [1963]. Elementary Medical Statistics, 2nd ed. Saunders, Philadelphia.

Mood, A. M. [1950]. Introduction to the Theory of Statistics. McGraw-Hill, New York.

Rosner, B. [2011]. Fundamentals of Biostatistics, 7th ed. Brooks/Cole, Cengage Learning, Boston.



References III

Savage, I. R. [1968]. Statistics: Uncertainty and Behavior. Houghton Mifflin, Boston.

Sokal, R. R., and Rohlf, F. J. [1969]. Introduction to Biostatistics. W. H. Freeman and Co., Ltd., New York.

von Mises, R. [1957]. Probability, Statistics and Truth, 2nd ed. Macmillan, New York.

Wassertheil-Smoller, S. [2004]. Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals, 3rd ed. Springer-Verlag, New York.

Wayne W. D. [2005]. Biostatistics: A Foundation for Analysis in the Health Sciences, 9th ed. Wiley, New York.

