Basic Statistical Procedures and How to Compute (1)
Date 11th March 2011
Statistics is one of the major obstacles in learning, reading and
conducting medical and public health research. Because the subject is
based mainly on mathematics, its equations and technical jargon tend to
frighten learners. So I want to discuss these subjects in an easy way. I
intentionally avoid complex procedures and some traditional approaches
in basic biostatistics; I'll explain those parts later.
First, from article to article, there are basically only two things that
the learner wants to know about:
Cause Variable       ---------------------------------> Effect Variable
Predictor Variable   ---------------------------------> Response Variable
If the term "variable" gives you a headache, ignore it. It is only a
technical term (nothing more, nothing less).
To keep things simple, we will replace cause and effect (or predictor
and response) with more practical terms:
Cause (Predictor)                        Effect (Response)
1. Sex (M/F)   ------------------------> BMI (kg/sqm)
2. Sex (M/F)   ------------------------> Disease (+/-)
3. Age (yr)    ------------------------> BMI (kg/sqm)
4. Age (yr)    ------------------------> Disease (+/-)
These four simple models cover most research questions.
(Some procedures were intentionally left out. Please don't worry about
this; I'll discuss them later.)
These models actually mean:
1. The difference in BMI between males and females
2. The difference in disease status between males and females
3. The relationship between age and BMI
4. Disease status according to age
Two technical terms we should know are continuous variables and
categorical variables.
In the examples above, Age (yr) and BMI (kg/sqm) are continuous
variables because they are measured in units, and decimal places in
their measurements carry meaning (an age of 2.5 yr or a BMI of
32.5 kg/sqm). So their measurements are continuous in nature.
(They are sometimes called measurement variables.)
Meanwhile, Sex (Male/Female) and Disease Status are categorical
variables because they are counted in whole numbers according to the
category each subject belongs to, and a decimal place in a count is
meaningless (3 females, 5 patients who are disease positive).
(They are sometimes called counting variables.)
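To make the distinction concrete, here is a small Python sketch using a few hypothetical records (the values are invented for illustration, not taken from any study). Continuous variables are summarized with a mean, while categorical variables are summarized by counting subjects in each category. The names `records`, `CONTINUOUS`, `CATEGORICAL` and `summarize` are our own.

```python
from collections import Counter
from statistics import mean

# Hypothetical records, invented for illustration only.
records = [
    {"sex": "M", "age": 34.5, "bmi": 24.1, "disease": "+"},
    {"sex": "F", "age": 28.0, "bmi": 21.7, "disease": "-"},
    {"sex": "F", "age": 45.2, "bmi": 27.3, "disease": "+"},
    {"sex": "M", "age": 51.0, "bmi": 29.8, "disease": "-"},
    {"sex": "F", "age": 39.5, "bmi": 23.4, "disease": "-"},
]

# Continuous (measurement) variables: decimals are meaningful, so take a mean.
CONTINUOUS = ["age", "bmi"]
# Categorical (counting) variables: count how many subjects fall in each category.
CATEGORICAL = ["sex", "disease"]

def summarize(rows):
    """Hypothetical helper: mean for continuous variables, counts for categorical ones."""
    summary = {}
    for var in CONTINUOUS:
        summary[var] = round(mean(r[var] for r in rows), 2)
    for var in CATEGORICAL:
        summary[var] = dict(Counter(r[var] for r in rows))
    return summary

print(summarize(records))
```

A mean age of 39.64 yr makes sense, but "a mean sex" does not, which is exactly why the two kinds of variable need different tests.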
So, refocusing on the models above, there are four main types of
research question:
Cause (Predictor)                        Effect (Response)
1. Categorical  -----------------------> Continuous
2. Categorical  -----------------------> Categorical
3. Continuous   -----------------------> Continuous
4. Continuous   -----------------------> Categorical
So let us begin choosing the appropriate statistical test.
Categorical  ------------------------------------------> Continuous
Sex (M/F)    ------------------------------------------> BMI (kg/sqm)
What we want to know is the difference between the mean BMI of males
and the mean BMI of females.
The statistical test will be Student's t test (independent t test).
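A minimal Python sketch of Student's independent t test, using only the standard library and hypothetical BMI values. It computes the t statistic from the pooled variance (the equal-variance assumption of Student's version) and the degrees of freedom n1 + n2 - 2. Turning t into a p-value needs the t distribution, which `scipy.stats.ttest_ind` handles if SciPy is available. The function name `student_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def student_t(group1, group2):
    """Hypothetical helper: (t, df) for Student's independent two-sample t test."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: weighted average of the two sample variances (n - 1 denominators).
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se = sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se
    return t, n1 + n2 - 2

# Hypothetical BMI values (kg/sqm), invented for illustration.
bmi_male = [24.0, 26.0, 25.0, 27.0, 23.0]
bmi_female = [22.0, 21.0, 23.0, 20.0, 24.0]

t, df = student_t(bmi_male, bmi_female)
print(f"t = {t:.3f}, df = {df}")
```

With these invented numbers the means differ by 3 kg/sqm and the standard error of that difference is 1, so t is 3 on 8 degrees of freedom.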
Sometimes the categorical predictor variable contains more than 2 groups,
like:
Treatment Status (Drug A, Drug B and Drug C) ----------> BMI (kg/sqm)
But the continuous outcome variable is the same. What we want to know
is the difference between the mean BMI of groups A, B and C.
This needs only a small change in statistical procedure: the statistical
test will be ANOVA (analysis of variance) rather than Student's t test
(independent t test).
(Despite the name "analysis of variance", it tests whether the means of
the three groups are different or not; it does so by comparing the
variation between the group means with the variation within the groups.)
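The following standard-library Python sketch, using hypothetical BMI values for three drug groups, shows why a test called "analysis of variance" answers a question about means: the F statistic is the ratio of the between-group mean square to the within-group mean square. It returns only F and its degrees of freedom; `scipy.stats.f_oneway` would also give the p-value. The name `one_way_anova` is chosen here for illustration.

```python
from statistics import mean

def one_way_anova(*groups):
    """Hypothetical helper: (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical BMI values (kg/sqm) for three treatment groups, invented for illustration.
drug_a = [22.0, 23.0, 24.0]
drug_b = [25.0, 26.0, 27.0]
drug_c = [28.0, 29.0, 30.0]

f, df1, df2 = one_way_anova(drug_a, drug_b, drug_c)
print(f"F = {f:.1f} on ({df1}, {df2}) degrees of freedom")
```

Here the group means (23, 26 and 29) are far apart compared with the spread of about 1 kg/sqm inside each group, so F is large: 27 on 2 and 6 degrees of freedom.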
Categorical  ------------------------------------------> Categorical
Sex (M/F)    ------------------------------------------> Disease (+/-)
Sex (M/F)    ------------------------------------------> Disease Status (DS A, DS B and DS C)
What we want to know is the difference in the distribution (numbers or
counts) of disease (+/-) between males and females.
Our famous statistical test, the Chi-square test (χ² test), is needed to
analyze these situations.
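A minimal sketch of the Pearson chi-square statistic for a contingency table, with hypothetical counts of disease (+/-) by sex. Each expected count is the row total multiplied by the column total, divided by the grand total: the count we would expect if disease status did not depend on sex. The function name `chi_square` is hypothetical. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when there is 1 degree of freedom, so its statistic for a 2 x 2 table would be smaller than this uncorrected one.

```python
def chi_square(table):
    """Hypothetical helper: (chi2, df) for a table given as rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row variable and column variable were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts, invented for illustration.
#            Disease +  Disease -
observed = [[30, 20],   # Male
            [15, 35]]   # Female

chi2, df = chi_square(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The same function accepts the 2 x 3 layout (Sex by DS A, DS B and DS C), where df = (2 - 1)(3 - 1) = 2.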
Continuous    -----------------------------------------> Continuous
Age (yr)      -----------------------------------------> BMI (kg/sqm)
BMI (kg/sqm)  -----------------------------------------> Blood Sugar Level (mg%)
What we want to know is the relationship (association) between age and
BMI, or between BMI and blood sugar level.
Only two basic statistical approaches are needed:
1. Correlation analysis (for the strength of association)
2. Regression analysis (for predicting the response variable from
changes in the predictor variable)
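Both approaches can be sketched in a few lines of standard-library Python with hypothetical age and BMI pairs. `pearson_r` measures the strength of the linear association, and `simple_linear_regression` fits the least-squares line so that BMI can be predicted from age; both function names are ours. On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` compute the same quantities.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation coefficient, between -1 and 1."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_linear_regression(x, y):
    """Hypothetical helper: least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical age (yr) and BMI (kg/sqm) pairs, invented for illustration.
age = [20, 30, 40, 50, 60]
bmi = [21.0, 23.0, 24.0, 26.0, 28.0]

r = pearson_r(age, bmi)
intercept, slope = simple_linear_regression(age, bmi)
print(f"r = {r:.3f}")
print(f"BMI = {intercept:.2f} + {slope:.3f} * age")
print(f"predicted BMI at age 45: {intercept + slope * 45:.2f}")
```

The design difference is visible in the code: the correlation coefficient is symmetric (swapping age and BMI gives the same r), whereas the regression line depends on which variable is treated as the predictor.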
Continuous  -------------------------------------------> Categorical
Age (yr)    -------------------------------------------> Disease (+/-)
The last research question (or procedure) may be a little tricky.
The researcher wants to know the chance of developing disease
according to age.
So the model goes from a continuous predictor variable to a categorical
outcome variable.
Some researchers try to solve this situation in reverse, with a t test or
ANOVA, like:
Categorical    ----------------------------------------> Continuous
Disease (+/-)  ----------------------------------------> Age (yr)
But from my point of view, if the research question is very simple
(we only want to know the chance of developing disease according to
age), avoid using a t test or ANOVA. There is a statistical analysis
designed for this situation, namely logistic regression.
(An easy way to remember: if the predictor variable is continuous, it
may lead to a regression model, e.g. simple linear regression, multiple
regression, logistic regression and its variations.)
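To show what logistic regression estimates, here is a from-scratch sketch that models the probability of disease as sigmoid(intercept + slope * age), fitted by plain gradient ascent on the log-likelihood. The data, learning rate and iteration count are hypothetical choices for illustration, as are the names `fit_logistic` and `predict`; in practice one would use a statistics package such as statsmodels (its `Logit` model), which also reports standard errors and p-values.

```python
from math import exp
from statistics import mean, pstdev

def sigmoid(v):
    """Logistic function: maps any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-v))

def fit_logistic(x, y, learning_rate=0.5, iterations=2000):
    """Hypothetical helper: fit P(y = 1) = sigmoid(intercept + slope * x) by gradient ascent."""
    mx, sx = mean(x), pstdev(x)
    # Standardize x so that a fixed learning rate converges reliably.
    z = [(v - mx) / sx for v in x]
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(iterations):
        # Gradient of the average log-likelihood: mean(y - p) for b0, mean((y - p) * z) for b1.
        residuals = [yi - sigmoid(b0 + b1 * zi) for zi, yi in zip(z, y)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(res * zi for res, zi in zip(residuals, z)) / n
    # Undo the standardization: b0 + b1 * (x - mx) / sx = intercept + slope * x.
    slope = b1 / sx
    intercept = b0 - b1 * mx / sx
    return intercept, slope

def predict(intercept, slope, age):
    """Hypothetical helper: predicted probability of disease at a given age."""
    return sigmoid(intercept + slope * age)

# Hypothetical data invented for illustration: age (yr) and disease (1 = +, 0 = -).
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
disease = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic(ages, disease)
print(f"odds ratio per year of age: {exp(slope):.3f}")
print(f"P(disease) at 30 yr: {predict(intercept, slope, 30):.2f}")
print(f"P(disease) at 60 yr: {predict(intercept, slope, 60):.2f}")
```

exp(slope) is the estimated odds ratio of disease for each additional year of age, so the fitted model answers "how does the chance of disease change with age?" directly, without turning the question around into a t test.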
Cause (Predictor variable)   Effect (Outcome or response variable)   Statistical Test
Categorical                  Continuous                              Student's t test, ANOVA
Categorical                  Categorical                             Chi-square analysis
Continuous                   Continuous                              Correlation analysis, Regression analysis
Continuous                   Categorical                             Logistic regression
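The summary table can also be written as a small lookup function, as a memory aid. This is only a sketch of the table; the names `TEST_BY_TYPES` and `choose_test` are hypothetical.

```python
# Hypothetical lookup: the summary table keyed by (predictor type, outcome type).
TEST_BY_TYPES = {
    ("categorical", "continuous"): "Student's t test (2 groups) or ANOVA (3 or more groups)",
    ("categorical", "categorical"): "Chi-square analysis",
    ("continuous", "continuous"): "Correlation analysis, Regression analysis",
    ("continuous", "categorical"): "Logistic regression",
}

def choose_test(predictor_type, outcome_type):
    """Hypothetical helper: return the test suggested by the summary table."""
    key = (predictor_type.lower(), outcome_type.lower())
    if key not in TEST_BY_TYPES:
        raise ValueError(f"types must be 'categorical' or 'continuous', got {key}")
    return TEST_BY_TYPES[key]

print(choose_test("Categorical", "Continuous"))
print(choose_test("Continuous", "Categorical"))
```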