INFERENTIAL ANALYSES II
(RELATIONSHIPS)
Dr. Abdul Rahman Mahmoud Fata Nahhas
KOP – IIUM
Final Year Research Project
SEM 1
2023-24
CORRELATION ANALYSIS
Introduction
Correlation measures the strength and direction of the
relationship between two variables
Partial correlation: three or more variables are included,
& the correlation between two variables is explored while
the effect of the others is removed
E.g., the correlation between blood pressure and amount of
salt intake after adjustment for the effect of a third variable,
such as amount of fluid intake
Introduction
Example (positive correlation)
Typically, in the summer as the temperature
increases people are thirstier, consuming
more water
Introduction
For seven random summer days, a person recorded the
temperature and his water consumption during a
three-hour period spent outside:

Temperature (C)   Water Consumption (L)
25                1
29                1.3
35                1.7
37                1.9
39                2
41                2.3
44                3.1
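As a quick numerical check of this example, Pearson's r can be computed with SciPy (assumed to be available); the values are the seven temperature/consumption pairs from the table:

```python
# Pearson's r for the temperature / water-consumption data in the
# table above (values copied from the slide). Assumes SciPy is installed.
from scipy.stats import pearsonr

temperature = [25, 29, 35, 37, 39, 41, 44]   # degrees C
water = [1, 1.3, 1.7, 1.9, 2, 2.3, 3.1]      # liters

r, p_value = pearsonr(temperature, water)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```

For these points r comes out large and positive, matching the upward trend described in the example.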
Introduction
[Scatterplot: Water Consumption (L) against Temperature (C) for the seven days above, showing an upward trend]
Introduction
Correlation treats all variables equally
Correlation does not take into consideration
whether a variable has been classified as a
dependent or independent variable
Introduction
For instance, you might want to find out whether
basketball performance is correlated with a
person's height
You would then plot a graph of performance against
height and calculate the correlation coefficient r
If, let's say, r = 0.72, we can conclude
that as height increases so does basketball
performance
Types of correlation
Two main types of correlation analysis:

Pearson product-moment correlation (parametric) REQUIRES:
1. Normally distributed data
2. A linear relationship between the two variables in question
3. No heteroscedasticity

Spearman's rank-order correlation (non-parametric) DOES NOT
REQUIRE the Pearson correlation assumptions
Pearson product-moment correlation
A parametric measure of the strength and direction of a
linear relationship that exists between two continuous
variables
Denoted by the symbol r
Attempts to draw a line of best fit through the data of two
variables
The Pearson correlation coefficient, r, indicates how far
these data points fall from the line of best fit (i.e.,
how well the data points fit this line)
Spearman Rank-order Correlation
A nonparametric measure of the strength and direction of
relationship that exists between two variables measured on at
least an ordinal scale
Denoted by the symbol rs (or the Greek letter ρ, pronounced
rho)
Used for either ordinal variables or for continuous data that
has failed the assumptions necessary for conducting the
Pearson's product-moment correlation
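To illustrate the Spearman alternative, the sketch below pairs an ordinal variable with a continuous one; both data columns are invented for the example, and SciPy is assumed to be available:

```python
# Spearman's rho works on ranks, so it assumes only a monotonic (not
# necessarily linear) relationship. Both variables below are hypothetical.
from scipy.stats import spearmanr

pain_score = [1, 2, 2, 3, 4, 5]       # ordinal pain scale (illustrative)
dose_mg = [5, 10, 12, 20, 40, 80]     # hypothetical drug doses

rho, p_value = spearmanr(pain_score, dose_mg)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```

Because the relationship is monotonic, rho is close to +1 even though the dose values grow non-linearly.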
Detecting a linear relationship
How can you detect a linear relationship
between tested variables?
Simply plot the variables on a graph
(a scatterplot, for example), visually
inspect the graph's shape, and observe the data
points and their location relative to the line of
best fit
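Such a scatterplot can be produced in a few lines, assuming matplotlib is installed; the data reuse the slide's temperature/water table, and the file name is arbitrary:

```python
# Scatterplot for visual inspection of linearity, reusing the slide's
# temperature / water data. The Agg backend renders to a file, so no
# display is needed.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

temperature = [25, 29, 35, 37, 39, 41, 44]
water = [1, 1.3, 1.7, 1.9, 2, 2.3, 3.1]

plt.scatter(temperature, water)
plt.xlabel("Temperature (C)")
plt.ylabel("Water Consumption (L)")
plt.savefig("scatter.png")
```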
Detecting a linear relationship
[Scatterplot: data points clustered tightly around an upward straight line — linear relationship]
Detecting a linear relationship
[Scatterplot: a second example of data points following a straight line — linear relationship]
Detecting a linear relationship
[Scatterplot: data points with no straight-line pattern — non-linear relationship]
Detecting a linear relationship
[Scatterplot: data points following a curve — curvilinear relationship]
Correlation Coefficient
With the help of Correlation Coefficient, we can
determine:
1. The DIRECTION of the relation →
Positive or Negative
2. The STRENGTH of the relation among the
variables
Direction of Correlation
[Scatterplot: Water Consumption (L) rising with Temperature (C) — positive correlation]
Direction of Correlation
[Scatterplot: Stress Score (Y) decreasing as Work Performance Score (X) increases — negative correlation]
Strength of Correlation
Strength    Positive coefficient    Negative coefficient
Small       0.1 to 0.29             -0.1 to -0.29
Medium      0.3 to 0.49             -0.3 to -0.49
Large       0.5 to 1                -0.5 to -1
Strength of Correlation
If r (or rs) equals zero, then there is NO
RELATIONSHIP between the two variables
r = 1 → perfect positive linear relationship
r = -1 → perfect negative linear relationship
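The strength bands above can be wrapped into a small helper; the "negligible" label for |r| < 0.1 is my own addition, since the slide's table starts at 0.1:

```python
# Maps a correlation coefficient onto the strength bands from the
# table above (Cohen's conventions).
def correlation_strength(r: float) -> str:
    if not -1 <= r <= 1:
        raise ValueError("r must lie in [-1, 1]")
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"   # below the table's smallest band

print(correlation_strength(0.72))   # large
print(correlation_strength(-0.35))  # medium
```

Note that the sign of r only matters for direction; strength is judged on the absolute value.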
Strength of Correlation
Achieving a value of +1 or -1 means that all your
data points fall exactly on the line of best fit
There are no data points that show
any variation away from this line
[Two scatterplots: all points lying on a straight descending line (r = -1), and all points lying on a straight ascending line (r = +1)]
REGRESSION ANALYSIS
Definition
A predictive statistical method that investigates
the strength of the relationship between TWO
SETS of variables
It studies the dependence of one or more
variables (dependent variables) on one or more
other variables (independent or predictor variables)
Regression Main Purposes
Regression PRIMARILY used to:
1. Estimate (describe) the relationship that exists between
the dependent variable(s) and the explanatory variable(s)
2. Determine the strength of impact of each of the predictor
variables on the dependent variable(s), controlling for the
effects of all other predictor variables
3. Predict the value of dependent variable(s) for a given value
of the predictor variable(s)
Regression Equation
Can be obtained from all types of regression analysis
Once known, the regression equation is used to predict
values of the dependent variable, given the values of
the independent (predictor) variables
E.g., if we knew a person's weight, we could then
predict their blood pressure using the regression
equation
Regression Equation
E.g., using the simple linear regression model, an equation
obtained can be as the following:
Y = β0 + β1 * X + e
Typically, Y is referred to as the dependent variable, &
X as the independent variable
β0 is the intercept of the estimated line, i.e., the value of Y
when X = 0
β1 is the gradient [slope] of the estimated line, i.e.,
the amount by which Y changes with a one-unit change in X
e is the error term or disturbance in the relationship;
it represents factors other than X that affect Y
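The weight/blood-pressure example above can be sketched numerically; the sample values below are invented, and numpy's `polyfit` estimates the slope and intercept by ordinary least squares:

```python
# Ordinary-least-squares fit of Y = b0 + b1*X on a made-up weight /
# blood-pressure sample illustrating the regression-equation slide.
import numpy as np

weight_kg = np.array([60, 65, 70, 75, 80, 85, 90])           # X (hypothetical)
systolic_bp = np.array([110, 114, 118, 121, 126, 128, 133])  # Y (hypothetical)

b1, b0 = np.polyfit(weight_kg, systolic_bp, deg=1)  # slope, then intercept
print(f"BP = {b0:.2f} + {b1:.3f} * weight")

# Prediction for a new value of X, as the slide describes:
print(f"predicted BP at 82 kg: {b0 + b1 * 82:.1f}")
```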
Types of Regression
Regression analysis is generally classified into two types:

Simple regression involves only two variables, one of which
is the dependent variable and the other the explanatory
(independent) variable. The associated model is a simple
regression model.

Multiple regression involves more than two variables: mainly,
one is the dependent variable and the others are explanatory
(independent) variables. The associated model is a multiple
regression model.
Types of Regression
                        Type of dependent variable
Number of predictors    Continuous         Categorical
1                       Simple Linear      Simple Logistic
>1                      Multiple Linear    Multiple Logistic
Linear Regression
Linear Regression establishes a relationship
between dependent (Continuous) variable
(Y) and one or more independent (predictor)
variables (X) using a best fit straight
line (also known as regression line)
Linear Regression
E.g., predicting a patient's measured blood glucose level (in
mg/dl) based on the dose of insulin infusion (in IU) … SIMPLE
LINEAR REGRESSION
Presume a sample of 20 DM patients for whom insulin infusion was
administered
We can plot the values on a graph, with insulin dose on the X axis
and blood glucose on the Y axis
If there were a perfect linear relationship between insulin dose and
blood glucose, then all 20 points on the graph would fit on a straight
line (But, this is never the case [unless your data are rigged])
Linear Regression
E.g., predicting a patient's measured blood glucose level (in
mg/dl) based on the dose of insulin infusion (in IU) … SIMPLE
LINEAR REGRESSION
If there is a (non-perfect) linear relationship between insulin dose
and blood glucose (presumably a negative), then we would get a
cluster of points on the graph which slopes downward
In other words, as insulin dose is increased; blood glucose level
declines…
Linear Regression
[Scatterplot: Glucose level against Insulin dose with the fitted regression line]
Y = β0 + β1 * X + e
BG = - 7.15 + .095 * Insulin dose
Linear Regression
MULTIPLE LINEAR REGRESSION is the same idea as
simple linear regression, except that we have several
independent variables predicting the dependent variable
To continue with the previous example, assume that we now
want to predict a patient's BG from insulin dose and gender
as well. In other words, we need to see whether gender also
has an impact on the measured BG
In this case, the independent variables (predictors) are Insulin
dose & Gender, while the dependent variable is BG
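A multiple-regression version of this example can be sketched with numpy; every value below is invented (the slide supplies no data set), and the fit is ordinary least squares via `lstsq`:

```python
# Multiple linear regression sketch: predicting BG from insulin dose
# and gender (coded 0/1). All values are hypothetical.
import numpy as np

dose = np.array([10, 15, 20, 25, 30, 35, 40, 45], dtype=float)
gender = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)  # arbitrary 0/1 coding
bg = np.array([180, 172, 160, 150, 141, 131, 122, 110], dtype=float)

# Design matrix: intercept column, dose, gender
X = np.column_stack([np.ones_like(dose), dose, gender])
(b0, b_dose, b_gender), *_ = np.linalg.lstsq(X, bg, rcond=None)
print(f"BG = {b0:.1f} + {b_dose:.2f}*dose + {b_gender:.2f}*gender")
```

Each coefficient here is the effect of that predictor while holding the other constant, which is exactly the "controlling for" idea the following slides discuss.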
Linear Regression
Multiple regression tells us the predictive
value of the overall model; all predictor
variables…
In our example, then, the regression would
tell us how well Insulin dose and Gender together
predict a patient's BG
Linear Regression
DETERMINES THE STRENGTH OF IMPACT OF EACH PREDICTOR
VARIABLE ON THE DEPENDENT VARIABLE(S), CONTROLLING FOR THE
EFFECTS OF ALL OTHER EXPLANATORY VARIABLES
Multiple regression ALSO tells us how well each
predictor variable predicts the dependent variable,
controlling for each of the other predictor variables…
In our example, then, the regression would tell us how
well Insulin dose predicts a patient's BG, while
controlling for Gender, as well as how well Gender
predicts a patient's BG, while controlling for Insulin
dose
Linear Regression
Assumptions
1. Number of cases: When doing regression, the cases-to-
Independent-Variables (IVs) ratio should ideally be 20:1,
that is, 20 cases for every IV in the model. The lowest
acceptable ratio is 5:1 (i.e., 5 cases for every IV
in the model)
2. Normality: the scores for each variable should be
normally distributed
3. Linearity: There must be linear relationship between
independent and dependent variables
Linear Regression
Assumptions
4. Absence of Multicollinearity: Multicollinearity exists when the
independent variables are highly correlated (r=.9 and above)
5. Absence of Singularity: Singularity occurs when one
independent variable is actually a combination of other
independent variables (e.g. when both subscale scores and the
total score of a scale are included)
6. Outliers: Linear regression is very sensitive to outliers (very
high or very low values on a particular item). Outliers can severely
distort the regression line and, consequently, the forecasted values
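The multicollinearity assumption above can be screened before fitting by checking pairwise correlations; the three predictors below are illustrative, with `years_smoking` deliberately constructed to track `age` closely:

```python
# Flag any predictor pair with |r| >= 0.9, the multicollinearity
# threshold mentioned in the assumptions. All data are hypothetical.
import numpy as np

predictors = {
    "age": np.array([25, 30, 35, 40, 45, 50]),
    "years_smoking": np.array([5, 11, 14, 21, 24, 31]),
    "bmi": np.array([22, 30, 24, 27, 35, 23]),
}

names = list(predictors)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = np.corrcoef(predictors[names[i]], predictors[names[j]])[0, 1]
        flag = "  <-- possible multicollinearity" if abs(r) >= 0.9 else ""
        print(f"{names[i]} vs {names[j]}: r = {r:.2f}{flag}")
```

In practice one of any flagged pair would be dropped or the two combined before running the regression.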
Logistic Regression
Used to estimate the probability of an event
occurring (success) versus not occurring (failure)
Used when the dependent variable is binary
(0/1, True/False, Yes/No) in nature
Logistic Regression
E.g., predicting whether people in a group have depression
(Yes/No) based, for instance, on
place of residence (Urban/Rural) … SIMPLE
LOGISTIC REGRESSION
Presume a sample of 50 persons for whom depression
was assessed by a psychologist
Each person's place of residence was also reported
Logistic Regression
E.g., predicting whether people in a group have depression
(Yes/No) based, for instance, on place of residence
(Urban/Rural) … SIMPLE LOGISTIC REGRESSION
On a graph, we can plot the result of the depression
assessment (Y/N) on the Y axis and the reported
place of residence (U/R) on the X axis
From the graph, we can infer whether depression is more
likely to be present among urban or rural persons
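A toy version of this simple logistic regression can be written from scratch; the 0/1 residence codes (0 = rural, 1 = urban) and outcomes below are invented, and the model is fitted by plain gradient descent on the log-loss rather than by a statistics package:

```python
# Simple logistic regression: P(depressed) = sigmoid(b0 + b1 * residence).
# All data are hypothetical; fitted by gradient descent.
import math

residence = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # 0 = rural, 1 = urban
depressed = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]   # hypothetical outcomes

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = 0.0, 0.0
lr = 0.1
for _ in range(5000):
    g0 = g1 = 0.0
    for x, y in zip(residence, depressed):
        err = sigmoid(b0 + b1 * x) - y   # gradient of the log-loss
        g0 += err
        g1 += err * x
    b0 -= lr * g0 / len(residence)
    b1 -= lr * g1 / len(residence)

p_rural = sigmoid(b0)        # modelled P(depressed | rural)
p_urban = sigmoid(b0 + b1)   # modelled P(depressed | urban)
print(f"P(depressed | rural) = {p_rural:.2f}")
print(f"P(depressed | urban) = {p_urban:.2f}")
```

With a single binary predictor, the fitted probabilities converge toward the observed group proportions, which is exactly the "more likely among urban or rural" comparison described above.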
Logistic Regression
Assumptions
Number of cases: When doing regression, the cases-to-
Independent-Variables (IVs) ratio should ideally be 20:1, that is,
20 cases for every IV in the model. The lowest acceptable
ratio is 5:1 (i.e., 5 cases for every IV
in the model)
Normality: Logistic regression doesn't require the data to be
normally distributed (it is a non-parametric test)
Linearity: Logistic regression doesn't require a linear
relationship between the dependent and independent variables
Logistic Regression
Assumptions
Absence of Multicollinearity: Multicollinearity exists when the
independent variables are highly correlated (r=.9 and above)
Absence of Singularity: Singularity occurs when one
independent variable is actually a combination of other
independent variables (e.g. when both subscale scores and the
total score of a scale are included)
Outliers: Logistic regression is sensitive to outliers (very high or
very low values on a particular item). Outliers can severely distort
the fitted model and, consequently, the forecasted values
THANK
YOU!