
Regression and Correlation Notes

The document discusses levels of measurement in statistics, including nominal, ordinal, interval, and ratio scales, and outlines types of statistical analysis such as descriptive and inferential statistics. It explains the concepts of dependent and independent variables, correlation, and regression analysis, detailing how to calculate correlation coefficients and fit regression models. A trial question is provided to apply these concepts, demonstrating calculations for correlation and regression based on a dataset of districts and hospitals.

GEOG 303: Notes on Correlation and Regression

Levels of measurement
In statistics and quantitative research methodology,
various attempts have been made to classify variables
(or types of data) and thereby develop a taxonomy of
levels of measurement or scales of measure. The
various levels of measurement are as shown below:
Nominal Scale/ data
This scale categorizes and differentiates items based
only on their names and other qualitative
classifications they belong to. The items cannot be
ranked. E.g. Names of towns in Ghana, Days of the
week, religion. These are called categorical variables.
Ordinal Scale/data
The data on this scale can be ranked, but the intervals
between ranks are not necessarily equal. E.g. Positions
in a beauty contest; level of education.
Interval Scale/ Data
The items can be ranked and intervals are the same,
but the position of zero is arbitrary. Examples include
temperature with the Celsius scale, which has an
arbitrarily-defined zero point (the freezing point of a
particular substance under particular conditions), date
when measured from an arbitrary epoch (such as AD)
and direction measured in degrees from true or
magnetic north.
Ratio Scale or Ratio Data
We can rank items on this scale and their intervals are
also the same. In addition, there is a true zero. E.g.
Income, length, number of years of schooling, ages etc.

TYPES OF STATISTICAL ANALYSIS


DESCRIPTIVE STATISTICS: This involves the use
of rates, percentages and graphs to represent data
collected on a sample. E.g. the use of frequency tables;
computation of mean, mode and median.
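
As a small illustration (not part of the original notes), the sketch below computes the mean, median and mode with Python's standard `statistics` module. The sample values are the hospital counts from the Trial Question later in these notes; the variable names are assumptions made for the example.

```python
from statistics import mean, median, mode

# Number of hospitals in regions A-H, taken from the Trial Question in these notes.
hospitals = [3, 3, 4, 5, 7, 8, 9, 11]

hospitals_mean = mean(hospitals)      # arithmetic average: 50 / 8
hospitals_median = median(hospitals)  # midpoint of the two middle values (5 and 7)
hospitals_mode = mode(hospitals)      # most frequent value

print("mean:", hospitals_mean)
print("median:", hospitals_median)
print("mode:", hospitals_mode)
```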

INFERENTIAL STATISTICS: This involves the
application of statistical techniques (e.g. chi-square
tests) and sample data to draw conclusions about
population parameters.
DEPENDENT AND INDEPENDENT VARIABLES:
Dependent Variables are those that are only measured
or registered whereas the independent variables are
those that are manipulated to influence the outcome of
the dependent variables. For instance, if we want to
measure effects of years of schooling on salaries, then
salaries will be the dependent variable while years of
schooling will be the independent variable.

CORRELATION AND REGRESSION


Correlation is a measure of the strength and direction
of the relationship between two quantifiable variables.
Positive correlation denotes the positive directional
relationship between 2 variables. This means an
increase in one variable is associated with an increase
in another variable. The coefficient of correlation is
positive. Negative correlation denotes the
negative/inverse directional relationship between 2
variables. An increase in one variable is associated
with a decline in the other variable. The coefficient of
correlation is negative.
Explanation of correlation coefficient
The coefficient of correlation (r) lies between -1 and 1.
A positive sign denotes a positive correlation while a
negative sign denotes a negative relationship between
the variables. The strength is explained in terms of the
size (absolute value) of the coefficient using the
guideline below:
0.0-0.2: Negligible/zero correlation, meaning there is
practically no relationship between the two variables
0.21-0.4: Weak correlation, meaning a relationship
exists but it is very weak
0.41-0.7: Moderate correlation
0.71-0.99: Very strong correlation/ high degree of
correlation
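
The guideline above can be sketched as a small helper. This is only an illustration: the function name `describe_correlation` is hypothetical, and it assumes that the bands apply to the absolute value of r, that values falling between two bands (e.g. 0.205) belong to the lower band, and that |r| = 1 is grouped with the top band.

```python
def describe_correlation(r: float) -> str:
    """Label r using the guideline bands from these notes, applied to |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    size = abs(r)
    if size <= 0.2:
        # Negligible correlations are reported without a direction.
        return "negligible correlation"
    if size <= 0.4:
        strength = "weak"
    elif size <= 0.7:
        strength = "moderate"
    else:
        strength = "very strong"  # assumption: also covers |r| = 1

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.96))
print(describe_correlation(-0.3))
```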

Coefficient of determination
This explains the amount of variability in the
dependent variable that is explained by variability in
the independent variable.
Coefficient of determination = r² × 100%

Regression Analysis
Regression is a technique used to analyse the
relationship between 2 or more variables and how one
variable affects the other. It is used to establish an
equation linking the two variables. There are various
types of regression.
1. Simple Linear Regression: This examines the
relationship between two variables (one dependent
and one independent variable) measured usually
on the ratio scale. E.g. Age and weight.
2. Multiple Linear Regression: This measures the
relationship between one dependent variable and
several independent variables. For instance, one
can examine the relationship between
output of maize (dependent variable) and several
independent variables, including soil quality,
amount of fertilizer applied, rainfall amount etc.
3. Logistic Regression: This is used when the
dependent (outcome) variable is categorical,
typically binary, and we are interested in its
relationship with one or more independent
variables, which may themselves be categorical.
E.g. We can analyse the relationship between
modern contraceptive use (dependent or outcome
variable) and variables such as location, marital
status, religion.
Trial Question
A. A social scientist is interested in establishing
the degree of the relationship between number of
districts and number of hospitals in eight
randomly selected administrative regions in the
Republic of Nsutapong. The table below
summarizes the data he obtained from the field.
Region    Number of Districts    Number of Hospitals
A         2                      3
B         4                      3
C         5                      4
D         5                      5
E         6                      7
F         7                      8
G         9                      9
H         10                     11

(a) Calculate Pearson's Product-Moment Correlation Coefficient (r) between number of
districts and number of hospitals and interpret your answer.
(b) Compute the coefficient of determination and interpret your answer.
(c) Fit a linear regression model for estimating the number of hospitals (y) from a given
number of districts (x).
(d) Using your model or otherwise, find the number of districts in a region with 15 hospitals.

Solution
(a) Let x represent the number of districts while y represents the number of hospitals. To
calculate the correlation we construct the table below:
x        y        xy        x²        y²
2        3        6         4         9
4        3        12        16        9
5        4        20        25        16
5        5        25        25        25
6        7        42        36        49
7        8        56        49        64
9        9        81        81        81
10       11       110       100       121
∑x=48    ∑y=50    ∑xy=352   ∑x²=336   ∑y²=374

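Given that n = 8, ∑x = 48, ∑y = 50, ∑xy = 352, ∑x² = 336 and ∑y² = 374,

\[
\begin{aligned}
r &= \frac{8(352) - (48 \times 50)}{\sqrt{[8(336) - (48)^2][8(374) - (50)^2]}} \\
  &= \frac{2816 - 2400}{\sqrt{(2688 - 2304)(2992 - 2500)}} \\
  &= \frac{416}{\sqrt{(384)(492)}} \\
  &= \frac{416}{434.7} \\
  &\approx 0.96
\end{aligned}
\]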

Since r is 0.96, there is a strong positive correlation between the number of districts and the
number of hospitals. This means that the higher the number of districts in a region, the higher the
number of hospitals in the region.
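
The hand calculation in (a) can be checked with a short script. This is a minimal sketch using only the standard library; the variable names are illustrative and the data are the eight regions from the question.

```python
from math import sqrt

# Districts (x) and hospitals (y) for regions A-H, from the Trial Question.
x = [2, 4, 5, 5, 6, 7, 9, 10]
y = [3, 3, 4, 5, 7, 8, 9, 11]
n = len(x)

# Column totals, matching the working table in (a).
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Pearson's r in the same form as the substitution above.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("numerator:", numerator)
print("r (2 d.p.):", round(r, 2))
```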
(b) Coefficient of determination for the data = r² × 100%, where r = 0.96.

\[
r^2 \times 100\% = (0.96)^2 \times 100\% = 0.9216 \times 100\% = 92.16\% \approx 92.2\%
\]

This means that 92.2% of the variations/variability in the number of hospitals is accounted for or
explained by the variability in the number of districts, leaving the remaining 7.8% of the
variations in the number of hospitals to be accounted for by factors other than the number of
districts.
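
As a hedged check on (b), the sketch below computes the coefficient of determination two ways: from the rounded r = 0.96 used in the working, and from the unrounded intermediate values 416, 384 and 492 obtained in (a). The variable names are illustrative. The two results differ slightly because r was rounded before squaring in the hand calculation.

```python
# Coefficient of determination from the rounded r used in the notes.
r_rounded = 0.96
pct_from_rounded_r = r_rounded ** 2 * 100

# Same quantity from the unrounded intermediate values in part (a):
# r^2 = 416^2 / (384 * 492)
pct_unrounded = 416 ** 2 / (384 * 492) * 100

print("from rounded r:", round(pct_from_rounded_r, 2), "%")
print("without rounding r:", round(pct_unrounded, 1), "%")
```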

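(c) Regression Analysis

The equation of a simple linear regression line is given as y = a + bx, where

\[
b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}, \qquad a = \bar{y} - b\bar{x}
\]

From the table in (a), we know that n = 8, ∑xy = 352 and ∑x² = 336.

Mean of x = 48/8 = 6; mean of y = 50/8 = 6.25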

By substitution,

\[
\begin{aligned}
b &= \frac{352 - 8(6 \times 6.25)}{336 - 8(6)^2} = \frac{352 - 8(37.5)}{336 - 8(36)} \\
  &= \frac{352 - 300}{336 - 288} \\
  &= \frac{52}{48} \approx 1.08
\end{aligned}
\]

Given b as 1.08,

\[
\begin{aligned}
a &= 6.25 - (1.08 \times 6) \\
  &= 6.25 - 6.48 \\
  &= -0.23
\end{aligned}
\]

Since the equation of a simple linear regression line is given as y = a + bx,

substituting the derived values into the equation gives

y = -0.23 + 1.08x
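
The fit in (c) can be reproduced with a few lines of code. This is a sketch under the same formulas used in the substitution above, with illustrative variable names. Note that it keeps b unrounded, so the intercept comes out as -0.25 rather than the -0.23 obtained by hand with b rounded to 1.08.

```python
# Districts (x) and hospitals (y) for regions A-H, from the Trial Question.
x = [2, 4, 5, 5, 6, 7, 9, 10]
y = [3, 3, 4, 5, 7, 8, 9, 11]
n = len(x)

x_mean = sum(x) / n  # 48 / 8
y_mean = sum(y) / n  # 50 / 8
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope and intercept of the least-squares line y = a + b*x.
b = (sum_xy - n * x_mean * y_mean) / (sum_x2 - n * x_mean ** 2)
a = y_mean - b * x_mean

print("b (2 d.p.):", round(b, 2))
print("a (2 d.p.):", round(a, 2))
```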

(d) Using the model, find the number of districts in a region with 15 hospitals.

We substitute y = 15 into the equation:

15 = -0.23 + 1.08x

15.23 = 1.08x

x = 15.23/1.08

x ≈ 14.1, so a region with 15 hospitals is estimated to have about 14 districts.
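
The rearrangement in (d) can be sketched as a small function. The name `districts_for` is hypothetical, and the coefficients are the rounded values a = -0.23 and b = 1.08 from the fitted model in (c).

```python
# Coefficients of the fitted line y = a + b*x, as rounded in part (c).
a = -0.23
b = 1.08


def districts_for(hospitals: float) -> float:
    """Solve y = a + b*x for x, given y (number of hospitals)."""
    return (hospitals - a) / b


x_est = districts_for(15)
print("estimated districts:", round(x_est, 1))
```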
