
EDU 801 – STATISTICAL METHODS IN EDUCATIONAL RESEARCH

TOPICS:
1. Comparing two treatments – t-tests
2. Comparing several treatments – ANOVA and ANCOVA for different designs
3. Mean separation tests
4. Simple and multiple regression

T-test for correlated or dependent samples (paired samples t-test)


A paired samples t-test examines whether two variables are likely to have equal population means. Hence, a paired samples t-test is commonly used to compare the means of two dependent or correlated samples in two scenarios:
a) Pretest and posttest scores of students after exposing them to a treatment, e.g. an instructional method/strategy.
b) Achievement scores of the same students on two measurements, e.g. scores in mathematics and physics.
In both cases, the data are dependent and correlated. A paired samples t-test always uses the following hypotheses:
Ho: µd = 0 (the mean of the paired differences equals zero in the population)
Ha: µd ≠ 0 (the mean of the paired differences does not equal zero in the population)
Assumptions of the paired samples t-test
Technically, a paired samples t-test is equivalent to a one-sample t-test on the difference scores. It therefore requires the same two assumptions:
1. Independent observations
2. Normality: the difference scores must be normally distributed in the population.
Carrying out a paired samples t-test in SPSS
Analyze > Compare Means > Paired-Samples T Test > select the pair of variables > click OK.
This will present three output tables, viz: paired samples statistics, paired samples correlations and paired samples test. However, we are interested in the first and last tables. The paired samples statistics table gives the mean, standard deviation, sample size and standard error of the mean for each measurement, while the paired samples test table gives the mean, standard deviation, standard error of the mean and confidence interval of the difference for the paired differences, as well as the t-value, degrees of freedom and significance (2-tailed).
The decision is based on comparing sig (2-tailed) with the set alpha level (α = 0.05). Reject the null hypothesis if the significance value is less than the set alpha level.

Example: the pretest and posttest mathematics achievement scores of a random sample of 20 students who were exposed to emotional intelligence therapy are shown in the table below. Do the resulting scores provide sufficient evidence to show that the therapy improved the students' achievement in mathematics at α = .05?

Student    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Pretest   18 21 16 22 19 24 17 21 23 18 14 16 16 19 18 20 12 22 15 17
Posttest  22 25 17 24 16 29 20 23 19 20 15 15 18 26 18 24 18 25 19 16

Solution
Ho: µd = 0 (The mean of the paired differences of pretest and posttest is zero).
Ha: µd ≠ 0 (The mean of the paired differences of pretest and posttest is not zero).
The output result from SPSS is as shown in tables below

Paired Samples Statistics

                          Mean   N   Std. Deviation   Std. Error Mean
Pair 1  pre for paired    18.40  20  3.152            .705
        post for paired   20.45  20  4.058            .907

Paired Samples Correlations

                                            N   Correlation   Sig.
Pair 1  pre for paired & post for paired    20  .717          .000

Paired Samples Test

                                            Paired Differences                                        t       df  Sig. (2-tailed)
                                            Mean    Std. Deviation  Std. Error Mean  95% CI of the
                                                                                     Difference
                                                                                     (Lower, Upper)
Pair 1  pre for paired - post for paired    -2.050  2.837           .634             (-3.378, -.722)  -3.231  19  .004

From the paired samples statistics table, the sample size is 20; the mean of the pretest is 18.40, with standard deviation 3.152 and standard error of the mean .705. Also, the mean of the posttest is 20.45, with standard deviation 4.058 and standard error of the mean .907.
From the paired samples test table, the paired differences have a mean of -2.050, standard deviation of 2.837 and standard error of the mean of .634. The t-value is -3.231, df = 19 and sig (2-tailed) = .004.

Decision: The sig or p-value < alpha level (.004 < .05); hence Ho is rejected, implying the mean of the paired differences is not equal to zero. That means the difference in the achievement of the students before and after the emotional intelligence therapy is significant.
Conclusion: the emotional intelligence therapy improved the achievement of the students in mathematics, since the mean of the posttest (20.45) is greater than the mean of the pretest (18.4).
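For readers who want to verify such results outside SPSS, the same paired test can be reproduced in Python with scipy. This is a minimal sketch using the example data above (the variable names are ours); its output should match the SPSS result (t = -3.231, p ≈ .004).

    from scipy import stats

    # Pretest and posttest scores for the 20 students in the example
    pre  = [18, 21, 16, 22, 19, 24, 17, 21, 23, 18, 14, 16, 16, 19, 18, 20, 12, 22, 15, 17]
    post = [22, 25, 17, 24, 16, 29, 20, 23, 19, 20, 15, 15, 18, 26, 18, 24, 18, 25, 19, 16]

    # Paired samples t-test (equivalent to a one-sample t-test on the differences)
    t, p = stats.ttest_rel(pre, post)
    print(f"t = {t:.3f}, p = {p:.3f}")  # expected: t = -3.231, p = 0.004

    # Decision rule: reject Ho if p < alpha
    alpha = 0.05
    print("Reject Ho" if p < alpha else "Fail to reject Ho")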
Exercise:
The continuous assessment scores of a random sample of 10 students in mathematics and physics are shown in the table below. Determine whether the performance of the students in the two subjects is the same at the .05 significance level.

Student   1  2  3  4  5  6  7  8  9  10
Maths    75 65 68 72 56 57 63 72 59 60
Physics  78 66 69 73 63 62 60 67 76 67

Independent t-test for two samples

Introduction

The independent t-test, also called the two-sample t-test, independent-samples t-test or Student's t-test, is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated groups. Unrelated groups, also called unpaired groups or independent groups, are groups in which the cases (e.g., participants) in each group are different. Often we are investigating differences between individuals, which means that when comparing two groups, an individual in one group cannot also be a member of the other group and vice versa. Examples are gender (an individual would have to be classified as either male or female), science and arts, experienced and inexperienced, and control and experimental groups – never both.

Null and alternative hypotheses for the independent t-test

The null hypothesis for the independent t-test is that the population means of the two unrelated groups are equal, i.e. there is no significant difference between the means of the two groups:

H0: µ1 = µ2 (There is no significant difference in the means of the two groups)

HA: µ1 ≠ µ2 (There is a significant difference in the means of the two groups)

In most cases, we are looking to see whether we can reject the null hypothesis and accept the alternative hypothesis, which is that the population means are not equal, i.e. there is a significant difference between the means of the two groups.

To do this, we need to set a significance level (also called alpha) that allows us to decide whether to reject the null hypothesis. Most commonly, this value is set at 0.05.

The independent samples t-test can be calculated manually using a formula, or with SPSS and other statistical packages. However, our focus is on the use of SPSS.

We now illustrate the steps involved in carrying out a t-test of the significance of the difference between means.
Step 1: First we formulate an appropriate null hypothesis under the t-test.
Ho: There is no statistically significant difference between the mean physics achievement scores of male and female SS one students.
Ha: There is a statistically significant difference between the mean physics achievement scores of male and female SS one students.
Step 2: Choose an alpha level or level of significance. Let us conduct this test at the 0.05 level of significance.
Step 3: Run the analysis using SPSS.

Calculating the independent t-test using SPSS:

What you need to run an independent t-test is:

 One independent, categorical variable that has two levels/groups, e.g. male and female; science and arts; experienced and inexperienced; control and experimental groups.
 One continuous dependent variable, e.g. mathematics achievement, test anxiety, etc.

Assumptions

When you choose to analyse your data using an independent t-test, part of the process involves
checking to make sure that the data you want to analyse can actually be analysed using an
independent t-test. You need to do this because it is only appropriate to use an independent t-
test if your data "passes" six assumptions that are required for an independent t-test to give
you a valid result. In practice, checking for these six assumptions just adds a little bit more time
to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing
your analysis, as well as think a little bit more about your data, but it is not a difficult task.

Before we introduce you to these six assumptions, do not be surprised if, when analysing your
own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met).
This is not uncommon when working with real-world data rather than textbook examples,
which often only show you how to carry out an independent t-test when everything goes well!
However, don't worry. Even when your data fails certain assumptions, there is often a solution
to overcome this. First, let's take a look at these six assumptions:

 Assumption #1: Your dependent variable should be measured on a continuous scale (i.e., it is measured at the interval or ratio level). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
 Assumption #2: Your independent variable should consist of two categorical,
independent groups. Example independent variables that meet this criterion include
gender (2 groups: male or female), employment status (2 groups: employed or
unemployed), smoker (2 groups: yes or no), and so forth.
 Assumption #3: You should have independence of observations, which means that
there is no relationship between the observations in each group or between the groups
themselves. For example, there must be different participants in each group with no

participant being in more than one group. This is more of a study design issue than
something you can test for, but it is an important assumption of the independent t-test.
If your study fails this assumption, you will need to use another statistical test instead of
the independent t-test (e.g., a paired-samples t-test).
 Assumption #4: There should be no significant outliers. Outliers are simply single data
points within your data that do not follow the usual pattern (e.g., in a study of 100
students' IQ scores, where the mean score was 108 with only a small variation between
students, one student had a score of 156, which is very unusual, and may even put her
in the top 1% of IQ scores globally). The problem with outliers is that they can have a
negative effect on the independent t-test, reducing the validity of your results.
Fortunately, when using SPSS Statistics to run an independent t-test on your data, you
can easily detect possible outliers.
 Assumption #5: Your dependent variable should be approximately normally
distributed for each group of the independent variable. We talk about the independent
t-test only requiring approximately normal data because it is quite "robust" to violations
of normality, meaning that this assumption can be a little violated and still provide valid
results. You can test for normality using the Shapiro-Wilk test of normality, which is
easily tested for using SPSS Statistics.
 Assumption #6: There needs to be homogeneity of variances. You can test this assumption in SPSS Statistics using Levene's test for homogeneity of variances.

You can check assumptions #4, #5 and #6 using SPSS Statistics. Before doing this, you should make sure that your data meets assumptions #1, #2 and #3, although you don't need SPSS Statistics to do this. When moving on to assumptions #4, #5 and #6, we suggest testing them in this order because it represents an order where, if a violation to the assumption is not correctable, you will no longer be able to use an independent t-test (although you may be able to run another statistical test on your data instead). Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running an independent t-test might not be valid.
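Assumptions #5 and #6 can also be checked outside SPSS. The following is a minimal Python sketch (the two score arrays are invented for illustration) using the Shapiro-Wilk and Levene tests from scipy; in both tests, a p-value above .05 suggests the assumption is tenable.

    from scipy import stats

    # Hypothetical achievement scores for two independent groups
    group1 = [33, 37, 44, 26, 39, 35, 16, 26, 29, 29]
    group2 = [31, 30, 30, 35, 26, 23, 25, 32, 23, 41]

    # Assumption #5: approximate normality within each group (Shapiro-Wilk)
    for name, g in [("group 1", group1), ("group 2", group2)]:
        w, p = stats.shapiro(g)
        print(f"{name}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

    # Assumption #6: homogeneity of variances (Levene's test)
    f, p = stats.levene(group1, group2)
    print(f"Levene's test: F = {f:.3f}, p = {p:.3f}")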


Test Procedure in SPSS Statistics

The steps below show you how to analyse your data using an independent t-test in SPSS
Statistics when the six assumptions have not been violated.

1. Click Analyze > Compare Means > Independent-Samples T Test... on the top menu.
2. Transfer the dependent variable into the Test Variable(s) box and the grouping variable into the Grouping Variable box, then click Define Groups, enter the two values of the grouping variable, and click Continue.
3. If you need to change the confidence level limits or change how to exclude cases, click Options, then Continue. You will be returned to the Independent-Samples T Test dialogue box.
4. Click OK.

Output of the independent t-test in SPSS Statistics

If your data passed assumption #4 (i.e., there were no significant outliers), assumption #5 (i.e.,
your dependent variable was approximately normally distributed for each group of the
independent variable) and assumption #6 (i.e., there was homogeneity of variances), which we
explained earlier in the Assumptions section, you will only need to interpret these two main
tables. However, since you should have tested your data for these assumptions, you will also
need to interpret the SPSS Statistics output that was produced when you tested for them (i.e.,
you will have to interpret: (a) the boxplots you used to check if there were any significant
outliers; (b) the output SPSS Statistics produces for your Shapiro-Wilk test of normality to
determine normality; and (c) the output SPSS Statistics produces for Levene's test for
homogeneity of variances).

SPSS Statistics generates two main tables of output for the independent t-test: group statistics and the independent samples test. The group statistics table contains the groups, sample size, mean, standard deviation and standard error of the mean, whereas the independent samples test table contains the results under both variance conditions (equal variances assumed and equal variances not assumed): Levene's test for equality of variances, and the t-test for equality of means with its t-value, df and sig (2-tailed). The decision is based on comparing sig with the set alpha level: reject Ho if sig < .05.

Example

The Mathematics scores of students in experimental and control groups are shown in the table below. A researcher wants to determine whether the performance of students in the experimental and control groups differs significantly.

Experimental Control
33 31
37 30
44 30
26 35
39 26
35 23
16 25
26 32
29 23

29 41
30 35
25 36
26 26
27 28
29 35
44 33
41 24
32 27
28 33
41 23
33 31
41 28
37 33
24 26
19 26
23 27
34 33
38 40
36 31
27 24
30 15
29 27
25 31
25 37
37 18
36 24
31 26
28 30
24 33
40 21

(a) State the appropriate hypothesis.
(b) Test for a significant difference in the mathematics achievement of the experimental and control groups at the .05 level of significance.
(c) Comment on your results.

Solution
Ho: There is no statistically significant difference between the mean mathematics achievement scores of the experimental and control groups.
Ha: There is a statistically significant difference between the mean mathematics achievement scores of the experimental and control groups.
Setup in SPSS Statistics

In SPSS Statistics, we separated the groups for analysis by creating a grouping variable called
Treatment (i.e., the independent variable), and gave the "experimental group" a value of "1"
and the "control group" a value of "2" (i.e., the two groups of the independent variable).
Mathematics achievement scores were entered under the variable name achievement (i.e., the dependent variable).

Output result for the independent samples t-test


Group Statistics

                     CONTROL AND EXP   N   Mean     Std. Deviation   Std. Error Mean
ACHVT FOR TREATMENT  1.00              40  31.3500  6.79574          1.07450
                     2.00              40  28.9250  5.62224          .88895

From the group statistics table, the experimental group has sample size = 40, mean = 31.35,
standard deviation = 6.796 and standard error of mean = 1.0745 whereas the control group has
sample size = 40, mean = 28.925, standard deviation = 5.622 and standard error of mean = .889.
This indicates that the experimental group has higher mathematics achievement. The result of the
independent samples test will show the significance or otherwise of the difference.

Independent Samples Test

                                        Levene's Test for       t-test for Equality of Means
                                        Equality of Variances
                                        F      Sig.             t      df      Sig.       Mean        Std. Error  95% CI of the
                                                                               (2-tailed) Difference  Difference  Difference (Lower, Upper)
ACHVT FOR     Equal variances assumed   2.126  .149             1.739  78      .086       2.42500     1.39456     (-.35135, 5.20135)
TREATMENT     Equal variances not                               1.739  75.356  .086       2.42500     1.39456     (-.35289, 5.20289)
              assumed

From the independent samples test table, the t-value (1.739) has a significance value of .086. You can see that the group means are not statistically significantly different because the value in the "Sig. (2-tailed)" column is greater than 0.05 (.086 > .05). Looking at the group statistics table, although the experimental group has a higher mean mathematics achievement score than the control group, the difference in the scores is not statistically significant. Hence the study found that the treatment did not significantly improve the achievement of the students in mathematics.
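As a cross-check outside SPSS, the same analysis can be sketched in Python with scipy (the array names are ours). With the 40 scores per group from the table above, the output should agree with the SPSS tables (t ≈ 1.739, p ≈ .086 with equal variances assumed).

    from scipy import stats

    # Mathematics scores of the experimental and control groups (40 each)
    experimental = [33, 37, 44, 26, 39, 35, 16, 26, 29, 29, 30, 25, 26, 27, 29,
                    44, 41, 32, 28, 41, 33, 41, 37, 24, 19, 23, 34, 38, 36, 27,
                    30, 29, 25, 25, 37, 36, 31, 28, 24, 40]
    control      = [31, 30, 30, 35, 26, 23, 25, 32, 23, 41, 35, 36, 26, 28, 35,
                    33, 24, 27, 33, 23, 31, 28, 33, 26, 26, 27, 33, 40, 31, 24,
                    15, 27, 31, 37, 18, 24, 26, 30, 33, 21]

    # Independent samples t-test, equal variances assumed (Student's t-test)
    t, p = stats.ttest_ind(experimental, control, equal_var=True)
    print(f"t = {t:.3f}, p = {p:.3f}")  # expected: t = 1.739, p = 0.086

    # Welch's version (equal variances not assumed), for comparison
    t_w, p_w = stats.ttest_ind(experimental, control, equal_var=False)
    print(f"Welch t = {t_w:.3f}, p = {p_w:.3f}")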

Exercise
In a study to investigate the influence of gender on students' achievement in physics, the following scores were obtained from a physics achievement test administered to a sample of 26 SS one students, made up of 14 males and 12 females:

Male students Female students


9 6
17 10
16 12
15 18
5 13
15 17
10 11
18 9
18 19
20 5
26 15
10 10
8
19

The researcher is interested in testing for the significance of gender as a factor in the
achievement of SS one students in physics.
a. State and test the appropriate hypothesis at .05 level of significance.

b. Comment on your result.

Analysis of Variance (ANOVA)


Analysis of variance is used to determine significant differences among three or more group means. ANOVA is appropriate when there is random assignment of subjects to the groups or when the groups are independent. ANOVA involves two types of variables – independent or classificatory variable(s) and dependent variable(s). The independent variable
provides the basis for the classification of data in ANOVA. For instance, if a study involves only
one independent variable, there will be one way of classifying the associated data. Hence, the
appropriate ANOVA for handling such data will be one-way ANOVA. If it involves two or three independent variables, it becomes a two-way or three-way ANOVA respectively. The test statistic for ANOVA is known as the F-ratio or F-statistic.
The appropriate structure of the null hypothesis under the one-way ANOVA is
Ho: "there is no significant difference among the k population means"
µ1 = µ2 = µ3 = ... = µk
whereas the alternative is
Ha: "at least one of the k population means differs from the rest"
(note that the alternative is that the means are not all equal, not that every mean differs from every other one).
Although ANOVA can be computed manually with a formula, our discussion will focus on the use of SPSS.
Assumptions

Before we introduce you to these six assumptions, do not be surprised if, when analysing your
own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met).
This is not uncommon when working with real-world data rather than textbook examples,
which often only show you how to carry out a one-way ANOVA when everything goes well!
However, don’t worry. Even when your data fails certain assumptions, there is often a solution
to overcome this. First, let’s take a look at these six assumptions:

o Assumption #1: Your dependent variable should be measured at the interval or ratio level (i.e.,
they are continuous). Examples of variables that meet this criterion include revision time
(measured in hours), intelligence (measured using IQ score), exam performance (measured
from 0 to 100), weight (measured in kg), and so forth.

o Assumption #2: Your independent variable should consist of three or more
categorical, independent groups.
o Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves. There
should be random assignment of subjects to groups.
o Assumption #4: There should be no significant outliers. Outliers are simply single data points
within your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores,
where the mean score was 108 with only a small variation between students, one student had a
score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally).
The problem with outliers is that they can have a negative effect on the one-way ANOVA,
reducing the validity of your results. Fortunately, when using SPSS Statistics to run a one-way
ANOVA on your data, you can easily detect possible outliers.
o Assumption #5: Your dependent variable should be approximately normally distributed for
each category of the independent variable. We talk about the one-way ANOVA only
requiring approximately normal data because it is quite "robust" to violations of normality,
meaning that the assumption can be a little violated and still provide valid results. You can test for
normality using the Shapiro-Wilk test of normality, which is easily tested for using SPSS
Statistics.
o Assumption #6: There needs to be homogeneity of variances. You can test this assumption in
SPSS Statistics using Levene's test for homogeneity of variances.

You can check assumptions #4, #5 and #6 using SPSS Statistics. Before doing this, you should
make sure that your data meets assumptions #1, #2 and #3, although you don't need SPSS
Statistics to do this. Remember that if you do not run the statistical tests on these assumptions
correctly, the results you get when running a one-way ANOVA might not be valid.

Running the Procedure for one-way ANOVA using SPSS


1. Click Analyze > Compare Means > One-Way ANOVA.
2. Add the dependent variable to the Dependent List box and the independent variable to the Factor box.
3. Click Post Hoc, check Tukey, then click Continue (this requests the Tukey post hoc test discussed below).
4. Click Options. Check the box for Means plot, then click Continue.
5. Click OK when finished.

SPSS Statistics Output of the one-way ANOVA

SPSS Statistics generates quite a few tables in its one-way ANOVA analysis. Here we discuss the descriptives table and the results for the one-way ANOVA and Tukey post hoc test only. We will go through each table in turn. The decision is based on comparing the significance of the F-value with the set alpha level.

Example on One-Way ANOVA in SPSS


Suppose a researcher recruits 30 students to participate in a study. The students are randomly
assigned to use one of three studying techniques for the next month to prepare for an exam. At
the end of the month, all of the students take the same test.
The test scores for the students are shown below:
Group 1: 85 86 88 75 78 94 98 79 71 80
Group 2: 91 92 93 85 87 90 94 88 95 96
Group 3: 79 78 88 94 92 85 83 85 82 81
A one-way ANOVA was used to determine whether the average scores are the same across all three groups, and the following output was generated (the SPSS tables themselves are not reproduced in this summary).

The most relevant numbers include:


 N: The number of students in each group.
 Mean: The mean test score for each group.
 Std. Deviation: The standard deviation of test scores for each group.
From the table, group 1 has a mean of 83.40 with standard deviation 8.435; group 2 has a mean of 91.10 with standard deviation 3.604; while group 3 has a mean of 84.70 with standard deviation 5.293. Hence, mean of group 2 > group 3 > group 1.

The most relevant numbers include:
 F: The overall F-statistic.
 Sig: The p-value that corresponds to the F-statistic (4.545) with df numerator (2) and df
denominator (27). In this case, the p-value turns out to be .020.
Recall that a one-way ANOVA uses the following null and alternative hypotheses:
 H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk (all the population means are equal)
 HA (alternative hypothesis): at least one population mean is different from the rest
Since the p-value from the ANOVA table is less than .05, we have sufficient evidence to reject
the null hypothesis and conclude that at least one of the group means is different from the rest.
To find out exactly which group means differ from one another, we can refer to the last table in
the ANOVA output.

This table displays the Tukey post-hoc multiple comparisons between each of the three groups.
We are mostly interested in the Sig. column, which displays the p-values for the differences in
means between each group:
From the table we can see the p-values for the following comparisons:
 Technique 1 vs. 2: p-value = 0.024
 Technique 1 vs. 3: p-value = 0.883
 Technique 2 vs. 3: p-value = 0.067

The only group comparison that has a p-value less than .05 is between technique 1 and
technique 2.
This tells us that there is a statistically significant difference in average test scores between
students who used technique 1 compared to students who used technique 2.
However, there is no statistically significant difference between technique 1 and 3, or between
technique 2 and 3.
Finally, report the results.
A one-way ANOVA was performed to determine if three different studying techniques lead to
different test scores.

A total of 10 students used each of the three studying techniques for one month before all
taking the same test.

A one-way ANOVA revealed that there was a statistically significant difference in test scores
between at least two groups (F(2, 27) = 4.545, p = 0.020).

Tukey’s test for multiple comparisons found that mean test scores were significantly different
between students who used technique 1 and technique 2 (p = .024, 95% C.I. = [-14.48, -.92]).

There was no statistically significant difference between scores for techniques 1 and 3 (p=.883)
or between scores for techniques 2 and 3 (p = .067).
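For completeness, the same one-way ANOVA and Tukey comparisons can be sketched in Python (scipy and statsmodels are assumed to be installed; this is a cross-check, not the course's required method). The output should agree with the results above (F(2, 27) = 4.545, p = .020).

    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Test scores for the three studying-technique groups
    g1 = [85, 86, 88, 75, 78, 94, 98, 79, 71, 80]
    g2 = [91, 92, 93, 85, 87, 90, 94, 88, 95, 96]
    g3 = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

    # One-way ANOVA: F(2, 27) and its p-value
    f, p = stats.f_oneway(g1, g2, g3)
    print(f"F = {f:.3f}, p = {p:.3f}")  # expected: F = 4.545, p = 0.020

    # Tukey post hoc test on the pooled scores with group labels
    scores = g1 + g2 + g3
    groups = ["1"] * 10 + ["2"] * 10 + ["3"] * 10
    print(pairwise_tukeyhsd(scores, groups, alpha=0.05))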

Exercise
In an experiment to determine the relative efficacy of three (3) different approaches to teaching spelling, a researcher randomly assigned 30 children who were participating in a holiday programme to three different groups comprising 10, 12 and 8 children respectively. Each group was exposed to one of the three approaches to spelling, after which a post-test comprising a 10-item spelling test was administered to all the children. The following data represent the performance of the children on the spelling test.
Table 12.1: Scores of Children Exposed to Three Different Approaches to Spelling

METHOD A: 2, 3, 5, 2, 4, 6, 3, 4, 6, 5
METHOD B: 4, 2, 3, 1, 6, 5, 2, 3, 3, 5, 4, 6
METHOD C: 6, 2, 4, 5, 3, 1, 4, 3

Let us suppose that the researcher is interested in testing the null hypothesis that there is
no significant difference in the mean spelling achievement of children exposed to three
approaches to spelling at 0.05 level of significance. Comment on the result

Analysis of Covariance (ANCOVA)


Introduction

The one-way ANCOVA (analysis of covariance) can be thought of as an extension of the one-way ANOVA to incorporate a covariate. Like the one-way ANOVA, the one-way ANCOVA is used to determine whether there are any significant differences between two or more independent (unrelated) groups on a dependent variable. However, whereas the ANOVA looks for differences in the group means, the ANCOVA looks for differences in adjusted means (i.e., adjusted for the covariate). As such, compared to the one-way ANOVA, the one-way ANCOVA has the additional benefit of allowing you to "statistically control" for a third variable (sometimes known as a "confounding variable") which you believe will affect your results. This third variable that could be confounding your results is called the covariate, and you include it in your one-way ANCOVA analysis.

o Illustration: Researchers wanted to investigate the effect of three different types of exercise intervention on systolic blood pressure. To do this, they recruited 60 participants to their study. They randomly allocated 20 participants to each of three interventions: a "low-intensity exercise intervention", a "moderate-intensity exercise intervention" and a "high-intensity exercise intervention". The exercise in all interventions burned the same number of calories. Each participant had his or her systolic blood pressure measured before the intervention and immediately after the intervention. The researchers wanted to know if the different exercise interventions had different effects on systolic blood pressure. To answer this question, the researchers wanted to determine whether there were any differences in mean systolic blood pressure after the exercise interventions (i.e., whether post-intervention mean systolic blood pressure differed between the interventions). However, the researchers expected that the impact of the three different exercise interventions on mean systolic blood pressure would be affected by the participants' starting systolic blood pressure (i.e., their systolic blood pressure before the interventions). To control the post-intervention systolic blood pressure for the differences in pre-intervention systolic blood pressure, you can run a one-way ANCOVA with pre-intervention systolic blood pressure as the covariate, intervention as the independent variable and post-intervention systolic blood pressure as the dependent variable. If you find a statistically significant difference between interventions, you can follow up a one-way ANCOVA with a post hoc test to determine which specific exercise interventions differed in terms of their effect on systolic blood pressure (e.g., whether the high-intensity exercise intervention had a greater effect on systolic blood pressure than the low-intensity exercise intervention).

Assumptions

When you choose to analyse your data using a one-way ANCOVA, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using a one-way ANCOVA. You need to do this because it is only appropriate to use a one-way ANCOVA if your data "passes" nine assumptions that are required for a one-way ANCOVA to give you a valid result. In practice, checking for these nine assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.

Before we introduce these nine assumptions, do not be surprised if, when analysing your own
data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is
not uncommon when working with real-world data rather than textbook examples, which often
only show you how to carry out a one-way ANCOVA when everything goes well! However, don’t
worry. Even when your data fails certain assumptions, there is often a solution to overcome
this. First, let’s take a look at these nine assumptions:

o Assumption #1: Your dependent variable and covariate variable(s) should be measured on
a continuous scale (i.e., they are measured at the interval or ratio level). Examples of variables
that meet this criterion include revision time (measured in hours), intelligence (measured using
IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
As stated earlier, you can have categorical covariates (e.g., a categorical variables such as
"gender", which has two categories: "males" and "females"), but the analysis is not usually
referred to as an ANCOVA in this situation.
o Assumption #2: Your independent variable should consist of two or more
categorical, independent groups. Example independent variables that meet this criterion
include gender (e.g., two groups: male and female), ethnicity (e.g., three groups: Caucasian,
African American and Hispanic), physical activity level (e.g., four groups: sedentary, low,
moderate and high), profession (e.g., five groups: surgeon, doctor, nurse, dentist, therapist),
and so forth.
o Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves. For
example, there must be different participants in each group with no participant being in more
than one group. This is more of a study design issue than something you can test for, but it is an
important assumption of a one-way ANCOVA. If your study fails this assumption, you will need
to use another statistical test instead of a one-way ANCOVA (e.g., a repeated measures design).
o Assumption #4: There should be no significant outliers. Outliers are simply data points within
your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where
the mean score was 108 with only a small variation between students, one student had a score
of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The

problem with outliers is that they can have a negative effect on the one-way ANCOVA, reducing
the validity of your results. Fortunately, when using SPSS Statistics to run a one-way ANCOVA
on your data, you can easily detect possible outliers.
o Assumption #5: Your residuals should be approximately normally distributed for each
category of the independent variable. We talk about the ANCOVA only
requiring approximately normal residuals because it is quite "robust" to violations of normality,
meaning that the assumption can be violated to a degree and still provide valid results. You can
test for normality using two Shapiro-Wilk tests of normality: one to test the within-group
residuals and one to test the overall model fit. Both of these are easily tested for using SPSS
Statistics.
o Assumption #6: There needs to be homogeneity of variances. You can test this assumption in
SPSS Statistics using Levene's test for homogeneity of variances.
o Assumption #7: The covariate should be linearly related to the dependent variable at each
level of the independent variable. You can test this assumption in SPSS Statistics by plotting a
grouped scatterplot of the covariate, post-test scores of the dependent variable and
independent variable.
o Assumption #8: There needs to be homoscedasticity. You can test this assumption in SPSS
Statistics by plotting a scatterplot of the standardized residuals against the predicted values.
o Assumption #9: There needs to be homogeneity of regression slopes, which means that there is no interaction between the covariate and the independent variable. By default, SPSS Statistics does not include an interaction term between a covariate and an independent variable in its GLM procedure, so that you can test this.

You can check assumptions #4, #5, #6, #7, #8 and #9 using SPSS Statistics. Before doing this, you should make sure that your data meets assumptions #1, #2 and #3, although you don't need SPSS Statistics to do this. Remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running a one-way ANCOVA might not be valid. In the section Test Procedure in SPSS Statistics, we illustrate the SPSS Statistics procedure to perform a one-way ANCOVA, assuming that no assumptions have been violated. First, we set out the example we use to explain the one-way ANCOVA procedure in SPSS Statistics.

NOTE: The first step in any statistical analysis is to state the hypothesis. For ANCOVA, the hypotheses
are:

 H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk (all the population means are equal)


 HA (alternative hypothesis): at least one population mean is different from the rest

The hypothesis is based on adjusted means (estimated marginal means). Hence, the answer to the research questions and the conclusion about the hypothesis are based on the adjusted means presented in the Estimates table.

Example

A researcher was interested in determining whether a six-week low- or high-intensity exercise-training programme was best at reducing blood cholesterol concentrations in middle-aged men. Both exercise programmes were designed so that the same number of calories was expended in the low- and high-intensity groups. As such, the duration of exercise differed between groups. The researcher expected that any reduction in cholesterol concentration elicited by the interventions would also depend on the participant's initial cholesterol concentration. As such, the researcher wanted to use pre-intervention cholesterol concentration as a covariate when comparing the post-intervention cholesterol concentrations between the interventions and a control group. Therefore, the researcher ran a one-way ANCOVA with: (a) post-intervention cholesterol concentration (post) as the dependent variable; (b) the control and two intervention groups as levels of the independent variable, group; and (c) the pre-intervention cholesterol concentrations as the covariate, pre.
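Although this course uses SPSS, a one-way ANCOVA of this design can be sketched in Python with statsmodels. This is a minimal, hypothetical sketch: the data frame df, the file cholesterol.csv and the column names post, group and pre are assumptions matching the description above, not actual course materials.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # df is assumed to hold one row per participant with columns:
    #   post  - post-intervention cholesterol (dependent variable)
    #   group - control / low / high (independent variable)
    #   pre   - pre-intervention cholesterol (covariate)
    df = pd.read_csv("cholesterol.csv")  # hypothetical file

    # Fit the ANCOVA model: group effect adjusted for the covariate
    model = ols("post ~ C(group) + pre", data=df).fit()

    # Tests of between-subjects effects (Type III sums of squares, as in SPSS)
    print(sm.stats.anova_lm(model, typ=3))

    # Adjusted (estimated marginal) mean for each group at the mean of pre
    grid = pd.DataFrame({"group": df["group"].unique(), "pre": df["pre"].mean()})
    print(grid.assign(adjusted_mean=model.predict(grid)))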

One-way ANCOVA in SPSS Statistic

SPSS Statistics output of the one-way ANCOVA

SPSS Statistics generates quite a few tables in its one-way ANCOVA analysis. In this section, we show you only the main tables required to understand your results from the one-way ANCOVA and the multiple comparisons. We explain the descriptives table, as well as the results for the one-way ANCOVA, the estimates and the multiple comparisons. We go through each table in turn:

Descriptive statistics

The Descriptive Statistics table presents descriptive statistics (mean, standard deviation, number of participants) on the dependent variable, post, for the different levels of the independent variable, group. These values do not include any adjustments made by the use of a covariate in the analysis. (The table itself is not reproduced in this summary.)

One-way ANCOVA results

The main section of the results is presented in the Tests of Between-Subjects Effects table (not reproduced here). This table informs you whether the different interventions were statistically significantly different having adjusted for your covariate; put another way, whether there was an overall statistically significant difference in post-intervention cholesterol concentration (post) between the different interventions (group) once their means had been adjusted for pre-intervention cholesterol concentrations (pre).

In order to interpret the results, read along the group row until you reach the "Sig." column. This provides the statistical significance value (i.e., p-value) of whether there are statistically significant differences in post-intervention cholesterol concentration (i.e., the dependent variable) between the groups (i.e., the independent variable) when adjusted for pre-intervention cholesterol concentration (i.e., the covariate). In this example, you can see that there is a statistically significant difference between adjusted means (p < .05).

Estimates
One role of covariates is to adjust posttest means for any differences among the corresponding pretest means. These adjusted means and their standard errors are found in the Estimated Marginal Means (Estimates) table. Hence, to get a better understanding of how the covariate has adjusted the original post group means, you can consult the Estimates table (not reproduced here).

Notice how the mean values have changed compared to those found in the Descriptive
Statistics table above. These new values represent the adjusted means (i.e., the original means
adjusted for the covariate).

Post hoc test

Now that you know there is a statistically significant difference between the adjusted means, you will want to know where the differences lie. This is reported in the Pairwise Comparisons table (not reproduced here).

By consulting the significance values (i.e., the "Sig." column), you can see which group
comparisons are statistically significantly different. You can report these results in a similar
manner to the one-way ANOVA, but substituting in adjusted means rather than original means.

What's interesting about this table is that the posttest means are hardly adjusted by including our covariate. However, the covariate greatly reduces the standard errors for these means. This is why the mean differences are statistically significant only when the covariate is included. The adjusted descriptives are obtained from the final ANCOVA results.

Solution to the example above

Ho: μ1 = μ2 = μ3 (There is no significant difference among the adjusted means of the three groups)
Ha: at least one adjusted group mean differs from the rest (There is a significant difference among the adjusted means of the three groups)
From the table of estimates, the mean of the control group is 5.988; the mean of intervention group 1 (low-intensity exercise) is 5.794; and the mean of intervention group 2 (high-intensity exercise) is 5.404. This suggests that high-intensity exercise is more effective in reducing blood cholesterol. To test the hypothesis, we compare the significance of F in the Tests of Between-Subjects Effects table with the set alpha level. The F-value of 105.512 has a sig of .000. Since .000 < .05, we reject the null hypothesis and conclude that the exercise interventions significantly reduced blood cholesterol, with high-intensity exercise presenting as more effective.
From the Pairwise Comparisons table, using Bonferroni, the mean difference of control and Int_1 is .194 with sig of .000; the mean difference of control and Int_2 is .584 with sig .000; while the mean difference of Int_1 and Int_2 is .390 with sig of .000. The result shows that the difference between each pair is significant.

Exercise

A researcher wishes to test the efficacy of three teaching methods using three intact classes. Analysis of data collected from pretest and posttest scores of students yielded the results shown below.
a. State an appropriate hypothesis.
b. Test the hypothesis at the .05 significance level.
c. Comment on your result.
d. Compare significance in performance between pairs of the groups.

ANCOVA
Descriptive Statistics
Dependent Variable: achievement posttest

treatment groups   Mean     Std. Deviation   N
1.00               54.9600  16.33116         25
2.00               63.5200  9.92522          25
3.00               64.3333  11.73314         15
Total              60.4154  13.64035         65

Levene's Test of Equality of Error Variances(a)
Dependent Variable: achievement posttest

F      df1  df2  Sig.
1.120  2    62   .333

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + allancovapretest + allancovagrps

Tests of Between-Subjects Effects
Dependent Variable: achievement posttest

Source            Type III Sum of Squares  df  Mean Square  F       Sig.
Corrected Model   3272.279(a)              3   1090.760     7.705   .000
Intercept         811.437                  1   811.437      5.732   .020
allancovapretest  2057.027                 1   2057.027     14.531  .000
allancovagrps     1413.730                 2   706.865      4.993   .010
Error             8635.506                 61  141.566
Total             249159.000               65
Corrected Total   11907.785                64

a. R Squared = .275 (Adjusted R Squared = .239)

Pairwise Comparisons
Dependent Variable: achievement posttest

(I) treatment groups  (J) treatment groups  Mean Difference (I-J)  Std. Error  Sig.(b)  95% CI for Difference(b) (Lower, Upper)
1.00                  2.00                  -8.094                 3.368       .058     (-16.384, .196)
                      3.00                  -11.276*               3.918       .017     (-20.921, -1.631)
2.00                  1.00                  8.094                  3.368       .058     (-.196, 16.384)
                      3.00                  -3.182                 3.935       1.000    (-12.869, 6.506)
3.00                  1.00                  11.276*                3.918       .017     (1.631, 20.921)
                      2.00                  3.182                  3.935       1.000    (-6.506, 12.869)

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Bonferroni.

Mean Separation/Multiple Comparison/Pairwise Comparisons/Post hoc test

When a significant difference among means is found in ANOVA/ANCOVA, it is necessary to conduct multiple comparison tests (MCTs), or post hoc tests, to determine where the differences lie. There are several methods for performing MCTs, such as the Tukey method, Newman-Keuls method, Bonferroni method, Dunnett method, Scheffé's test, and so on.

Tukey method

This test uses pairwise post-hoc testing to determine whether there is a difference between the means of all possible pairs using a studentized range distribution. This method tests every possible pair of all groups. Initially, the Tukey test was called the 'Honestly Significant Difference' test, or simply the 'T test', because this method was based on the t-distribution. It is noted that the Tukey test assumes the same sample counts between groups (balanced data), as in ANOVA. Subsequently, Kramer modified this method to apply to unbalanced data, and it became known as the Tukey-Kramer test. This method uses the harmonic mean of the cell sizes of the two comparisons. The statistical assumptions of ANOVA apply to the Tukey method as well.

As an example of one-way ANOVA with a Tukey test for multiple comparisons (the original figure is not reproduced here), the Tukey test is performed with one critical level, as described earlier, and the results of all pairwise comparisons are presented in one table under the section 'post-hoc test.' The results in that example conclude that groups A and B are different, whereas groups A and C are not different and groups B and C are also not different.

Bonferroni method: α splitting (Dunn's method)

The Bonferroni method can be used to compare different groups at the baseline, study the relationship between variables, or examine one or more endpoints in clinical trials. It is applied as a post-hoc test in many statistical procedures such as ANOVA and its variants, including analysis of covariance (ANCOVA) and multivariate ANOVA (MANOVA); multiple t-tests; and Pearson's correlation analysis. It is also used in several nonparametric tests, including the Mann-Whitney U test, Wilcoxon signed rank test and Kruskal-Wallis test by ranks, and in tests for categorical data, such as the Chi-squared test. When used as a post hoc test after ANOVA, the Bonferroni method uses thresholds based on the t-distribution; the Bonferroni method is more rigorous than the Tukey test, which tolerates type I errors, and more generous than the very conservative Scheffé's method.

However, it has disadvantages as well, since it is unnecessarily conservative (with weak statistical power). The adjusted α is often smaller than required, particularly if there are many tests and/or the test statistics are positively correlated.

As an illustration, the Bonferroni-adjusted Pairwise Comparisons table from the ANCOVA exercise above is reproduced here:

Pairwise Comparisons
Dependent Variable: achievement posttest

(I) treatment groups  (J) treatment groups  Mean Difference (I-J)  Std. Error  Sig.(b)  95% CI for Difference(b) (Lower, Upper)
1.00                  2.00                  -8.094                 3.368       .058     (-16.384, .196)
                      3.00                  -11.276*               3.918       .017     (-20.921, -1.631)
2.00                  1.00                  8.094                  3.368       .058     (-.196, 16.384)
                      3.00                  -3.182                 3.935       1.000    (-12.869, 6.506)
3.00                  1.00                  11.276*                3.918       .017     (1.631, 20.921)
                      2.00                  3.182                  3.935       1.000    (-6.506, 12.869)

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Bonferroni.
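A Bonferroni adjustment can also be sketched in Python: run the unadjusted pairwise t-tests, then correct the p-values with statsmodels. The three score arrays below are invented purely for illustration.

    from itertools import combinations
    from scipy import stats
    from statsmodels.stats.multitest import multipletests

    # Hypothetical posttest scores for three treatment groups
    groups = {
        "1.00": [54, 41, 62, 70, 48, 55, 60, 52],
        "2.00": [63, 71, 58, 66, 60, 69, 64, 57],
        "3.00": [65, 59, 72, 61, 68, 63, 70, 66],
    }

    # Unadjusted p-values for every pairwise comparison
    pairs = list(combinations(groups, 2))
    pvals = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]

    # Bonferroni correction: each p-value is, in effect, multiplied by the number of tests
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
    for (a, b), p, r in zip(pairs, p_adj, reject):
        print(f"{a} vs {b}: adjusted p = {p:.3f}, significant = {r}")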
Dunnett method

This is a particularly useful method to analyze studies having control groups, based on
modified t-test statistics (Dunnett’s t-distribution). It is a powerful statistic and, therefore, can
discover relatively small but significant differences among groups or combinations of groups.
The Dunnett test is used by researchers interested in testing two or more experimental groups
against a single control group. However, the Dunnett test has the disadvantage that it does not
compare the groups other than the control group among themselves at all.

As an example, suppose there are three experimental groups A, B, and C, in which an experimental drug is used, and a control group in a study. In the Dunnett test, a comparison of the control group with A, B, C, or their combinations is performed; however, no comparison is made between the experimental groups A, B, and C. Therefore, the power of the test is higher because the number of tests is reduced compared to the 'all pairwise comparison' approach.

On the other hand, the Dunnett method is capable of 'two-tailed' or 'one-tailed' testing, which makes it different from other pairwise comparison methods. For example, if the effect of a new drug is not known at all, the two-tailed test should be used to confirm whether the effect of the new drug is better or worse than that of a conventional control; if the direction of the effect is known, a one-sided test can be used to compare the new drug and the control. Since the two-sided or one-sided test can be performed according to the situation, the Dunnett method can be used without any restrictions.
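For reference, recent versions of SciPy (1.11 and later) provide Dunnett's test directly. A minimal sketch with invented data follows; note that each experimental group is compared against the control only.

    from scipy import stats  # requires SciPy >= 1.11 for stats.dunnett

    # Hypothetical scores: three experimental groups and one control group
    a = [24, 27, 22, 26, 25, 28]
    b = [29, 31, 27, 30, 28, 32]
    c = [25, 24, 27, 26, 23, 28]
    control = [22, 23, 21, 24, 22, 25]

    # Dunnett's test: each experimental group vs the control (two-sided)
    res = stats.dunnett(a, b, c, control=control)
    for name, stat, p in zip(["A", "B", "C"], res.statistic, res.pvalue):
        print(f"{name} vs control: t = {stat:.3f}, p = {p:.3f}")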

Scheffé's method: exploratory post-hoc method

Scheffé's method is not a simple pairwise comparison test. Based on the F-distribution, it is a method for performing simultaneous, joint pairwise comparisons for all possible pairwise combinations of each group mean [6]. It controls the familywise error rate (FWER) after considering every possible pairwise combination, whereas the Tukey test controls the FWER only when all pairwise comparisons are made. This is why Scheffé's method is more conservative than other methods and has less power to detect differences. Since Scheffé's method generates hypotheses based on all possible comparisons to confirm significance, this method is preferred when a theoretical background for differences between groups is unavailable or previous studies have not been completely implemented (exploratory data analysis). The hypotheses generated in this manner should be tested by subsequent studies that are specifically designed to test the new hypotheses. This is important in exploratory data analysis or the theoretic testing process (e.g., if a type I error is likely to occur in this type of study and the differences should be identified in subsequent studies). Follow-up studies testing specific subgroup contrasts discovered through the application of Scheffé's method should use Bonferroni methods, which are appropriate for theoretical test studies. It is further noted that Bonferroni methods are less sensitive to type I errors than Scheffé's method. Finally, Scheffé's method enables simple or complex averaging comparisons in both balanced and unbalanced data.

REGRESSION ANALYSIS

Regression analysis is a statistical technique that is correlation based and is used for prediction. It is a tool for predicting one variable from one or more other variables based on the correlation between the variables. It is also a technique for modeling the relationship between variables. Once a relationship has been established by means of correlation between any two variables X and Y, regression analysis can be used to:

a. Predict the dependent variable from knowledge of the independent variable(s).


b. Examine the nature of the relation between the independent variable and the
dependent variable.
For instance, a teacher may be interested in studying the extent to which study habit can
predict academic achievement of students or how test anxiety can predict or account for
mathematics achievement.
Regression analysis leads to the development of a structural model or equation called
regression model or equation or formula, for predicting a variable from one or more variables.
The variable to be predicted is called the criterion variable or dependent variable whereas the
variable used in making the prediction is called the predictor variable or independent variable.
When one predictor variable is used to predict one criterion variable, it is referred to as simple linear regression. On the other hand, where the prediction involves two or more independent variables and one criterion variable, it is referred to as multiple linear regression. Therefore, simple linear regression involves one dependent variable (Y) and one independent variable (X), whereas multiple linear regression involves one dependent variable Y and two or more independent variables X1, X2, X3, ..., Xn.

Regression analysis answers such questions as:
a. What proportion of the variation in the dependent variable is due to, or predicted by, the independent variable?
b. What is the equation of the best line that fits the data representing the relationship between the independent and dependent variables? Or, what structural model/equation can be used in predicting or estimating values of the dependent variable given values of the independent variable?
c. How accurate would a prediction or an estimation made based on the observed relationship between the independent and dependent variables be?
Equation or Model for Simple Linear Regression
Suppose X and Y have a linear relationship; if the values of X are plotted against the corresponding values of Y, the points will be scattered, forming the shape of an ellipse. The problem is to find the best possible linear rule for predicting from these data and then to evaluate the goodness of such a rule. Such a line is called the "line of best fit", the regression of Y on X, or simply the regression line. The model or formula for simple regression is
Y = a + bx (a and b are constants)
where Y = dependent variable (criterion), x = independent variable (predictor),
a = intercept on the Y-axis, or regression constant; therefore, when x = 0, Y is equal to "a", and
b = slope or gradient of the line, the regression coefficient or weight; hence for every unit change or increase in x, Y changes or increases by "b". The least-squares estimates of a and b are given below.
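For reference, the least-squares estimates of b and a (a standard result, stated here for completeness) are:

$$ b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x} $$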

Assumptions of Regression Analysis: The four basic assumptions of regression analysis are:
1. Linear relationship: there exists a linear relationship between the independent variable x and the dependent variable y.
2. Normality: for each value of x, y is normally distributed. That is, y follows a normal distribution.
3. Homoscedasticity: the residuals have constant variance at every level of x. For each x, the variance of y given x is the same.
4. Independence: observations are independent of each other.

Example 1: The table below shows the scores of 12 students on a study habit inventory (SHI) and achievement in Mathematics (MA).

X (SHI)  25  16  17  29  23  17  22  36  30  18  14  22
Y (MA)   80  60  70  55  62  37  60  75  50  48  42  64

(a) Deduce the model for predicting Mathematics achievement from the study habit inventory and explain the result.
(b) What percentage of the students' mathematics achievement can be explained by their study habit?
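A sketch of how this example could be worked in Python rather than SPSS follows (scipy's linregress; the array names are ours). The slope and intercept give the model asked for in (a), and r-squared (multiplied by 100) answers (b).

    from scipy import stats

    # Study habit inventory (X) and Mathematics achievement (Y) for 12 students
    x = [25, 16, 17, 29, 23, 17, 22, 36, 30, 18, 14, 22]
    y = [80, 60, 70, 55, 62, 37, 60, 75, 50, 48, 42, 64]

    # Simple linear regression of Y on X
    res = stats.linregress(x, y)
    print(f"model: Y = {res.intercept:.2f} + {res.slope:.2f}x")
    print(f"R-squared = {res.rvalue**2:.3f}")  # proportion of variance in Y explained by X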
Model for Multiple Regression
When there is one dependent variable and two or more independent variables, multiple regression is the most appropriate to use. A dependent variable Y is usually affected by a number of quantifiable independent variables. The joint relationship between the variables in multiple regression is denoted by a correlation coefficient R. The model for multiple regression shows the relationship between the dependent variable and the independent variables, thus:
Y = a + b1x1 + b2x2 + b3x3 + ... + bnxn
where Y = dependent variable,
a = regression constant or intercept on the Y-axis,
b1x1 + b2x2 + b3x3 + ... + bnxn = the independent variables and their coefficients.
The value of each bi shows the weight or contribution of that independent variable in predicting the dependent variable. Estimating the parameters a, b1, b2, ..., bn can easily be done using SPSS or other statistical packages, as sketched below.
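A minimal, hypothetical sketch of fitting such a model in Python with statsmodels follows; the predictor and outcome arrays are invented purely for illustration.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: two predictors (e.g., study habit, test anxiety) and one outcome
    x1 = np.array([25, 16, 17, 29, 23, 17, 22, 36, 30, 18, 14, 22])
    x2 = np.array([10, 14, 12, 8, 11, 16, 12, 7, 9, 15, 17, 11])
    y  = np.array([80, 60, 70, 55, 62, 37, 60, 75, 50, 48, 42, 64])

    # Design matrix with a constant term (the regression constant "a")
    X = sm.add_constant(np.column_stack([x1, x2]))

    # Ordinary least squares: estimates a, b1 and b2
    model = sm.OLS(y, X).fit()
    print(model.params)                          # [a, b1, b2]
    print(model.rsquared, model.rsquared_adj)    # R-squared and adjusted R-squared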
Standard Error of Estimate: The error in prediction is the difference between the predicted and actual values of Y (i.e., Y − Ŷ). The standard error of estimate is the measure of the variation of observations around the computed regression line. Simply, it is used to check the accuracy of predictions made with the regression line. The standard error of estimate of a model fit is a measure of the precision of the model; it is the standard deviation of the residuals. It shows how wrong one could be if he or she used the regression model to make predictions or to estimate the dependent variable. The smaller the value of the standard error of estimate, the closer the points are to the regression line and the better the estimate based on the equation of the line. If the standard error is zero, then there is no variation about the computed line and the correlation will be a perfect relationship. Its formula is given below.

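For a simple regression fitted to n paired observations, the standard error of estimate is conventionally computed as (a standard formula, consistent with the SPSS output shown later in this chapter):

SEE = √[ Σ(Y − Ŷ)² / (n − 2) ],

where n − 2 is the degrees of freedom of the residuals (two parameters, a and b, are estimated). In the Example 1 output below, for instance, √(1475.749/10) ≈ 12.148.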
Coefficient of determination (R2)
R-squared measures the proportion of the variation in the dependent variable (Y) explained by the independent variable(s) (X) in a linear regression model. Adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model, as in multiple regression. The adjusted R-squared increases only if a new term improves the model more than would be expected by chance; it decreases when a predictor improves the model by less than would be expected by chance. It should be noted that a high discrepancy between the values of R-squared and adjusted R-squared indicates a poor fit of the model. The addition of a useless variable to a model causes a decrease in adjusted R-squared, whereas for any useful variable added, adjusted R-squared will increase. Adjusted R-squared will always be less than or equal to R-squared; it therefore adjusts for the number of terms in a model. The standard error of estimate is inversely related to adjusted R-squared. Hence, if you fit simple regression models to the same sample of the same dependent variable Y with different choices of X as the independent variable, adjusted R-squared necessarily goes up as the standard error of the regression goes down, and vice versa. An important goal of the analysis is therefore to minimize the standard error of the regression, or to maximize adjusted R-squared, through the choice of X, other things being equal. The standard error of the regression can be regarded as the real "bottom line" of the analysis, as it measures the variation in the data that is not explained by the model in real economic or physical terms.
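For reference, these quantities are conventionally defined as follows (standard formulas, where n is the sample size and k the number of predictors):

R2 = 1 − SSresidual / SStotal

Adjusted R2 = 1 − (1 − R2)(n − 1) / (n − k − 1)

With the Example 1 output shown later, for instance, R2 = 1 − 1475.749/1842.917 ≈ .199, and adjusted R2 = 1 − (1 − .199)(11/10) ≈ .119, matching the SPSS model summary.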
Calculating Regression Analysis Using SPSS

Both simple and multiple linear regressions can be analyzed using SPSS or other statistical packages. The procedure for calculating regression analysis using SPSS is as follows. After inputting the scores for the variable(s) in your SPSS window:
1. Click Analyze → Regression → Linear, to get the linear regression dialogue box.
2. Transfer the independent variable(s) into the Independent(s) box and the dependent variable into the Dependent box.
3. Check the assumptions discussed above (linearity, independence of observations, homoscedasticity, and normal distribution of the errors/residuals), as well as for significant outliers. You can do this by using the Statistics and Plots features, and then selecting the appropriate options within these two dialogue boxes.
4. Click the 'OK' button.
This will generate results for the descriptive statistics, model summary, ANOVA, and coefficients. Descriptive statistics have been discussed in earlier chapters of this book. The model summary table provides R, R2, adjusted R2, and the standard error of the estimate. R2 is used to determine the percentage of variance in the dependent variable that can be explained by the independent variable. The ANOVA table provides the F-value and its sig or probability level. This is used to measure the statistical significance of the overall regression model, that is, whether the independent variables are appropriate for predicting the dependent variable. If the probability or sig of the F-value is less than the set alpha level (say .05), then the model is said to be properly fitted, that is, the independent variables are considered appropriate for predicting the dependent variable. On the other hand, if the probability of the F statistic is greater than .05, the independent variables are considered inappropriate for predicting the dependent variable.
The table of coefficients contains the values of the regression constant and the
coefficients of the independent variables, their associated t values and significance. This
shows the contribution of each independent variable in predicting the dependent
variable and the statistical significance of the coefficients. The regression model is
conventionally written using the unstandardized coefficients. An unstandardized coefficient indicates how much the dependent variable varies with an independent variable when all other independent variables are held constant. However, the standardized coefficients can also be used to write a regression model, but only if the scores of each independent variable are transformed to Z-scores.
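As an aside, the same tables can be reproduced outside SPSS. A minimal Python sketch (an illustration, assuming the statsmodels package is installed), here applied to the Example 1 data:

    import numpy as np
    import statsmodels.api as sm

    # Study habit (X) and Mathematics achievement (Y) scores from Example 1
    x = np.array([25, 16, 17, 29, 23, 17, 22, 36, 30, 18, 14, 22], dtype=float)
    y = np.array([80, 60, 70, 55, 62, 37, 60, 75, 50, 48, 42, 64], dtype=float)

    X = sm.add_constant(x)        # design matrix with an intercept column
    model = sm.OLS(y, X).fit()    # ordinary least squares fit
    print(model.summary())        # R-squared, adjusted R-squared, F, coefficients, t, sig

The printed summary should agree with the SPSS output reproduced below.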
Example 1: Use SPSS to solve the question in Example 1 for simple linear regression.

Solution: The results from the SPSS printout are shown below.
Descriptive Statistics
        Mean    Std. Deviation   N
Y       58.58   12.944           12
X       22.42   6.626            12

Correlations
                          Y       X
Pearson Correlation   Y   1.000   .446
                      X   .446    1.000
Sig. (1-tailed)       Y   .       .073
                      X   .073    .
N                     Y   12      12
                      X   12      12
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .446   .199       .119                12.148                       1.673
a. Predictors: (Constant), X
b. Dependent Variable: Y

ANOVA
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression   367.167          1    367.167       2.488   .146
   Residual     1475.749         10   147.575
   Total        1842.917         11
a. Dependent Variable: Y
b. Predictors: (Constant), X

Coefficients
                Unstandardized Coefficients   Standardized Coefficients
Model           B        Std. Error           Beta                        t       Sig.
1  (Constant)   39.037   12.879                                           3.031   .013
   X            .872     .553                 .446                        1.577   .146
a. Dependent Variable: Y
Notice that the means, standard deviations, correlation coefficient, coefficient of determination and regression coefficients obtained using SPSS are approximately equal to the ones obtained manually. In addition, the t value and the corresponding significance or p value are determined.
From the tables, the mean and standard deviation of x are 22.42 and 6.626 respectively; the mean and standard deviation of y are 58.58 and 12.944 respectively; r = 0.446 with sig of .073; r squared = .199; b = 0.872; and a = 39.037.
(a) The regression model is y = 39.037 + 0.872x, that is,

Mathematics achievement = 39.037 + 0.872(study habit)

Hence, for every unit increase in study habit, mathematics achievement increases by 0.872.
(b) The R squared of .199 shows that only about 19.9%, or approximately 20%, of students' achievement can be explained by their study habit. The standard error of the estimate is 12.148, which is considered very high. This implies that the variation of the values of the criterion around the line of best fit is wide and hence renders the prediction inaccurate. Notice that the relationship between achievement and study habit is not significant, as the p value (.073, 1-tailed) of the correlation coefficient is greater than .05. From the ANOVA table, the F value of 2.488 is not significant (p = .146). Also, the regression coefficient is not significant (t = 1.577, sig = .146). Hence, from the above result, study habit is not a significant predictor of mathematics achievement.
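As a quick cross-check of the output using the formulas given earlier: b = r(sy/sx) = .446 × (12.944/6.626) ≈ 0.871, and a = ȳ − b x̄ = 58.58 − 0.872 × 22.42 ≈ 39.03, agreeing with the coefficients table. Likewise, F = Mean Square Regression / Mean Square Residual = 367.167/147.575 ≈ 2.488, and for a single predictor F = t², since 1.577² ≈ 2.488.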
Interpreting and Reporting the Output of Multiple Regression Analysis
SPSS statistics generates three main tables of output for a multiple regression analysis. The
tables are model summary, ANOVA (statistical significance) and Coefficients. The model
summary provides the R, R2, adjusted R2, and the standard error of the estimate. These are
used to determine how well a regression model fits the data. The ANOVA table is used to
determine the statistical significance of the overall regression model, that is, whether the
model is properly fitted. The table of coefficients is used to estimate the coefficients for the
independent variables. This shows the contribution of each independent variable in predicting
the dependent variable and the statistical significance of the coefficients. The regression model
is conventionally written using the unstandardized coefficients.
Example 2: Suppose a researcher sought to predict the academic performance of students using age, intelligence quotient, study habit and gender. Multiple regression analysis done using SPSS generated the results shown in the tables below.
(a) Interpret the values obtained for (i) R2, (ii) adjusted R2, (iii) the standard error of estimate.
(b) State the regression model based on the results obtained and discuss the appropriateness of the fit.
(c) Discuss the significance of each independent variable in predicting the dependent variable.
Table 1: Model Summary
Model   R      R Square (R2)   Adjusted R Square   Std. Error of Estimate
1       .790   .577            .559                5.69097
a. Predictors: (Constant), age, intelligence quotient, study habit and gender
Table 2: ANOVA
Model        Sum of Squares   df   Mean Square   F        Sig.
Regression   4196.483         4    1049.121      32.393   .000
Residual     3076.778         95   32.387
Total        7273.261         99
a. Dependent Variable: Academic Performance
b. Predictors: (Constant), age, intelligence quotient, study habit and gender
Table 3: Coefficients
                        Unstandardized    Standardized                      95.0% Confidence
                        Coefficients      Coefficients                      Interval for B
Model                   B        Std. Error   Beta     t        Sig.    Lower Bound   Upper Bound
Constant                87.830   6.385                 13.756   .000    75.155        100.506
Age                     -.165    .063         -.176    -2.633   .010    -.290         -.041
Intelligence quotient   -.385    .043         -.677    -8.877   .000    -.471         -.299
Study habit             -.118    .032         -.252    -3.667   .000    -.182         -.054
Gender                  13.208   1.344        .748     9.824    .000    10.539        15.877
a. Dependent Variable: Academic Performance
Solution
(a) From Table 1, R = 0.790. This indicates a strong relationship between the variables and the possibility of a good level of prediction. The R square of 0.577 shows that the independent variables (age, intelligence quotient, study habit and gender) explain 57.7% of the variability of the dependent variable (academic performance). The adjusted R2 (.559) is very close to the value of R2 (.577); this indicates a good fit for the model. The standard error of estimate (5.69097) shows that an estimate of the dependent variable made with the regression model will be wrong by about 5.69 on average.
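As a cross-check using the adjusted R-squared formula given earlier, with n = 100 (from the ANOVA degrees of freedom, 99 + 1) and k = 4 predictors: adjusted R2 = 1 − (1 − .577)(99/95) ≈ .559, which agrees with Table 1.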
(b) From Table 2, our interest is in the F value and its significance. F(4, 95) = 32.393, sig or p = .000 < .05. This shows that the independent variables statistically significantly predict the dependent variable. Hence the independent variables are considered appropriate for predicting the dependent variable, and the regression model is properly fitted.
(c) Table 3 (coefficients) is used to estimate the model coefficients, their t values and the associated significance, that is, the contribution of each independent variable in predicting the dependent variable.
From the table, the regression equation to predict academic performance from age, intelligence quotient, study habit and gender is:
Academic Performance = 87.83 − (0.165 × age) − (0.385 × intelligence quotient) − (0.118 × study habit) + (13.208 × gender). This is written using the unstandardized coefficients.
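To illustrate the use of the equation (with hypothetical input values chosen only for the arithmetic): a student with age = 15, intelligence quotient = 100, study habit = 40 and gender coded 1 would have a predicted academic performance of 87.83 − (0.165 × 15) − (0.385 × 100) − (0.118 × 40) + (13.208 × 1) = 87.83 − 2.475 − 38.5 − 4.72 + 13.208 ≈ 55.34.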
Consider the effect of age in this example. The coefficient for age is −0.165. This means that for every one-unit increase in age, there is a decrease of 0.165 in academic performance. Similarly, the result indicates that a unit increase in intelligence quotient causes a 0.385 decrease in academic performance; a unit increase in study habit leads to a 0.118 decrease in academic performance; while a unit increase in gender causes a 13.208 increase in academic performance.
The t and Sig. columns of the table of coefficients are used to determine the statistical significance of the independent variables. If p is less than the set alpha level (say .05), you can conclude that the coefficient, which indicates the contribution of that independent variable in predicting the dependent variable, is statistically significant. From the table, age has a t value of −2.633 and a significance of .010; since .010 < .05, the contribution of age in predicting performance is statistically significant. Similarly, all the other independent variables are indicated as significant predictors of academic performance, as their significance levels or p values are all less than .05.
The above result can be summarized as follows: A multiple regression was run to predict
Academic performance from age, intelligent quotient, study habit and gender. These
independent variables statistically significantly predicted academic performance, F (4, 95) =
32.393, p < .05, R2 = .577. All four variables added statistically significantly to the prediction, p <
.05.

Revision Questions
1. The following data were generated by Dr John in a study of students' scores in an entrance examination and their achievement scores in mathematics.

S/N   Entrance Score (X)   Achievement Score (Y)
1     30                   66
2     24                   50
3     27                   69
4     33                   73
5     31                   60
6     27                   59
7     20                   54
8     15                   60
9     12                   58
10    24                   56
a(i) Write the regression model for predicting Y from X (interpret the model).
(ii) What is the standard error for such an estimation?
(iii) What percentage of mathematics achievement can be explained by entrance score?
b. What are the basic assumptions for conducting regression analysis?
c. Write a generalized model for multiple linear regression.
d. Explain the meaning and implication of the standard error of estimate of a regression equation.
2. The table below shows the scores of 14 students on a test anxiety inventory (TAI) and achievement in mathematics (MA).

X (TAI): 26  29  20  22  36  30  18  14  22  24  30  32  16  12
Y (MA):  67  85  50  60  75  50  48  42  64  78  80  80  65  40

a(i) Deduce the regression model for predicting mathematics achievement from test anxiety inventory and interpret your result.
b(i) Write the regression equation for forecasting test anxiety from mathematics achievement (explain your result in each case).
(ii) What percentage of mathematics achievement can be attributed to test anxiety?
3(a) State two uses of regression analysis.
(b) State and explain the model for simple linear regression.
(c) State and explain the model for multiple regression.
4(a) Explain the concept of "Standard Error of Estimate".
(b) State four assumptions of regression analysis.
(c) Outline the procedure for calculating regression analysis using SPSS.

(5).Suppose a researcher sought to predict academic performance of students using self-


efficacy, anxiety, interest, and attitude. Multiple regression analysis done using SPSS generated
the following results as shown in tables below.

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .826   .683       .680                10.65919                     1.920
a. Predictors: (Constant), selfefficacy, anxiety, interest, attitude
b. Dependent Variable: ACHVT
ANOVA
Model           Sum of Squares   df    Mean Square   F         Sig.
1  Regression   96448.032        4     24112.008     212.219   .000
   Residual     44765.642        394   113.618
   Total        141213.674       398
a. Dependent Variable: ACHVT
b. Predictors: (Constant), SELFEFFICACY, ANXIETY, INTEREST, ATTITUDE
Coefficients
                   Unstandardized Coefficients   Standardized Coefficients
Model              B         Std. Error          Beta                        t        Sig.
1  (Constant)      -22.107   7.963                                           -2.776   .006
   ANXIETY         .447      .087                .160                        5.115    .000
   ATTITUDE        1.297     .079                .691                        16.335   .000
   INTEREST        -.292     .066                -.146                       -4.394   .000
   SELFEFFICACY    -.258     .058                -.168                       -4.411   .000
a. Dependent Variable: ACHVT
(a) Interpret the values obtained for (i) R2, (ii) adjusted R2 (iii) standard error of estimate
(b) State the regression model based on results obtained and discuss the appropriateness
of the fit.
(c) Discuss the significance of each independent variable in predicting the dependent
variable.