EDU 801 Lecture Note Summarized For 2025
TOPICS:
1. Distributions for comparing two treatments – t-tests
2. Comparing several treatments – ANOVA and ANCOVA for different designs
3. Mean separation tests
4. Simple and multiple regression
The Paired Samples t-test
Example: The pretest and posttest mathematics achievement scores of a random sample of 20 students who were exposed to emotional intelligence therapy are shown in the table below. Do the resulting scores provide sufficient evidence that the therapy improved the students' achievement in mathematics at α = .05?
Student:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Pretest:  18  21  16  22  19  24  17  21  23  18  14  16  16  19  18  20  12  22  15  17
Posttest: 22  25  17  24  16  29  20  23  19  20  15  15  18  26  18  24  18  25  19  16
Solution
H0: µd = 0 (the mean of the paired differences of pretest and posttest is zero).
Ha: µd ≠ 0 (the mean of the paired differences of pretest and posttest is not equal to zero).
The output from SPSS is shown in the tables below.
Paired Samples Test
Pair 1 (pre - post): Mean = -2.050, Std. Deviation = 2.837, Std. Error Mean = .634, 95% Confidence Interval of the Difference = (-3.378, -.722), t = -3.231, df = 19, Sig. (2-tailed) = .004.
From the paired samples statistics table, the sample size is 20; the mean of the pretest is 18.40, with a standard deviation of 3.152 and a standard error of the mean of .705. Also, the mean of the posttest is 20.45, with a standard deviation of 4.058 and a standard error of the mean of .907.
From the paired samples test table, the paired differences have a mean of -2.050, a standard deviation of 2.837 and a standard error of the mean of .634. The t-value is -3.231, df = 19 and sig (2-tailed) = .004.
Decision: The sig or P value < alpha level (.004 < .05), hence Ho is rejected; implying the mean
of the paired difference is not equal to zero. That means that the difference in the achievement
of the students pre and post emotional intelligence therapy is significant.
Conclusion: The emotional intelligence therapy improved the achievement of the students in mathematics, since the mean of the posttest (20.45) is greater than the mean of the pretest (18.40).
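The same analysis can be reproduced outside SPSS. The sketch below (not part of the original lecture note) uses Python's SciPy library and the pretest/posttest scores from the table above; variable names such as pretest and posttest are my own.

```python
# Paired (dependent) samples t-test in Python -- a minimal sketch, assuming SciPy is installed.
import numpy as np
from scipy import stats

# Pretest and posttest scores of the 20 students from the table above.
pretest = np.array([18, 21, 16, 22, 19, 24, 17, 21, 23, 18,
                    14, 16, 16, 19, 18, 20, 12, 22, 15, 17])
posttest = np.array([22, 25, 17, 24, 16, 29, 20, 23, 19, 20,
                     15, 15, 18, 26, 18, 24, 18, 25, 19, 16])

# H0: the mean of the paired differences is zero.
t_stat, p_value = stats.ttest_rel(pretest, posttest)

diff = pretest - posttest
print(f"mean difference = {diff.mean():.3f}")                        # about -2.050
print(f"t = {t_stat:.3f}, df = {len(diff) - 1}, p = {p_value:.3f}")  # about t = -3.231, p = .004
# Decision rule: reject H0 if p < .05.
```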
Exercise:
The continuous assessment scores of a random sample of 10 students in mathematics and physics are shown in the table below. Determine whether the performance of the students in the two subjects is the same at the .05 significance level.
Student:  1   2   3   4   5   6   7   8   9  10
Maths:   75  65  68  72  56  57  63  72  59  60
Physics: 78  66  69  73  63  62  60  67  76  67
The Independent Samples t-test
Introduction
The independent t-test, also called the two-sample t-test, independent-samples t-test or
student's t-test, is an inferential statistical test that determines whether there is a statistically
significant difference between the means in two unrelated groups. Unrelated groups, also
called unpaired groups or independent groups, are groups in which the cases (e.g., participants)
in each group are different. Often we are investigating differences in individuals, which means that when comparing two groups, an individual in one group cannot also be a member of the other group and vice versa. Examples include gender (an individual would have to be classified as either male or female), science and arts, experienced and inexperienced, and control and experimental groups – in each case, not both.
The null hypothesis for the independent t-test is that the population means of the two unrelated groups are equal, i.e. there is no significant difference between the means of the two groups:
H0: µ1 = µ2
In most cases, we are looking to see if we can reject the null hypothesis and accept the alternative hypothesis, which is that the population means are not equal, i.e. there is a significant difference between the means of the two groups:
HA: µ1 ≠ µ2
To do this, we need to set a significance level (also called alpha) that allows us to either reject or retain the null hypothesis. Most commonly, this value is set at 0.05.
The independent samples t-test can be calculated manually using the formula, or with SPSS and other statistical packages. However, our focus is on the use of SPSS.
We now illustrate the steps involved in carrying out a t-test of the significance of difference
between means.
Step 1: First we formulate an appropriate null hypothesis for the t-test.
H0: There is no statistically significant difference between the mean physics achievement scores of male and female SS one students.
Ha: There is a statistically significant difference between the mean physics achievement scores of male and female SS one students.
Step 2: Choice of an alpha level or level of significance. Let us conduct this test at the 0.05 level of significance.
Step 3: Run the analysis using SPSS.
The analysis requires:
One independent, categorical variable that has two levels/groups, e.g. male and female; science and arts; experienced and inexperienced; control and experimental groups.
One continuous dependent variable, e.g. mathematics achievement, test anxiety, etc.
Assumptions
When you choose to analyse your data using an independent t-test, part of the process involves
checking to make sure that the data you want to analyse can actually be analysed using an
independent t-test. You need to do this because it is only appropriate to use an independent t-
test if your data "passes" six assumptions that are required for an independent t-test to give
you a valid result. In practice, checking for these six assumptions just adds a little bit more time
to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing
your analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these six assumptions, do not be surprised if, when analysing your
own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met).
This is not uncommon when working with real-world data rather than textbook examples,
which often only show you how to carry out an independent t-test when everything goes well!
However, don't worry. Even when your data fails certain assumptions, there is often a solution
to overcome this. First, let's take a look at these six assumptions:
Assumption #1: Your dependent variable should be measured on a continuous scale (i.e., at the interval or ratio level). Examples include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
Assumption #2: Your independent variable should consist of two categorical, independent groups, e.g. male and female; science and arts; experienced and inexperienced; control and experimental groups.
Assumption #3: You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves. For example, there must be different participants in each group, with no participant being in more than one group. This is more of a study design issue than something you can test for, but it is an important assumption of the independent t-test. If your study fails this assumption, you will need to use another statistical test instead of the independent t-test (e.g., a paired-samples t-test).
Assumption #4: There should be no significant outliers. Outliers are simply single data
points within your data that do not follow the usual pattern (e.g., in a study of 100
students' IQ scores, where the mean score was 108 with only a small variation between
students, one student had a score of 156, which is very unusual, and may even put her
in the top 1% of IQ scores globally). The problem with outliers is that they can have a
negative effect on the independent t-test, reducing the validity of your results.
Fortunately, when using SPSS Statistics to run an independent t-test on your data, you
can easily detect possible outliers.
Assumption #5: Your dependent variable should be approximately normally
distributed for each group of the independent variable. We talk about the independent
t-test only requiring approximately normal data because it is quite "robust" to violations
of normality, meaning that this assumption can be a little violated and still provide valid
results. You can test for normality using the Shapiro-Wilk test of normality, which is
easily tested for using SPSS Statistics.
Assumption #6: There needs to be homogeneity of variances. You can test this
assumption in SPSS Statistics using Levene’s test for homogeneity of variances. You can
check assumptions #4, #5 and #6 using SPSS Statistics. Before doing this, you should
make sure that your data meets assumptions #1, #2 and #3, although you don't need
SPSS Statistics to do this. When moving on to assumptions #4, #5 and #6, we suggest
testing them in this order because it represents an order where, if a violation to the
assumption is not correctable, you will no longer be able to use an independent t-test
(although you may be able to run another statistical test on your data instead). Just
remember that if you do not run the statistical tests on these assumptions correctly, the
results you get when running an independent t-test might not be valid.
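Although the note uses SPSS for these checks, assumptions #5 and #6 can also be examined with a few lines of code. The sketch below (my own illustration, using made-up group scores) shows the Shapiro-Wilk and Levene tests in Python with SciPy.

```python
# Checking normality (Shapiro-Wilk) and homogeneity of variances (Levene) -- a minimal sketch
# with hypothetical data; group_a and group_b stand for the two groups of the independent variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=60, scale=8, size=30)   # hypothetical scores, group 1
group_b = rng.normal(loc=55, scale=8, size=30)   # hypothetical scores, group 2

# Assumption #5: approximate normality of the dependent variable in each group.
for name, g in [("group_a", group_a), ("group_b", group_b)]:
    w, p = stats.shapiro(g)
    print(f"{name}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")   # p > .05 -> normality is tenable

# Assumption #6: homogeneity of variances.
stat, p = stats.levene(group_a, group_b)
print(f"Levene's test: statistic = {stat:.3f}, p = {p:.3f}")  # p > .05 -> equal variances is tenable
```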
SPSS Statistics
The steps below show you how to analyse your data using an independent t-test in SPSS
Statistics when the six assumptions have not been violated.
Click Analyze > Compare Means > Independent-Samples T Test... on the top menu, transfer the dependent variable and the grouping variable into the dialog box, define the two groups, and click OK to run the test.
If your data passed assumption #4 (i.e., there were no significant outliers), assumption #5 (i.e.,
your dependent variable was approximately normally distributed for each group of the
independent variable) and assumption #6 (i.e., there was homogeneity of variances), which we
explained earlier in the Assumptions section, you will only need to interpret these two main
tables. However, since you should have tested your data for these assumptions, you will also
need to interpret the SPSS Statistics output that was produced when you tested for them (i.e.,
you will have to interpret: (a) the boxplots you used to check if there were any significant
outliers; (b) the output SPSS Statistics produces for your Shapiro-Wilk test of normality to
determine normality; and (c) the output SPSS Statistics produces for Levene's test for
homogeneity of variances).
SPSS Statistics generates two main tables of output for the independent t-test: the group statistics and the independent samples test. The group statistics table contains the groups, sample size, mean, standard deviation and standard error of the mean, whereas the independent samples test table contains two rows of results (equal variances assumed and equal variances not assumed), Levene's test for equality of variances, and the t-test for equality of means with its t-value, df and sig. (2-tailed). The decision is based on comparing the sig. value with the set alpha level: reject H0 if sig < .05.
Example
The mathematics scores of students in the experimental and control groups are shown in the table below. A researcher wants to determine whether the performance of students in the experimental and control groups differs significantly.
Experimental Control
33 31
37 30
44 30
26 35
39 26
35 23
16 25
26 32
29 23
29 41
30 35
25 36
26 26
27 28
29 35
44 33
41 24
32 27
28 33
41 23
33 31
41 28
37 33
24 26
19 26
23 27
34 33
38 40
36 31
27 24
30 15
29 27
25 31
25 37
37 18
36 24
31 26
28 30
24 33
40 21
Solution
H0: There is no statistically significant difference between the mean mathematics achievement scores of the experimental and control groups.
Ha: There is a statistically significant difference between the mean mathematics achievement scores of the experimental and control groups.
Setup in SPSS Statistics
In SPSS Statistics, we separated the groups for analysis by creating a grouping variable called
Treatment (i.e., the independent variable), and gave the "experimental group" a value of "1"
and the "control group" a value of "2" (i.e., the two groups of the independent variable).
Mathematics achievement scores were entered under the variable name achievement (i.e., the dependent variable).
From the group statistics table, the experimental group has sample size = 40, mean = 31.35,
standard deviation = 6.796 and standard error of mean = 1.0745 whereas the control group has
sample size = 40, mean = 28.925, standard deviation = 5.622 and standard error of mean = .889.
This indicates that the experimental group has higher mathematics achievement. The result of the
independent samples test will show the significance or otherwise of the difference.
Independent Samples Test (ACHVT by TREATMENT)
Equal variances assumed: Levene's F = 2.126, Sig. = .149; t = 1.739, df = 78, Sig. (2-tailed) = .086, Mean Difference = 2.42500, Std. Error Difference = 1.39456, 95% CI of the Difference = (-.35135, 5.20135).
Equal variances not assumed: t = 1.739, df = 75.356, Sig. (2-tailed) = .086, Mean Difference = 2.42500, Std. Error Difference = 1.39456, 95% CI of the Difference = (-.35289, 5.20289).
From the independent samples test table, the t-value (1.739) has a significance value of .086. The group means are not statistically significantly different because the value in the "Sig. (2-tailed)" column is greater than 0.05 (.086 > .05). Looking at the group statistics table, although the experimental group has a higher mean mathematics achievement score than the control group, the difference in the scores is not statistically significant. Hence the study found that the treatment did not significantly improve the achievement of the students in mathematics.
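For readers who want to verify the SPSS output, the sketch below (not part of the lecture note) reproduces the same independent samples t-test in Python with SciPy, using the experimental and control scores listed in the table above.

```python
# Independent samples t-test (equal variances assumed) -- a minimal sketch using the example data.
import numpy as np
from scipy import stats

experimental = np.array([33, 37, 44, 26, 39, 35, 16, 26, 29, 29, 30, 25, 26, 27, 29, 44, 41, 32, 28, 41,
                         33, 41, 37, 24, 19, 23, 34, 38, 36, 27, 30, 29, 25, 25, 37, 36, 31, 28, 24, 40])
control = np.array([31, 30, 30, 35, 26, 23, 25, 32, 23, 41, 35, 36, 26, 28, 35, 33, 24, 27, 33, 23,
                    31, 28, 33, 26, 26, 27, 33, 40, 31, 24, 15, 27, 31, 37, 18, 24, 26, 30, 33, 21])

# Pooled-variance t-test, matching the "equal variances assumed" row of the SPSS table.
t_stat, p_value = stats.ttest_ind(experimental, control, equal_var=True)

print(f"experimental mean = {experimental.mean():.3f}, control mean = {control.mean():.3f}")
print(f"t = {t_stat:.3f}, df = {len(experimental) + len(control) - 2}, p = {p_value:.3f}")
# Expect roughly t = 1.739, df = 78, p = .086, so H0 is not rejected at alpha = .05.
```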
Exercise
In a study to investigate the influence of gender on students’ achievement in physics, the
following scores were obtained from a physics achievement test administered on a sample of
26 SS one students, made up of 14 males and 12 females:
The researcher is interested in testing for the significance of gender as a factor in the
achievement of SS one students in physics.
a. State and test the appropriate hypothesis at .05 level of significance.
b. Comment on your result.
One-way ANOVA
Assumptions
Before we introduce you to these six assumptions, do not be surprised if, when analysing your
own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met).
This is not uncommon when working with real-world data rather than textbook examples,
which often only show you how to carry out a one-way ANOVA when everything goes well!
However, don’t worry. Even when your data fails certain assumptions, there is often a solution
to overcome this. First, let’s take a look at these six assumptions:
o Assumption #1: Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
o Assumption #2: Your independent variable should consist of three or more
categorical, independent groups.
o Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves. There
should be random assignment of subjects to groups.
o Assumption #4: There should be no significant outliers. Outliers are simply single data points
within your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores,
where the mean score was 108 with only a small variation between students, one student had a
score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally).
The problem with outliers is that they can have a negative effect on the one-way ANOVA,
reducing the validity of your results. Fortunately, when using SPSS Statistics to run a one-way
ANOVA on your data, you can easily detect possible outliers.
o Assumption #5: Your dependent variable should be approximately normally distributed for
each category of the independent variable. We talk about the one-way ANOVA only
requiring approximately normal data because it is quite "robust" to violations of normality,
meaning that assumption can be a little violated and still provide valid results. You can test for
normality using the Shapiro-Wilk test of normality, which is easily tested for using SPSS
Statistics.
o Assumption #6: There needs to be homogeneity of variances. You can test this assumption in
SPSS Statistics using Levene's test for homogeneity of variances.
You can check assumptions #4, #5 and #6 using SPSS Statistics. Before doing this, you should
make sure that your data meets assumptions #1, #2 and #3, although you don't need SPSS
Statistics to do this. Remember that if you do not run the statistical tests on these assumptions
correctly, the results you get when running a one-way ANOVA might not be valid.
SPSS Statistics generates quite a few tables in its one-way ANOVA analysis. These include the descriptives table, as well as the results for the one-way ANOVA and the Tukey post hoc test. We will go through each table in turn. The decision is based on comparing the significance of the F value with the set alpha level.
The most relevant numbers include:
F: The overall F-statistic.
Sig: The p-value that corresponds to the F-statistic (4.545) with df numerator (2) and df
denominator (27). In this case, the p-value turns out to be .020.
Recall that a one-way ANOVA uses the following null and alternative hypotheses:
H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk (all the population means are equal)
HA (alternative hypothesis): at least one population mean is different from the rest
Since the p-value from the ANOVA table is less than .05, we have sufficient evidence to reject
the null hypothesis and conclude that at least one of the group means is different from the rest.
To find out exactly which group means differ from one another, we can refer to the last table in
the ANOVA output.
This table displays the Tukey post-hoc multiple comparisons between each of the three groups.
We are mostly interested in the Sig. column, which displays the p-values for the differences in
means between each group:
From the table we can see the p-values for the following comparisons:
Technique 1 vs. 2: p-value = 0.024
Technique 1 vs. 3: p-value = 0.883
Technique 2 vs. 3: p-value = 0.067
The only group comparison that has a p-value less than .05 is between technique 1 and
technique 2.
This tells us that there is a statistically significant difference in average test scores between
students who used technique 1 compared to students who used technique 2.
However, there is no statistically significant difference between technique 1 and 3, or between
technique 2 and 3.
Step 4: Report the results.
A one-way ANOVA was performed to determine if three different studying techniques lead to
different test scores.
A total of 10 students used each of the three studying techniques for one month before all
taking the same test.
A one-way ANOVA revealed that there was a statistically significant difference in test scores
between at least two groups (F(2, 27) = 4.545, p = 0.020).
Tukey’s test for multiple comparisons found that mean test scores were significantly different
between students who used technique 1 and technique 2 (p = .024, 95% C.I. = [-14.48, -.92]).
There was no statistically significant difference between scores for techniques 1 and 3 (p=.883)
or between scores for techniques 2 and 3 (p = .067).
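The studying-technique scores themselves are not reproduced in this note, but the workflow can be illustrated with made-up data. The sketch below (my own illustration; the scores and group labels are hypothetical) runs a one-way ANOVA in Python and follows it with Tukey's post hoc test.

```python
# One-way ANOVA followed by Tukey's HSD -- a minimal sketch with hypothetical scores.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

technique1 = np.array([68, 72, 75, 80, 66, 70, 74, 78, 71, 69])   # hypothetical test scores
technique2 = np.array([78, 82, 85, 88, 76, 80, 84, 90, 81, 79])
technique3 = np.array([70, 74, 73, 79, 68, 72, 75, 77, 70, 71])

# H0: all group means are equal.
f_stat, p_value = stats.f_oneway(technique1, technique2, technique3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# Tukey HSD post hoc test: which pairs of techniques differ?
scores = np.concatenate([technique1, technique2, technique3])
groups = ["technique1"] * 10 + ["technique2"] * 10 + ["technique3"] * 10
print(pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05))
```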
Exercise
In an experiment to determine the relative efficacy of three (3) different approaches to teaching spelling, a researcher randomly assigned 30 children who were participating in a holiday programme to three different groups comprising 10, 12 and 8 children respectively. Each group was exposed to one of the three approaches to spelling, after which a post-test comprising a 10-item spelling test was administered to all the children.
The following data represent the performance of the children on the spelling test.
Table 12.1: Scores of Children Exposed to Three Different Approaches to Spelling
Let us suppose that the researcher is interested in testing the null hypothesis that there is
no significant difference in the mean spelling achievement of children exposed to three
approaches to spelling at 0.05 level of significance. Comment on the result
ANCOVA
Like the one-way ANOVA, the one-way ANCOVA is used to determine whether there are any statistically significant differences between two or more independent (unrelated) groups on a dependent variable. However, whereas the ANOVA looks for
differences in the group means, the ANCOVA looks for differences in adjusted means (i.e.,
adjusted for the covariate). As such, compared to the one-way ANOVA, the one-way ANCOVA
has the additional benefit of allowing you to "statistically control" for a third variable
(sometimes known as a "confounding variable"), which you believe will affect your results. This
third variable that could be confounding your results is called the covariate and you include it in
your one-way ANCOVA analysis.
Assumptions
When you choose to analyse your data using a one-way ANCOVA, part of the process involves
checking to make sure that the data you want to analyse can actually be analysed using a one-
way ANCOVA. You need to do this because it is only appropriate to use a one-way ANCOVA if
your data "passes" nine assumptions that are required for a one-way ANCOVA to give you a
valid result. In practice, checking for these nine assumptions just adds a little bit more time to
your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your
analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce these nine assumptions, do not be surprised if, when analysing your own
data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is
not uncommon when working with real-world data rather than textbook examples, which often
only show you how to carry out a one-way ANCOVA when everything goes well! However, don’t
worry. Even when your data fails certain assumptions, there is often a solution to overcome
this. First, let’s take a look at these nine assumptions:
o Assumption #1: Your dependent variable and covariate variable(s) should be measured on
a continuous scale (i.e., they are measured at the interval or ratio level). Examples of variables
that meet this criterion include revision time (measured in hours), intelligence (measured using
IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
As stated earlier, you can have categorical covariates (e.g., a categorical variable such as "gender", which has two categories: "males" and "females"), but the analysis is not usually referred to as an ANCOVA in this situation.
o Assumption #2: Your independent variable should consist of two or more
categorical, independent groups. Example independent variables that meet this criterion
include gender (e.g., two groups: male and female), ethnicity (e.g., three groups: Caucasian,
African American and Hispanic), physical activity level (e.g., four groups: sedentary, low,
moderate and high), profession (e.g., five groups: surgeon, doctor, nurse, dentist, therapist),
and so forth.
o Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves. For
example, there must be different participants in each group with no participant being in more
than one group. This is more of a study design issue than something you can test for, but it is an
important assumption of a one-way ANCOVA. If your study fails this assumption, you will need
to use another statistical test instead of a one-way ANCOVA (e.g., a repeated measures design).
Assumption #4: There should be no significant outliers. Outliers are simply data points within
your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where
the mean score was 108 with only a small variation between students, one student had a score
of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The
problem with outliers is that they can have a negative effect on the one-way ANCOVA, reducing
the validity of your results. Fortunately, when using SPSS Statistics to run a one-way ANCOVA
on your data, you can easily detect possible outliers.
o Assumption #5: Your residuals should be approximately normally distributed for each
category of the independent variable. We talk about the ANCOVA only
requiring approximately normal residuals because it is quite "robust" to violations of normality,
meaning that the assumption can be violated to a degree and still provide valid results. You can
test for normality using two Shapiro-Wilk tests of normality: one to test the within-group
residuals and one to test the overall model fit. Both of these are easily tested for using SPSS
Statistics.
o Assumption #6: There needs to be homogeneity of variances. You can test this assumption in
SPSS Statistics using Levene's test for homogeneity of variances.
o Assumption #7: The covariate should be linearly related to the dependent variable at each
level of the independent variable. You can test this assumption in SPSS Statistics by plotting a
grouped scatterplot of the covariate, post-test scores of the dependent variable and
independent variable.
o Assumption #8: There needs to be homoscedasticity. You can test this assumption in SPSS
Statistics by plotting a scatterplot of the standardized residuals against the predicted values.
o Assumption #9: There needs to be homogeneity of regression slopes, which means that there is no interaction between the covariate and the independent variable. By default, SPSS Statistics does not include an interaction term between a covariate and an independent variable in its GLM procedure, so this interaction has to be added to the model in order to test the assumption.
You can check assumptions #4, #5, #6, #7, #8 and #9 using SPSS Statistics. Before doing this, you should make sure that your data meets assumptions #1, #2 and #3, although you don't need SPSS Statistics to do this. Remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running a one-way ANCOVA might not be valid. In the section Test Procedure in SPSS Statistics, we illustrate the SPSS Statistics procedure to perform a one-way ANCOVA, assuming that no assumptions have been violated. First, we set out the example we use to explain the one-way ANCOVA procedure in SPSS Statistics.
NOTE: The first step in any statistical analysis is to state the hypothesis. For ANCOVA, the hypotheses are:
H0: the adjusted population means of the groups are equal.
Ha: at least one adjusted population mean differs from the others.
The hypotheses are based on adjusted means (estimated marginal means). Hence, the answers to the research questions and the conclusion of the hypothesis test are based on the adjusted means presented in the table of estimates.
Example
SPSS Statistics generates quite a few tables in its one-way ANCOVA analysis. In this section, we
show you only the main tables required to understand your results from the one-way ANCOVA
and the multiple comparisons. We explain the descriptive table, as well as the results for the one-way ANCOVA, the estimates and the multiple comparisons. We go through each table in turn:
SPSS Statistics
Descriptive statistics
The Descriptive Statistics table (shown below) presents descriptive statistics (mean, standard
deviation, number of participants) on the dependent variable, post , for the different levels of
the independent variable, group . These values do not include any adjustments made by the
use of a covariate in the analysis.
One-way ANCOVA results
The main section of the results is presented in the Tests of Between-Subjects Effects table, as
shown below:
This table informs you whether the different interventions were statistically significantly
different having adjusted for your covariate. Put another way, whether there was an overall
statistically significant difference in post-intervention cholesterol concentration ( post )
between the different interventions ( group ) once their means had been adjusted for pre-
intervention cholesterol concentrations ( pre ). This is highlighted below:
In order to interpret the results, read along the group row until you reach the "Sig." column. This provides the statistical significance value (i.e., p-value) of whether there are statistically significant differences in the post-intervention scores (i.e., the dependent variable) between the groups (i.e., the independent variable) when adjusted for the pre-intervention scores (i.e., the covariate). In this example, you can see that there is a statistically significant difference between adjusted means (p < .05).
Estimates
One role of covariates is to adjust posttest means for any differences among the corresponding
pretest means. These adjusted means and their standard errors are found in the Estimated
Marginal Means table shown in estimates below. Hence to get a better understanding of how
the covariate has adjusted the original post group means, you can consult the Estimates table,
as shown below:
Notice how the mean values have changed compared to those found in the Descriptive
Statistics table above. These new values represent the adjusted means (i.e., the original means
adjusted for the covariate).
Now that you know there is a statistically significant difference between the adjusted means,
you will want to know where the differences lie. This is reported in the Pairwise
Comparisons table, as shown below:
By consulting the significance values (i.e., the "Sig." column), you can see which group
comparisons are statistically significantly different. You can report these results in a similar
manner to the one-way ANOVA, but substituting in adjusted means rather than original means.
What is interesting about this table is that the posttest means are hardly adjusted by including our covariate. However, the covariate greatly reduces the standard errors for these means. This is why the mean differences are statistically significant only when the covariate is included. The adjusted descriptives are obtained from the final ANCOVA results.
For example, the mean difference between Int_1 and Int_2 is .390, with a sig. of .000. The result shows that the difference between this pair is significant.
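The GLM/ANCOVA procedure shown above is specific to SPSS, but the same kind of analysis can be sketched in Python with statsmodels. In the example below everything is hypothetical (the data, the group labels Int_1 to Int_3, and the column names pre, post and group); it simply illustrates fitting a model with one covariate and obtaining adjusted (estimated marginal) means.

```python
# One-way ANCOVA -- a minimal sketch with hypothetical pre/post data, assuming pandas and statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(7)
n = 20  # participants per group
df = pd.DataFrame({
    "group": np.repeat(["Int_1", "Int_2", "Int_3"], n),
    "pre": rng.normal(50, 8, 3 * n),
})
# Hypothetical posttest scores that depend on the pretest (covariate) and on the group.
group_effect = df["group"].map({"Int_1": 0.0, "Int_2": 4.0, "Int_3": 6.0})
df["post"] = 10 + 0.8 * df["pre"] + group_effect + rng.normal(0, 3, 3 * n)

# ANCOVA: posttest explained by group, adjusting for the pretest covariate.
model = smf.ols("post ~ C(group) + pre", data=df).fit()
print(anova_lm(model, typ=2))   # F and p-values for C(group) and pre

# Adjusted (estimated marginal) means: predict each group at the overall mean of the covariate.
grid = pd.DataFrame({"group": ["Int_1", "Int_2", "Int_3"], "pre": df["pre"].mean()})
print(grid.assign(adjusted_mean=model.predict(grid)))
```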
Exercise
A researcher wishes to test the efficacy of three teaching methods using three intact classes.
Analysis of data collected from pretest and post test scores of students yielded the results shown
below.
a. state an appropriate hypothesis
b. test the hypothesis at .05 significance level
c. comment on your result
d. Compare significance in performance between pairs of the groups
ANCOVA output
Descriptive Statistics (Dependent Variable: achievement posttest)
Levene's Test of Equality of Error Variances: F = 1.120, df1 = 2, df2 = 62, Sig. = .333
Tests of Between-Subjects Effects (Dependent Variable: achievement posttest)
Source             Sum of Squares   df   Mean Square   F        Sig.
allancovapretest   2057.027         1    2057.027      14.531   .000
Allancovagrps      1413.730         2    706.865       4.993    .010
Error              8635.506         61   141.566
Total              249159.000       65
Corrected Total    11907.785        64
Pairwise Comparisons (Dependent Variable: achievement posttest)
(I) treatment groups, (J) treatment groups, Mean Difference (I-J), Std. Error, Sig., 95% Confidence Interval for Difference
Tukey method
This test uses pairwise post-hoc testing to determine whether there is a difference between the means of all possible pairs, using a studentized range distribution. This method tests every possible pair of all groups. Initially, the Tukey test was called the 'honestly significant difference' test, or simply the 'T test', because this method was based on the t-distribution. It is noted that the Tukey test assumes the same sample counts between groups (balanced data), as in ANOVA. Subsequently, Kramer modified this method to apply it to unbalanced data, and it became known as the Tukey-Kramer test. This method uses the harmonic mean of the cell sizes of the two groups being compared. The statistical assumptions of ANOVA should be applied to the Tukey method as well.
As an illustration, consider example results of a one-way ANOVA followed by a Tukey test for multiple comparisons. The Tukey test is performed with one critical level, as described earlier, and the results of all pairwise comparisons are presented in one table under the section 'post-hoc test'. The results might conclude, for example, that groups A and B are different, whereas groups A and C are not different and groups B and C are also not different.
Bonferroni method: α splitting (Dunn's method)
In this method, the overall significance level α is split across the number of comparisons made; for example, with three pairwise comparisons at α = .05, each comparison is tested at .05/3 ≈ .017. The Bonferroni method can be used to compare different groups at the baseline, study the relationship between variables, or examine one or more endpoints in clinical trials. It is applied
as a post-hoc test in many statistical procedures such as ANOVA and its variants, including
analysis of covariance (ANCOVA) and multivariate ANOVA (MANOVA); multiple t-tests; and
Pearson's correlation analysis. It is also used in several nonparametric tests, including the Mann-Whitney U test, Wilcoxon signed rank test, and Kruskal-Wallis test by ranks, and as a test for categorical data, such as the chi-squared test. When used as a post hoc test after ANOVA, the Bonferroni method uses thresholds based on the t-distribution; the Bonferroni method is more
rigorous than the Tukey test, which tolerates type I errors, and more generous than the very
conservative Scheffé’s method.
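The α-splitting idea behind the Bonferroni method is easy to demonstrate in code. The sketch below (my own illustration, with made-up p-values) uses the multipletests helper from statsmodels to apply a Bonferroni correction to three pairwise-comparison p-values.

```python
# Bonferroni correction of pairwise p-values -- a minimal sketch with hypothetical p-values.
from statsmodels.stats.multitest import multipletests

raw_p = [0.012, 0.030, 0.240]   # hypothetical p-values from three pairwise comparisons
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

# With three comparisons, each raw p-value is effectively tested against .05 / 3 (about .017),
# which is equivalent to multiplying each p-value by 3 and comparing it with .05.
for p, p_adj, rej in zip(raw_p, adj_p, reject):
    print(f"raw p = {p:.3f}, Bonferroni-adjusted p = {p_adj:.3f}, reject H0: {rej}")
```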
Dunnett method
This is a particularly useful method to analyze studies having control groups, based on
modified t-test statistics (Dunnett’s t-distribution). It is a powerful statistic and, therefore, can
discover relatively small but significant differences among groups or combinations of groups.
The Dunnett test is used by researchers interested in testing two or more experimental groups
against a single control group. However, the Dunnett test has the disadvantage that it does not
compare the groups other than the control group among themselves at all.
On the other hand, the Dunnett method is capable of 'two-tailed' or 'one-tailed' testing, which
makes it different from other pairwise comparison methods. For example, if the effect of a new
drug is not known at all, the two-tailed test should be used to confirm whether the effect of the
new drug is better or worse than that of a conventional control. Conversely, if the direction of the effect is known in advance, a one-sided test can be used to compare the new drug and the control. Since the two-sided or one-sided test can
be performed according to the situation, the Dunnett method can be used without any
restrictions.
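Dunnett's many-to-one comparisons can also be run outside SPSS. The sketch below (my own illustration with made-up scores) assumes a recent version of SciPy (1.11 or later), which provides scipy.stats.dunnett; each experimental group is compared only against the control group.

```python
# Dunnett's test (experimental groups vs. one control group) -- a minimal sketch with hypothetical data.
import numpy as np
from scipy.stats import dunnett  # requires SciPy >= 1.11

control = np.array([52, 55, 50, 53, 58, 54, 51, 56])   # hypothetical control-group scores
drug_a = np.array([60, 63, 58, 61, 65, 62, 59, 64])    # hypothetical experimental group 1
drug_b = np.array([54, 57, 52, 55, 59, 56, 53, 58])    # hypothetical experimental group 2

res = dunnett(drug_a, drug_b, control=control, alternative="two-sided")
print(res.statistic)   # one statistic per experimental group
print(res.pvalue)      # corresponding adjusted p-values
```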
Scheffé's method
Because this method generates and tests hypotheses based on all possible comparisons to confirm significance, it is preferred when a theoretical background for differences between groups is unavailable or previous studies
have not been completely implemented (exploratory data analysis). The hypotheses generated
in this manner should be tested by subsequent studies that are specifically designed to test new
hypotheses. This is important in exploratory data analysis or the theoretic testing process (e.g.,
if a type I error is likely to occur in this type of study and the differences should be identified in
subsequent studies). Follow-up studies testing specific subgroup contrasts discovered through the application of Scheffé's method should use Bonferroni methods, which are appropriate for theoretical test studies. It is further noted that Bonferroni methods are less sensitive to type I
errors than Scheffé’s method. Finally, Scheffé’s method enables simple or complex averaging
comparisons in both balanced and unbalanced data.
REGRESSION ANALYSIS
Regression analysis is a statistical technique that is correlation based and is used for
predictions. It is a tool that is used for predicting one variable from one or more variables based
on the correlation between the variables. It is also a technique for modeling the relationship
between variables. Once a relationship has been established by means of correlation between any two variables X and Y, regression analysis can be used to: (a) predict or estimate values of Y from given values of X; and (b) develop a model or equation that describes the relationship between the variables.
Regression analysis answers such questions as:
a. What proportion of the variance in the dependent variable is explained or predicted by the independent variable?
b. What is the equation of the best line that fits the data representing the relationship
between the independent and dependent variables? Or what structural model /
equation can be used in predicting or estimating values of the dependent variable given
values of the independent variable?
c. How accurate would a prediction or an estimation made on the basis of the observed relationship between the independent and dependent variables be?
Equation or Model for Simple Linear Regression
Suppose X and Y have a linear relationship; if the values of X are plotted against the corresponding values of Y, the points will be scattered, forming the shape of an ellipse. The problem, then, is to find the best possible linear rule for predicting Y from these data and then to evaluate the goodness of such a rule. Such a line is called the "line of best fit", the regression of Y on X, or simply the regression line. The model or formula for simple regression is
Y = a + bX (a and b are constants)
where Y = dependent variable (criterion), X = independent variable (predictor),
a = intercept on the Y-axis or regression constant; therefore, when X = 0, Y is equal to "a",
b = slope or gradient of the line, regression coefficient or weight. Hence, for every unit change or increase in X, Y changes or increases by "b".
Assumptions of Regression Analysis: The four basic assumptions of regression analysis are:
1. Linear relationship: there exists a linear relationship between the independent variable, X, and the dependent variable, Y.
2. Normality: For each value of x, y is normally distributed. That is, y follows a normal
distribution.
3. Homoscedasticity: the residuals have constant variance at every level of x. For each x,
the variance of y, given x is the same.
4. Independence: Observations are independent of each other.
Example 1: The table below shows the scores of 12 students on a study habit inventory and achievement in mathematics.
Table ………
X (SHI): 25 16 17 29 23 17 22 36 30 18 14 22
Y (MA):  80 60 70 55 62 37 60 75 50 48 42 64
(a) Deduce the model for predicting Mathematics achievement from study habit
inventory and explain the result.
(b) What percentage of the students' mathematics achievement can be explained by their study habit?
Model for Multiple Regressions
When there is one dependent variable and two or more independent variables, multiple
regression is the most appropriate to use. Hence a dependent variable Y is usually affected by a
number of quantifiable independent variables. The joint relationship between the variables in
multiple regression is denoted by a correlation coefficient R. The model for multiple
regressions shows the relationship between the dependent variable and the independent
variables: thus:
Y = a + b1X1 + b2X2 + b3X3 + … + bnXn
where Y = dependent variable,
a = regression constant or intercept on the Y-axis,
b1X1 + b2X2 + b3X3 + … + bnXn = the independent variables and their coefficients.
The value of each bi shows the weight or contribution of its independent variable in predicting the dependent variable. Estimating the parameters a, b1, b2, …, bn can easily be done using SPSS or other statistical packages.
Standard Error of Estimate: The error in prediction is the difference between the actual and predicted values of Y (i.e., Y − Ŷ). The Standard Error of Estimate is the measure of variation of
an observation made around the computed regression line. Simply, it is used to check the
accuracy of predictions made with the regression line. The standard error of estimate of a
model fit is a measure of the precision of the model. It is the standard deviation of the
residuals. It shows how wrong one could be if he or she used the regression model to make
predictions or to estimate the dependent variable. The smaller the value of a standard error of
estimate the closer are the dots or values to the regression line and the better the estimate
based on the equation of the line. If the standard error is zero, then there is no variation about the computed line and the correlation will be a perfect relationship.
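For reference (the computing formula is not written out in this note), the standard error of estimate for simple regression is usually obtained as the square root of the residual sum of squares divided by its degrees of freedom, SEE = sqrt( Σ(Y − Ŷ)² / (n − 2) ). In the worked example that follows, for instance, the residual sum of squares is 1475.749 with 10 degrees of freedom, and sqrt(1475.749/10) = 12.148, which is the value SPSS reports as the standard error of the estimate.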
Adjusted R squared: The adjusted R squared is a version of R squared that has been adjusted for the number of predictors in the model. It increases only when a new predictor improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance. It should be noted that a high discrepancy between the values of R squared and adjusted R squared indicates a poor fit of the model. Any addition of a useless
variable to a model causes a decrease in adjusted R squared. But, for any useful variable added,
adjusted R squared will increase. Adjusted R squared will always be less than or equal to R
squared. It therefore adjusts for the number of terms in a model. The standard error of
estimate is inversely related to adjusted R-squared. Hence, if you fit simple regression models to the same sample of the same dependent variable Y with different choices of X as the independent variable, then adjusted R-squared necessarily goes up as the standard error of the regression goes down, and vice versa. Hence, it is important to say that the goal in fitting a regression model is to minimize the standard error of the regression, or to maximize adjusted R-squared, through the choice of X, other things being equal. It is important to state that the standard error of the regression is the real "bottom line" in analysis, as it measures the variation in the data that is not explained by the model in real economic or physical terms.
Running the regression analysis in SPSS generates the descriptive statistics, model summary, ANOVA and coefficients tables. Descriptive statistics have been discussed in earlier chapters of this book.
The table of model summary provides the R, R2, adjusted R2, and the standard error of the estimate. R2 is used to determine the percentage of variance in the dependent variable that can be explained by the independent variable. The ANOVA table provides the F-value and its sig or probability level. This is used to measure the statistical significance of the overall regression model, that is, whether the independent variables are appropriate for predicting the dependent variable. If the probability or sig of the F-value is less than the set alpha level (say .05), then the model is said to be properly fitted, that is, the independent variables are considered appropriate for predicting the dependent variable. On the other hand, if the probability of the F statistic is greater than .05, the independent variables are considered inappropriate for predicting the dependent variable.
The table of coefficients contains the values of the regression constant and the
coefficients of the independent variables, their associated t values and significance. This
shows the contribution of each independent variable in predicting the dependent
variable and the statistical significance of the coefficients. The regression model is
conventionally written using the unstandardized coefficients. Unstandardized coefficients indicate how much the dependent variable varies with an independent variable when all other independent variables are held constant. However, the standardized coefficients can also be used to write a regression model, but only if the scores of each independent variable are transformed to Z scores.
Example: Use SPSS to solve the question in Example 1 on simple linear regression.
Solution: The results from the SPSS printout are shown below.
Descriptive Statistics
    Mean    Std. Deviation   N
Y   58.58   12.944           12
X   22.42   6.626            12

Correlations
                        Y       X
Pearson Correlation  Y  1.000   .446
                     X  .446    1.000
Sig. (1-tailed)      Y  .       .073
                     X  .073    .
N                    Y  12      12
                     X  12      12
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .446   .199       .119                12.148                       1.673
a. Predictors: (Constant), X
b. Dependent Variable: Y

ANOVA
Model          Sum of Squares   df   Mean Square   F       Sig.
1 Regression   367.167          1    367.167       2.488   .146
  Residual     1475.749         10   147.575
  Total        1842.917         11
a. Dependent Variable: Y
b. Predictors: (Constant), X

Coefficients
Model          Unstandardized B   Std. Error   Standardized Beta   t       Sig.
1 (Constant)   39.037             12.879                           3.031   .013
  X            .872               .553         .446                1.577   .146
a. Dependent Variable: Y
From the tables, the mean and standard deviation of X are 22.42 and 6.626 respectively; the mean and standard deviation of Y are 58.58 and 12.944 respectively; r = 0.446 with a sig. of .073 (1-tailed); r squared = .199; b = 0.872, and a = 39.037. Hence the regression model for predicting mathematics achievement (Y) from study habit (X) is Y = 39.037 + 0.872X, and about 19.9% of the students' mathematics achievement can be explained by their study habit.
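The same simple regression can be reproduced in Python. The sketch below (not part of the lecture note) uses SciPy's linregress on the X and Y data of Example 1 and also recomputes the standard error of estimate from the residuals.

```python
# Simple linear regression for Example 1 -- a minimal sketch using SciPy.
import numpy as np
from scipy import stats

x = np.array([25, 16, 17, 29, 23, 17, 22, 36, 30, 18, 14, 22])   # study habit inventory (SHI)
y = np.array([80, 60, 70, 55, 62, 37, 60, 75, 50, 48, 42, 64])   # mathematics achievement (MA)

result = stats.linregress(x, y)
print(f"b (slope)     = {result.slope:.3f}")        # about 0.872
print(f"a (intercept) = {result.intercept:.3f}")    # about 39.037
print(f"r             = {result.rvalue:.3f}")       # about 0.446
print(f"R squared     = {result.rvalue ** 2:.3f}")  # about 0.199

# Standard error of estimate: sqrt(residual sum of squares / (n - 2)).
y_hat = result.intercept + result.slope * x
see = np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - 2))
print(f"standard error of estimate = {see:.3f}")    # about 12.148
```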
SPSS statistics generates three main tables of output for a multiple regression analysis. The
tables are model summary, ANOVA (statistical significance) and Coefficients. The model
summary provides the R, R2, adjusted R2, and the standard error of the estimate. These are
used to determine how well a regression model fits the data. The ANOVA table is used to
determine the statistical significance of the overall regression model, that is, whether the
model is properly fitted. The table of coefficients is used to estimate the coefficients for the
independent variables. This shows the contribution of each independent variable in predicting
the dependent variable and the statistical significance of the coefficients. The regression model
is conventionally written using the unstandardized coefficients.
Example: A multiple regression was run to predict students' academic performance from age, intelligent quotient, study habit and gender. The SPSS output is shown in Tables 1 to 3 below.
Table 1: Model Summary
Model   R      R squared (R2)   Adjusted R square   Std. error of estimate
1       .790   .577             .559                5.69097
a. Predictors: (Constant), age, intelligent quotient, study habit and gender
Table 2: ANOVA
Table 3: Coefficients
Solution
(a) From table 1, R = 0.790. This indicates strong relationship between the variables and
possibility of a good level of prediction. The ‘R square’ of 0.577 shows that the
independent variables (Age, Intelligent quotient, Study habit and Gender) explain 57.7%
of the variability of the dependent variable (Academic Performance). The adjusted R2
(.559) is very close to the value of R2 (.577). This indicates a good fit for the model. The standard error of estimate (5.69097) shows that estimates of the dependent variable made with the regression model will typically be wrong by about 5.69.
(b) From Table 2, our interest is in the F value and its significance. F(4, 95) = 32.393, sig or p = .000 < .05. This shows that the independent variables statistically significantly predict the dependent variable. Hence the independent variables are considered appropriate for predicting the dependent variable, and the regression model is properly fitted.
(c) Table 3 (coefficients) is used to estimate the model coefficients, their t-values and associated significance, that is, the contribution of each independent variable in predicting the dependent variable.
From the table, the regression equation to predict academic performance from age, intelligent
quotient, study habit and gender, is:
Academic Performance = 87.83 – (0.165 x age) – (0.385 x intelligent quotient) – (0.118 x study
habit) + (13.208 x gender). This is written using the unstandardized coefficient.
Consider the effect of age in this example. The coefficient for age is equal to -0.165. This means that for every one-unit increase in age, there is a decrease of 0.165 in academic performance, holding the other variables constant. Also, the result indicates that a unit increase in intelligent quotient is associated with a 0.385 decrease in academic performance; a unit increase in study habit leads to a 0.118 decrease in academic performance; while a unit increase in gender (i.e., moving from one gender category to the other) is associated with a 13.208 increase in academic performance.
The t and significant columns of the tables of coefficients are used to determine the statistical
significance of the independent variables. If P is less than the set alpha level (say .05), you can
conclude that the coefficient, which indicates the contribution of each independent variable in predicting the dependent variable, is statistically significant. From the table, age has a t-value of -2.633 and a significance of .010; since .010 < .05, the contribution of age in predicting performance is statistically significant. Similarly, all the other independent variables are indicated as significant predictors of academic performance, as their significance levels or p-values are all less than .05.
The above result can be summarized as follows: A multiple regression was run to predict
Academic performance from age, intelligent quotient, study habit and gender. These
independent variables statistically significantly predicted academic performance, F (4, 95) =
32.393, p < .05, R2 = .577. All four variables added statistically significantly to the prediction, p <
.05
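Because the raw scores for this multiple regression example are not printed in the note, the sketch below (my own illustration, with hypothetical data and column names such as age, iq, study_habit, gender and performance) only shows the general workflow of fitting and reading a multiple regression model in Python with statsmodels.

```python
# Multiple regression -- a minimal sketch with hypothetical data, assuming pandas and statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 100
df = pd.DataFrame({
    "age": rng.integers(14, 19, n),
    "iq": rng.normal(100, 15, n),
    "study_habit": rng.normal(50, 10, n),
    "gender": rng.integers(0, 2, n),   # coded 0/1
})
# Hypothetical academic performance generated from the predictors plus random error.
df["performance"] = (40 - 0.2 * df["age"] + 0.1 * df["iq"]
                     + 0.3 * df["study_habit"] + 5 * df["gender"]
                     + rng.normal(0, 5, n))

model = smf.ols("performance ~ age + iq + study_habit + gender", data=df).fit()
print(model.summary())   # R squared, adjusted R squared, F and its p-value,
                         # and the unstandardized coefficients with their t and p values
print(model.params)      # the fitted (unstandardized) equation: performance = a + b1*age + ... + b4*gender
```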
Revision Questions
1. The following data were generated by Dr John in a study of students' scores in an entrance examination and their achievement scores in mathematics.
a(i). write the regression model for predicting Y from X. (interpret the model).
(ii) what is the standard error for such estimation
(iii) what percentage of mathematics achievement can be explained by entrance score.
b. what are the basic assumptions for conducting regression analysis.
c. write a generalized model for multiple linear regression.
d. explain the meaning and implication of standard error of estimate of a regression
equation.
2. The table below shows the scores of 14 students on a test anxiety inventory and achievement in mathematics.
X (TAI): 26 29 20 22 36 30 18 14 22 24 30 32 16 12
Y (MA):  67 85 50 60 75 50 48 42 64 78 80 80 65 40
a. (i) Deduce the regression model for predicting mathematics achievement from test anxiety and interpret your result.
b. (i) Write the regression equation for forecasting test anxiety from mathematics achievement (explain your result in each case).
   (ii) What percentage of mathematics achievement can be attributed to test anxiety?
3(a) State 2 uses of regression analysis.
(b) State and explain the model for simple linear regression.
(c) State and explain the model for multiple regression.
4(a) Explain the concept of "Standard Error of Estimate".
(b) State 4 assumptions of regression analysis.
(c) Outline the procedure for carrying out regression analysis using SPSS.
5. The tables below show the SPSS output of a multiple regression analysis predicting students' achievement (ACHVT) from anxiety, attitude, interest and self-efficacy.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .826   .683       .680                10.65919                     1.920
a. Predictors: (Constant), selfefficacy, anxiety, interest, attitude
b. Dependent Variable: ACHVT

ANOVA
Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   96448.032        4     24112.008     212.219   .000
  Residual     44765.642        394   113.618
  Total        141213.674       398
a. Dependent Variable: ACHVT
b. Predictors: (Constant), SELFEFFICACY, ANXIETY, INTEREST, ATTITUDE

Coefficients
Model            Unstandardized B   Std. Error   Standardized Beta   t        Sig.
1 (Constant)     -22.107            7.963                            -2.776   .006
  ANXIETY        .447               .087         .160                5.115    .000
  ATTITUDE       1.297              .079         .691                16.335   .000
  INTEREST       -.292              .066         -.146               -4.394   .000
  SELFEFFICACY   -.258              .058         -.168               -4.411   .000
a. Dependent Variable: ACHVT
(a) Interpret the values obtained for (i) R2, (ii) adjusted R2 (iii) standard error of estimate
(b) State the regression model based on results obtained and discuss the appropriateness
of the fit.
(c) Discuss the significance of each independent variable in predicting the dependent
variable.