What will this chapter tell me?
There are pivotal moments in everyone’s life, and one of mine was at the age of 11. Where I grew up in England there
were three choices when leaving primary school and moving on to secondary school: (1) state school (where most
people go); (2) grammar school (where clever people who pass an exam called the Eleven Plus go); and (3) private
school (where rich people go). My parents were not rich and I am not clever and consequently I failed my Eleven Plus, so
private school and grammar school (where my clever older brother had gone) were out. This left me to join all of my
friends at the local state school. I could not have been happier. Imagine everyone’s shock when my parents received a
letter saying that some extra spaces had become available at the grammar school; although the local authority could
scarcely believe it and had checked the Eleven Plus papers several million times to confirm their findings, I was next on
their list. I could not have been unhappier. So, I waved goodbye to all of my friends and trundled off to join my brother at
Ilford County High School for Boys (a school that still hit students with a cane if they were particularly bad and that, for
some considerable time and with good reason, had ‘H.M. Prison’ painted in huge white letters on its roof). It was
goodbye to normality, and hello to six years of learning how not to function in society. I often wonder how my life would
have turned out had I not gone to this school; in the parallel universes where the letter didn’t arrive and the parallel Andy
went to state school, or where his parents were rich and he went to private school, what became of him? If we wanted to
compare these three situations we couldn’t use a t-test because there are more than two conditions.¹ However, this
chapter tells us all about the statistical models that we use to analyse situations in which we want to compare more than
two conditions: analysis of variance (or ANOVA to its friends). This chapter will begin by explaining the theory of ANOVA
when different participants are used (independent ANOVA). We’ll then look at how to carry out the analysis in SPSS and
interpret the results.
¹ Really, this is the least of our problems: there’s the small issue of needing access to parallel universes.
The theory behind ANOVA
Using a linear model to compare means
We saw in Chapter 9 that if we include a predictor variable containing two categories in the linear model then the
resulting b for that predictor represents the difference between the mean scores of the two categories. We also saw in
Chapter 10 that if we want to include a categorical predictor that contains more than two categories, this can be
achieved by recoding that variable into several categorical predictors each of which has only two categories (dummy
coding). We can flip this idea on its head to ask how we can use a linear model to compare differences between the
means of more than two groups. The answer is the same: we use dummy coding to represent the groups and stick
them in a linear model. Many people are taught that to compare differences between several means we use ‘ANOVA’
and to look at relationships between variables we use ‘regression’ (Jane Superbrain Box 11.1). ANOVA and
regression are often taught as though they are completely unrelated tests. However, as we have already seen in
Chapter 8, we test the fit of a regression model with an ANOVA (the F-test). In fact, ANOVA is just a special case of
the linear model (i.e., regression) we have used throughout the book.
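To make the dummy-coding idea concrete, here is a minimal sketch in Python (using NumPy rather than SPSS, purely for illustration). The scores and the group labels A, B and C are invented for this sketch and do not come from any data set in this book; group A is arbitrarily treated as the baseline category. Under those assumptions, the code fits the linear model by least squares and compares each b with the group means.

```python
import numpy as np

# Invented scores for three hypothetical groups (illustration only).
group_a = [2, 3, 1, 2, 2]   # baseline category
group_b = [4, 3, 5, 3, 5]
group_c = [6, 5, 7, 4, 8]

scores = np.array(group_a + group_b + group_c, dtype=float)

# Dummy coding: two predictors for three groups.
# Group A is coded (0, 0), group B is (1, 0), group C is (0, 1).
dummy_b = np.array([0] * 5 + [1] * 5 + [0] * 5, dtype=float)
dummy_c = np.array([0] * 5 + [0] * 5 + [1] * 5, dtype=float)

# Design matrix: a column of ones (for b0) plus the two dummy variables.
X = np.column_stack([np.ones_like(scores), dummy_b, dummy_c])

# Ordinary least squares fit of: score = b0 + b1*dummy_b + b2*dummy_c + error
coefficients, *_ = np.linalg.lstsq(X, scores, rcond=None)
b0, b1, b2 = coefficients

mean_a, mean_b, mean_c = np.mean(group_a), np.mean(group_b), np.mean(group_c)

print("b0:", b0, "| mean of A:", mean_a)
print("b1:", b1, "| mean of B minus mean of A:", mean_b - mean_a)
print("b2:", b2, "| mean of C minus mean of A:", mean_c - mean_a)
```

With this coding, b0 should match the mean of the baseline group, and each remaining b should match the difference between that group's mean and the baseline mean, which is the sense in which the linear model "compares means".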
There are several good reasons why I think ANOVA is best understood as a linear model. First, it provides a
familiar context: I wasted many trees trying to explain regression, so why not use this base of knowledge to explain a
new concept (it should make it easier to understand)? Second, the traditional method of teaching ANOVA (known as
the variance ratio method) is fine for simple designs, but becomes impossibly cumbersome in more complex situations
(such as analysis of covariance). The regression model extends very logically to these more complex designs without
anyone needing to get bogged down in mathematics. Finally, the variance ratio method becomes extremely
unmanageable in unusual circumstances such as when you have unequal sample sizes.² The regression method
makes these situations considerably simpler. These reasons are good enough on their own, but it also helps that SPSS deals with
ANOVA in a regression-y sort of way (known as the general linear model, or GLM).
I have mentioned that ANOVA is a way of comparing systematic variance with unsystematic variance in
an experimental study. The ratio of these variances is known as the F-ratio. However, any of you who have read
Chapter 8 should recognize the F-ratio (see Section 8.2.4) as a way to assess how well a regression model can
predict an outcome compared to the error within that model. If you haven’t read Chapter 8 (surely not!), have a look
before you carry on (it should only take you a couple of weeks to read). How can the F-ratio be used to test
differences between means and whether a regression model fits the data? The answer is that when we test
differences between means we are fitting a regression model and using F to see how well it fits the data, but the
regression model contains only categorical predictors (i.e., grouping variables). So, just as the t-test could be
represented by the linear regression equation (see Section 9.2.2), ANOVA can be represented by the multiple
regression equation in which the number of predictors is one less than the number of categories of the independent
variable.
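The claim that the same F-ratio serves both purposes can be checked numerically. The following is a minimal sketch in Python with NumPy (not SPSS), using invented scores for three hypothetical groups labelled A, B and C; none of these numbers come from this book. It computes F once from a regression model with two dummy predictors (one fewer than the number of groups) and once by the variance ratio method, working directly from the group means.

```python
import numpy as np

# Invented scores for three hypothetical groups (illustration only).
group_a = np.array([2, 3, 1, 2, 2], dtype=float)
group_b = np.array([4, 3, 5, 3, 5], dtype=float)
group_c = np.array([6, 5, 7, 4, 8], dtype=float)
groups = [group_a, group_b, group_c]

scores = np.concatenate(groups)
n, k = scores.size, len(groups)   # total sample size, number of groups
grand_mean = scores.mean()

# Route 1: regression with k - 1 dummy predictors (group A is the baseline).
dummy_b = np.repeat([0.0, 1.0, 0.0], 5)
dummy_c = np.repeat([0.0, 0.0, 1.0], 5)
X = np.column_stack([np.ones(n), dummy_b, dummy_c])
b, *_ = np.linalg.lstsq(X, scores, rcond=None)
predicted = X @ b

ss_model = np.sum((predicted - grand_mean) ** 2)   # variation explained by the model
ss_residual = np.sum((scores - predicted) ** 2)    # variation the model leaves unexplained
f_regression = (ss_model / (k - 1)) / (ss_residual / (n - k))

# Route 2: variance ratio method, straight from the group means.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
f_variance_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))

print("F from the regression model:", f_regression)
print("F from the variance ratio method:", f_variance_ratio)
```

The two routes agree because the predicted values from the dummy-coded model are simply the group means, so the model sum of squares is the between-group sum of squares and the residual sum of squares is the within-group sum of squares.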
Let’s take an example. There was a lot of excitement, when I wrote the first edition of this book, surrounding the
drug Viagra. Admittedly there’s less excitement now, but it has been replaced by an alarming number of spam emails
on the subject (for which I’ll no doubt be grateful in 15 years’ time), so I’m going to stick with the example. Viagra is a
sexual stimulant (used to treat impotence) that broke into the black market under the belief that it would make someone a
better lover (oddly enough, there was a glut of journalists taking the stuff at the time in the name of ‘investigative
journalism’… hmmm!). In the psychology literature sexual performance issues have been linked to a loss of libido
(Hawton, 1989). Suppose we tested this belief by taking three groups of participants and giving one group
a placebo (such as a sugar pill), one group a low dose of Viagra and one a high dose. The dependent
variable was an objective measure of libido (I will tell you only that it was measured over the course of a week – the
rest I will leave to your own imagination). The data are in Table 11.1 and can be found in the file Viagra.sav (which is
described in detail later in this chapter).
² Having said this, it is well worth the effort in trying to obtain equal sample sizes in your different conditions because unbalanced designs do cause
statistical complications (see Section 11.3).