
The distributed cohort model

As we indicated earlier, not all researchers believe that localist models like the TRACE model are a truthful simulation of how the brain works. According to them, there is little evidence that there would be a group of neurons specifically devoted to the phoneme “s” or to the spoken word “sun.” Rather, the meanings of stimuli are represented by activity patterns across a complete layer of nodes.

Figure 8.7 shows such a model for speech perception, called the

distributed cohort model (Gaskell & Marslen-Wilson, 1997; see also

Magnuson et al., 2020).

The input layer consisted of 11 nodes representing the speech

signal at a particular moment in time (i.e., the activity pattern when a

particular sound of the word was pronounced). All nodes of the input

layer were connected to a hidden layer of 200 units. At each processing

cycle, the hidden layer was copied to a context layer, which fed back to

the hidden layer. In this way, the hidden layer not only received input

from the current processing cycle but also from the previous processing

cycle. This top-down part of the model allowed it to capture systematic

transitions in time. The hidden layer fed activation forward to a layer

of 50 semantic nodes and three layers of output phonology representing

the sounds (phonemes) of the word. The three layers of output

phonology represented the three parts of monosyllabic words: the

consonants before the vowel (called the onset of the word; can be

empty), the vowel (called the nucleus of the word), and the consonants

after the vowel (the coda; can also be empty). By using this coding

scheme all monosyllabic words could be represented.

The input to the model consisted of monosyllabic English words.

Importantly, the words were attached to each other, with no pauses inbetween,

to simulate the fact that words in a stream of spoken languageworks. According to them, there is little
evidence that there would be a are not separated. The model was trained by presenting the continuous

sequence of phonetic feature bundles representing incoming speech to

the input layer. The output of the network was compared to a second sequence of the same length, which
represented the semantics and

phonology of the words. The second sequence was used for comparison

with the network output, allowing connection weights to be updated, so


that the output of the network gradually approached that of the comparison

sequence. During training, the words were presented a few

hundred times in various orders.
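To make this concrete, the sketch below implements a simple recurrent network with the layer sizes described above. It is an illustrative sketch, not Gaskell and Marslen-Wilson’s original implementation: the 11 input units, the 200-unit hidden layer with a context copy, the 50 semantic units and the three phonological slots come from the text, while the width of each phonological slot, the random stand-in input and the NumPy forward pass are assumptions made for the sake of a runnable example.

import numpy as np

rng = np.random.default_rng(0)

# Layer sizes taken from the text; the width of each phonological
# slot (onset / nucleus / coda) is an assumed value for illustration.
N_INPUT, N_HIDDEN, N_SEM, N_PHON = 11, 200, 50, 20

def init(n_in, n_out):
    # Small random starting weights for one fully connected layer.
    return rng.normal(0.0, 0.1, size=(n_in, n_out))

W_ih = init(N_INPUT, N_HIDDEN)                     # input -> hidden
W_ch = init(N_HIDDEN, N_HIDDEN)                    # context -> hidden (feedback)
W_hs = init(N_HIDDEN, N_SEM)                       # hidden -> semantic nodes
W_hp = [init(N_HIDDEN, N_PHON) for _ in range(3)]  # hidden -> onset/nucleus/coda

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cycle(features, context):
    # One processing cycle: the hidden layer receives the current
    # feature bundle plus a copy of its own state from the previous cycle.
    hidden = sigmoid(features @ W_ih + context @ W_ch)
    semantics = sigmoid(hidden @ W_hs)
    phonology = [sigmoid(hidden @ w) for w in W_hp]
    return hidden, semantics, phonology

# A continuous input sequence with no word boundaries; random values
# stand in for real phonetic feature bundles.
context = np.zeros(N_HIDDEN)
for features in rng.random((100, N_INPUT)):
    context, semantics, phonology = cycle(features, context)
    # During training, semantics and phonology would be compared here
    # with the target sequence and the weights adjusted so that the
    # output gradually approaches it.

The essential design choice is the context copy: because the hidden layer at each cycle sees its own state from the previous cycle, the network can track transitions over time even though the input contains no gaps between words.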

After training, the model performed very similarly to human

observers. In particular, the model showed activation of cohorts (both

at the phonological and semantic level) when the onset and the nucleus

of the word were presented, and ended up with a unique representation

after the coda was presented, even though there were no gaps in

the input. The model was also able to decide whether a real word had

been presented (“sun”) or a non-existing monosyllabic non-word

(“suph”). Notice that the model was capable of doing so without

explicit top-down effects from the semantic level. The only top-down

information came from the context layer that kept a record of the

preceding processing step.

10.8 LANGUAGE PRODUCTION

Speech production – deciding what we want to say, and articulating this accurately and fluently – is a behaviour which we take very much for granted, and which we typically do extremely well: it has been estimated that any one talker uses a production vocabulary of around 20,000 words, yet makes mistakes of word selection only around once in every million words produced.

The processes involved in turning thoughts into spoken words are called lexicalisation, and two main stages have been hypothesised (Levelt, 1989, 1992). The first stage links conceptual thoughts to word forms which include semantic and syntactic information, but not phonological detail; this representation is called the ‘lemma’, and the process of identifying and choosing the correct word is called lemma selection. In the second stage, the lemma makes contact with the phonetic representation of the word, called the lexeme, and the specifying of this form is called lexeme selection. Much of the evidence

for this two-stage form of word selection in speech production comes


from a frustrating state that many people have experienced, called the tip-of-the-tongue state. When in this condition, people have an absolute certainty that they know a word that they want to say, combined with no sense of how they should say it. In this state, people can often access a lot of information about a word, such as what it means and aspects of its syntax, and this has been ascribed to being able to access lemma information without being able to make contact with the lexeme detail (Harley, 2001).
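The division of labour between the two stages can be illustrated with a small sketch. The entry for ‘sun’, its features and the lookup functions are all invented for this example; the point is simply that a tip-of-the-tongue state corresponds to retrieving the lemma while failing to retrieve the lexeme.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Lemma:
    # Stage 1: semantic and syntactic information, no sound form.
    meaning: str
    syntactic_category: str

@dataclass
class Lexeme:
    # Stage 2: the phonological form of the word.
    phonemes: list

# A toy lexicon; the entries are invented for illustration.
LEMMAS = {"sun": Lemma("the star at the centre of our solar system", "noun")}
LEXEMES = {"sun": Lexeme(["s", "ʌ", "n"])}

def lemma_selection(concept: str) -> Optional[Lemma]:
    # Stage 1: identify and choose the word's meaning and syntax.
    return LEMMAS.get(concept)

def lexeme_selection(concept: str) -> Optional[Lexeme]:
    # Stage 2: retrieve the sound form for the selected lemma.
    return LEXEMES.get(concept)

lemma = lemma_selection("sun")    # succeeds: meaning and syntax available
lexeme = lexeme_selection("sun")  # in a tip-of-the-tongue state this step
                                  # fails, leaving only the lemma accessible

If lexeme_selection returned None while lemma_selection succeeded, the speaker would know the word’s meaning and grammatical class but not its sound, which is exactly the dissociation reported in tip-of-the-tongue studies.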

There is also experimental evidence for these stages from studies of priming: for example, participants name pictures more quickly if they have previously named or defined the word, but not if they have produced a homophone, which sounds the same but has a different meaning. This suggests that priming at the lemma level (semantic and syntactic) can operate separately from lexeme (phonological) priming.

Historically, another influence on our understanding of speech production

processes comes from studies of speech errors, or ‘slips of the

tongue’. Fromkin (1973) said these errors ‘provide a window into linguistic

processes’ (pp. 43–44), although it has also been pointed out

that these errors rely on accurate acoustic and phonetic decoding by

the listener, which comprises a complex set of psychological processes

(Boucher, 1994).

There are consistencies in the kinds of errors speakers make: the errors occur at the level of phonemes, morphemes or words, rather than as random, noisy patterns. This gives weight to the suggestion that they result from specific kinds of failure in the speech production system (Fromkin, 1971, 1973; Garrett, 1975; Dell, 1986; Harley, 2001).

Garrett developed a model of speech production based on a set of

speech errors which he considered to be particularly informative:

• Word substitutions – these affect content words, not (typically) form words, such as ‘man’ for ‘woman’ or ‘day’ for ‘night’.

• Word exchanges – words from the same category swap positions with each other, such that nouns swap with nouns, verbs with verbs, etc.

• Sound exchange errors – classic spoonerisms such as ‘wastey term’ for ‘tasty worm’, where the onsets of words swap positions with each other, commonly across words which are next to each other.

• Morpheme exchange – word endings (morphological inflections) move to other points in the sentence, such as ‘Have you seen Hector Easter’s egg?’ for ‘Have you seen Hector’s Easter egg?’. Morpheme exchange errors can also include ‘stranding’ errors such as ‘Have you seen Easter’s Hector egg?’.

In Garrett’s model of speech production there are several independent levels:

1. the message level, which represents the concepts and thoughts that the speaker wants to express;

2. the functional level, at which these concepts are expressed as semantic lexical representations, and thematic aspects of the sentence (the subject and object, for example) are also represented – i.e. the roles that these semantic items will take in the sentence;

3. the positional level, at which the semantic-lexical representations are implemented as phonological items, with a syntactic structure;

4. the phonetic level, at which the phonological and syntactic representations are realised as detailed phonetic sequences, precisely articulating the word forms and inflections specified by the positional level;

5. the articulation level, which controls the vocal apparatus to express the planned utterance.

In Garrett’s model, the semantic information about content words

is specified at the functional level, while function words and bound

morphemes (such as ‘-ing’ endings) are added to the sentence structure

at the positional level, where they are associated with their

phonetic forms: in contrast, the phonological forms of the content

words need to be generated within the sentence at the positional


level. This kind of constraint in the model allows for word substitution

errors, which are generated at the functional level (or the lexical

level in Levelt’s model), and which infrequently affect form words.

Likewise, sound exchange errors arise when content words are being phonologically constructed at the
positional level, and again, affect

form words much less (as they are phonologically prespecified at

this level). Stranding errors occur when the content words are being

positioned in the sentence, which occurs before syntactic structure

and inflections are added to the sentence.

This is a serial model of speech production: output is the result of a series of independent stages in which there are distinct computational processes specified in a serial, non-interacting fashion; this is also true of the Levelt model. There are

other approaches to modelling speech production which proceed

along more parallel lines, and which are typically modelled within

the connectionist, interactive framework – for example, the speech

production model of Gary Dell (Dell, 1986; Dell and O’Seaghdha,

1991; Dell et al., 1997). In this model, a spoken sentence is represented

as a sentence frame, and is planned simultaneously across

semantic, syntactic, morphological and phonological levels, with

spreading activation permitting different levels to affect each other.

This allows speech errors to be ‘mixed’: as Dell has pointed out,

many speech errors (such as ‘The wind is strowing strongly’) represent

several different kinds of errors.

Functionally the Dell model works via different points in the sentence

frame activating items in a lexicon – for example when a verb

is specified, there will be activation across interconnected nodes for

concepts, words, morphemes and phonemes. When a node is activated,

there is a spread of activation across all the nodes connected to it.

Thus if the node for the verb ‘run’ is activated, there will also be activation

for the verb ‘walk’. Selection is based on the node with the

highest activation, and after a node has been selected its activation
is reset to zero (to prevent the same word from being continuously produced).

In this way, word substitution errors occur when the wrong

word becomes more highly activated than the correct target word. The

model contains categorical rules which act as constraints on the types

of items which are activated at each level in the model, and these rules

place limits on the kinds of errors that can be made – nouns swapping

places with nouns, for example. In contrast, exchange errors occur

as a result of the increases in activation, which means that a lexical

element (a phoneme, or a word) can appear earlier in a sentence than

was intended, if its activation unexpectedly increased: as the activation

is immediately set to zero once an item has been selected, another

highly activated item is likely to take its place in the intended part of

the sentence frame.
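The selection-and-reset mechanics just described can be sketched in a few lines. This is an illustrative toy, not Dell’s implementation: the two-verb lexicon, the link weights and the spreading fraction are all assumed values chosen to show spreading activation, highest-activation selection and post-selection reset.

# A toy spreading-activation lexicon in the spirit of Dell (1986);
# the nodes, links and weights are invented for illustration.
links = {
    "run":  {"walk": 0.5},   # related verbs pass activation to each other
    "walk": {"run": 0.5},
}
activation = {"run": 0.0, "walk": 0.0}

def activate(node, amount=1.0, spread=0.5):
    # Inject activation into a node, then spread a fraction of it
    # along every link to the connected nodes.
    activation[node] += amount
    for neighbour, weight in links[node].items():
        activation[neighbour] += amount * spread * weight

def select():
    # Choose the most active node, then reset it to zero so the same
    # word is not continuously produced.
    winner = max(activation, key=activation.get)
    activation[winner] = 0.0
    return winner

activate("run")   # intending 'run' also partially activates 'walk'
print(select())   # 'run' wins; a noise boost to 'walk' at this point
                  # would instead produce a word substitution error

Exchange errors fall out of the same mechanics: if an item intended for later in the frame briefly becomes the most active node, it is selected early and reset to zero, so another highly activated item fills its intended slot.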

As in other areas of cognitive psychology, there has been a

lively debate about the extent to which speech production is well

modelled by interactive connectionist models, or by more rule-based, serial, symbolic models. The two-stage model of Levelt was developed into a more complex six-stage model of spoken word production (Levelt, 1989; Bock and Levelt, 1994; Levelt et al., 1999), called WEAVER++ (Word-form Encoding by Activation and VERification). The stages are:

1. conceptual preparation;

2. lexical selection (the stage at which the abstract lemma is selected);

3. morphological encoding;

4. phonological encoding;

5. phonetic encoding;

6. articulation.

Like Dell’s model, WEAVER++ is a spreading activation model, but

unlike Dell’s model, activation is fed forward in one direction only,

from concepts to articulation: furthermore, the WEAVER++ model

is truly serial, as each stage is completed before the next stage is

started.
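That strict seriality can be made concrete with a short sketch. The six stage names come from the list above, but the stage bodies are placeholders invented for illustration; only the structure matters: each stage runs to completion and hands its full result to the next, with no activation flowing backwards, unlike the interactive flow of Dell’s model.

# A strictly serial pipeline in the spirit of WEAVER++: each stage
# finishes before the next begins, and nothing feeds back.
def conceptual_preparation(message):
    return {"concept": message}

def lexical_selection(state):
    return {**state, "lemma": "sun"}    # the abstract lemma is chosen here

def morphological_encoding(state):
    return {**state, "morphemes": ["sun"]}

def phonological_encoding(state):
    return {**state, "phonemes": ["s", "ʌ", "n"]}

def phonetic_encoding(state):
    return {**state, "gestures": ["[s]", "[ʌ]", "[n]"]}

def articulation(state):
    return "".join(state["phonemes"])

STAGES = [conceptual_preparation, lexical_selection, morphological_encoding,
          phonological_encoding, phonetic_encoding, articulation]

state = "the star we see in the sky"
for stage in STAGES:   # one direction only, one completed stage at a time
    state = stage(state)
print(state)           # -> "sʌn"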

In an experimental attempt to generate speech errors, Levelt


et al. (1991) required participants to name pictures while also listening

to words, and pressing a button when they recognised a

word. The relationships between the seen objects and heard words

varied – there were semantic relationships, phonological relationships

and unrelated pairs, and some had a ‘mediated’ relationship

to the picture, that is, linked through a semantic and phonological

connection. If the picture was a dog, a mediated relationship word

could be ‘cot’, which is phonologically similar to ‘cat’, which in turn

has a semantic relationship with ‘dog’. The study was specifically

designed to test the hypothesis, inferred from Dell’s model, that a

model of speech production in which different levels can interact

would predict a facilitation of naming ‘dog’ when ‘cot’ is heard

(Levelt et al., 1991). Experimentally, this predicted facilitation was

not found: there was no phonological activation of semantically

related items. In contrast, the results supported a sequential model.

Specifically, there was priming of lexical decisions to the heard

word from semantically related words only at very short intervals

(around 70 ms), while priming of lexical decisions from phonologically

related words was only significant at longer intervals (around

600 ms). These results were taken to support a sequential, stage-based implementation of word naming.

There is evidence in favour of the Dell model, however: Morsella

and Miozzo (2002) asked participants to name pictures in the presence

of other (distractor) pictures: there was facilitation of picture

naming when there was a phonological relationship between the target

and distractor pictures. This was taken to show a beneficial effect

of phonological information at an earlier stage in word selection and

production than would be predicted by a feed-forward, sequential

model like Levelt’s.

Speech production has been somewhat less closely studied than

other aspects of language in cognitive psychology (especially when

compared with the detailed investigations of speech production in the aphasia literature, as will be seen in Chapter 11); however,

that profile is changing rapidly as a range of experimental

techniques are becoming available to researchers (Griffin and Crew,

2012).
