Implicit Learning and Statistical Learning
5 May 2006
The domain-general learning mechanisms elicited in incidental learning situations are of potential interest in many research fields, including language acquisition, object knowledge formation and motor learning. They have been the focus of studies on implicit learning for nearly 40 years. Stemming from a different research tradition, studies on statistical learning carried out in the past 10 years, following the seminal studies by Saffran and collaborators, appear to be closely related, and the similarity between the two approaches is strengthened further by their recent evolution. However, implicit learning and statistical learning research favor different interpretations, focusing on the formation of chunks and on statistical computations, respectively. We examine these differing approaches and suggest that this divergence opens up a major theoretical challenge for future studies.
Introduction
There is no doubt that many of our most fundamental abilities, whether they concern language, perception, motor skill, or social behavior, reflect some kind of adaptation to the regularities of the world that evolves without intention to learn, and without a clear awareness of what we know. This ubiquitous phenomenon was called ‘implicit learning’ (IL) by Reber [1,2] 40 years ago. Since then, numerous studies have explored this form of learning with several experimental paradigms (mainly finite-state grammars and serial reaction time tasks; for reviews, see [3,4]).
Originating from a different research tradition, the term ‘statistical learning’ (SL) was proposed 10 years ago by Saffran and collaborators [4] to designate the ability of infants to discover the words embedded in a continuous artificial language, and this field of research is now growing exponentially. There are obvious similarities between SL and IL. As in IL, participants in SL experiments are faced with structured material without being instructed to learn. They learn merely from exposure to positive instances, without engaging in analytical processes or hypothesis-testing strategies. Researchers have pointed out that SL proceeds automatically [5–8], incidentally [9], spontaneously [6], or by simple observation [9], and that participants in SL settings were unaware of the statistical structure of the material [7].
This article first describes how the recent evolution of the IL and SL research fields has brought them closer to one another, leading to a growing number of cross-references and to the occasional use of the two expressions as synonyms. Conway and Christiansen [10] now even propose the term ‘implicit statistical learning’ to cover the two domains. However, we then go on to show that, beyond the similarity of paradigms and results, the two domains emphasize different interpretations of the data. We suggest that this divergence, which has not yet been highlighted, opens up a deep challenge for future studies.
The recent evolution of IL and SL studies
Ten years ago, it seemed possible to contrast IL and SL on their main issues of interest, namely syntax acquisition and lexicon formation, respectively. Indeed, the to-be-learned material used in artificial grammar learning research is typically governed by rules, that is, by organizing principles that are independent of the specific material used in a given instance. If participants learned the rules, then this form of learning would be outside the scope of SL studies, in which the notion of rules is a priori irrelevant. However, research from the past few years has made it increasingly clear that participants in artificial grammar learning experiments do not need to extract the rules to perform well, even in situations involving transfer across surface forms (Box 1). In addition, artificial grammar learning paradigms now tend to be supplanted by other paradigms, such as the serial reaction-time task, in which a description of the material in terms of rules appears less appropriate.
Another initial difference between the two domains was that IL research used a large variety of situations involving different sensory modalities and response systems, whereas SL originally focused on the early stage of language acquisition. More recently, however, research on SL has progressively broadened its scope of investigation. The syllables used in the first studies have been replaced by tones, with the same results [11,12]. A parallel literature has evolved with visual shapes [6–8], and even with tactile stimuli [13]. Perhaps even more importantly,
Corresponding author: Perruchet, P. ([email protected]).
www.sciencedirect.com 1364-6613/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2006.03.006
234 Review TRENDS in Cognitive Sciences Vol.10 No.5 May 2006
strength amounts to considering other measures of association than the raw frequency of co-occurrences. Box 3 illustrates how implementing forward interference is sufficient to make chunk strength sensitive to transitional probabilities, which SL researchers consider so important. Moreover, Perruchet and Peereman [44] have shown that PARSER, thanks to the role ascribed to interference in chunk formation, was even sensitive to contingency, that is, to a measure of association more comprehensive than conditional probabilities.
The above remarks suggest that it may prove difficult to decide between competing interpretations on the basis of their explanatory power alone. Because IL and SL have mainly evolved as separate fields of research, this challenge has rarely been addressed. A few recent studies, however, have begun to explore situations designed so that the predictions drawn from chunk-based models and from statistical approaches differ. In these studies, some version of an SRN is used to quantify the predictions of statistical approaches, whereas the chunking models are represented by the Competitive Chunking Model [45] or PARSER [37,44,46]. Although a detailed review of these studies is beyond the scope of this article, suffice it to say that, overall, their results do not clearly favor one account over the other.
These preliminary results suggest that the present accounts will need to be amended. Further models would also make it possible to account for data that neither the chunk-based models nor the statistical approaches, in their current instantiations, seem able to explain. In the past few years, several studies have shown that the relations between elements that are not contiguous in space and/or time can be learned incidentally (Box 4). Because the chunking process is usually construed as the clustering of adjacent events, these data confront chunking models with a difficult challenge, as noted by Kuhn and Dienes [47]. In principle, they do not raise the same problem for statistical approaches, because statistical computations are indifferent to the nature of the data (e.g. contiguous or not) on which they are performed. However, there is a consensus among researchers working on language and visual perception that models relying on statistical computations alone need to be constrained to avoid combinatorial explosion. The adjacency of the to-be-learned elements provides such a natural constraint (one that is implemented, for instance, in SRNs). Thus, the possibility of learning non-adjacent dependencies entails either an in-depth revision of chunk-based models, or a significant departure from the most frequent computational implementation of statistical approaches.
Implications for the issue of consciousness
One of the major implications of the debate outlined above concerns the function of consciousness in the learning process. If the chunks are inferred from the results of statistical computations, then most of the learning process must be thought of as unconscious, because statistical computations are not performed consciously in the context of incidental learning paradigms. Of course, this does not mean that chunks, once formed, are functionally inert in further steps of conscious activity, but simply that their initial emergence is guided by unconscious computations. On the other hand, if the final chunks evolve from the progressive modification of primitive chunks, then the function of consciousness in chunk formation can be
Box 3. Statistical computations and chunk-based models: how do they converge towards the same predictions?
Above is a 20-letter sequence made up of 8 different letters. Let us assume that they stand for syllables (although they could equally stand for tones of different pitches, the consonant letters typically used in artificial grammar learning studies, the locations of a target on a screen typically involved in serial reaction-time studies, or any other events). The sequence can be viewed as the random succession of four bisyllabic words (they have been colored for ease of reading). How can the words be discovered?
One solution consists of considering the frequency of all the bisyllabic units. However, column (a) of Table I shows that, because AB and GH are more frequent than the other words, the ‘part-word’ BG turns out to be as frequent as the ‘words’ CD and EF. Aslin and collaborators [42,6,7,16] used a similar design to show that participants do not exploit co-occurrence frequencies, but rather the transitional probabilities (TP: P(y|x) = frequency of xy / frequency of x). Indeed, as indicated in column (b), considering TPs solves the problem (all word-internal TPs are higher than the TPs straddling word boundaries), hence the prevalent claim in the SL literature that participants compute TPs.
However, as shown in column (c), the same result can emerge if one considers instead that participants memorize chunks, as in IL studies. If memory for chunks depended only on their frequency, the values in (c) would be identical to those in (a). However, memory consolidation and forgetting also depend on interference. Classical studies on interference show that memory for AB is impaired by the presentation of AC or AD. For the sake of illustration, we have assumed here that each occurrence of AB strengthens it by 1 unit, and each occurrence of another letter pair beginning with A decreases the AB strength by 0.5 unit. These parameters were selected arbitrarily, but the crucial outcome, namely that all the words have a greater strength than any part-word, remains true whatever the parameters (the Pearson r between (b) and (c) is 0.95).
Table I. Analysis of the letter sequence shown above
Units   (a) Frequency   (b) TP   (c) Chunk strength
        xy     x        xy/x     xy − (x − xy) × 0.5
AB      3      3        1        3
CD      2      2        1        2
EF      2      2        1        2
GH      3      3        1        3
BE      1      3        0.33     0
BG      2      3        0.67     1.5
DE      1      2        0.5      0.5
DG      1      2        0.5      0.5
FA      1      2        0.5      0.5
FC      1      2        0.5      0.5
HA      1      2        0.5      0.5
HC      1      2        0.5      0.5
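The computations behind Table I can be checked with a short script. The sketch below is a minimal illustration, not the authors' code. Because the 20-letter sequence itself is not reproduced here, it assumes a hypothetical word order (AB GH CD EF AB GH AB EF CD GH) that was constructed only so that it yields the counts in Table I. It also assumes, as the numbers in the table imply, that the frequency of x counts only the occurrences of x that are followed by another letter. All helper names (`bigram_counts`, `chunk_strength`, and so on) are illustrative.

```python
from collections import Counter

# ASSUMPTION: hypothetical order of the words AB, CD, EF, GH, chosen only because it
# reproduces the counts of Table I; it is not the original stimulus sequence.
SEQUENCE = "AB GH CD EF AB GH AB EF CD GH".replace(" ", "")
WORDS = {"AB", "CD", "EF", "GH"}


def bigram_counts(seq):
    """Count each adjacent pair xy, and each x that is followed by another letter."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    firsts = Counter(seq[:-1])  # the final letter has no successor, so it is excluded
    return pairs, firsts


def transitional_probability(pairs, firsts, xy):
    """Column (b): P(y|x) = frequency of xy / frequency of x."""
    return pairs[xy] / firsts[xy[0]]


def chunk_strength(pairs, firsts, xy, decrement=0.5):
    """Column (c): +1 per occurrence of xy, minus `decrement` per other pair starting with x."""
    return pairs[xy] - decrement * (firsts[xy[0]] - pairs[xy])


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def words_beat_part_words(pairs, firsts, decrement):
    """True if every word is strictly stronger than every observed part-word."""
    weakest_word = min(
        chunk_strength(pairs, firsts, u, decrement) for u in pairs if u in WORDS)
    strongest_part = max(
        chunk_strength(pairs, firsts, u, decrement) for u in pairs if u not in WORDS)
    return weakest_word > strongest_part


pairs, firsts = bigram_counts(SEQUENCE)
units = sorted(pairs, key=lambda u: (u not in WORDS, u))  # words first, as in Table I
tps = [transitional_probability(pairs, firsts, u) for u in units]
strengths = [chunk_strength(pairs, firsts, u) for u in units]

for unit, tp, strength in zip(units, tps, strengths):
    print(f"{unit}  xy={pairs[unit]}  x={firsts[unit[0]]}  TP={tp:.2f}  strength={strength:g}")
print(f"Pearson r between (b) and (c): {pearson(tps, strengths):.2f}")

# A decrement of 0 turns chunk strength back into raw frequency (column a).
for d in (0.0, 0.1, 0.5, 1.0):
    print(d, words_beat_part_words(pairs, firsts, d))
```

With the Box 3 decrement of 0.5, the script reproduces columns (a) to (c) and a Pearson r of 0.95. Setting the decrement to 0 makes BG tie with CD and EF, exactly as with raw frequency; in this particular sequence, any positive decrement is enough for every word to outrank every part-word.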
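The text reports that PARSER was even sensitive to contingency [44], a measure of association more comprehensive than conditional probabilities. The sketch below contrasts TP with ΔP = P(y|x) − P(y|not x), one standard contingency index; the text does not specify which index [44] used, so the choice of ΔP is an assumption. It reuses the hypothetical sequence built to match Table I (again not the original stimulus), and `tp_and_delta_p` is an illustrative name.

```python
# ASSUMPTION: hypothetical sequence reproducing the Table I counts, not the original stimulus.
SEQUENCE = "ABGHCDEFABGHABEFCDGH"


def tp_and_delta_p(seq, x, y):
    """Return (TP, delta P) for 'y follows x', computed over adjacent letter pairs.

    TP      = P(y | x)
    delta P = P(y | x) - P(y | not x)   (one standard contingency index)
    """
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    after_x = [p[1] for p in pairs if p[0] == x]
    after_other = [p[1] for p in pairs if p[0] != x]
    if not after_x or not after_other:
        raise ValueError("x must start at least one pair, and so must some other letter")
    p_given_x = after_x.count(y) / len(after_x)
    p_given_not_x = after_other.count(y) / len(after_other)
    return p_given_x, p_given_x - p_given_not_x


for unit in ("AB", "BG", "BE", "DG"):
    tp, dp = tp_and_delta_p(SEQUENCE, unit[0], unit[1])
    print(f"{unit}: TP = {tp:.2f}, delta P = {dp:.2f}")
```

Unlike TP, ΔP also takes into account how often y occurs after letters other than x. For the part-word BG, the occurrence of G after D lowers ΔP below the TP value, whereas each word keeps ΔP = 1 because its second letter never follows any other letter in this sequence.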
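The argument that statistical computations are, in principle, indifferent to contiguity can be illustrated by computing TPs at a distance. The sketch assumes a hypothetical stream of three-element units in which a first element (a or c) predicts a last element (b or d) across a variable middle filler (x, y or z). The frames, the fillers and the names `build_stream` and `lagged_tp` are invented for illustration and do not reproduce any study cited here.

```python
# ASSUMPTION: hypothetical a_X_b material invented for illustration.
FRAMES = [("a", "b"), ("c", "d")]  # first element predicts last element
FILLERS = ["x", "y", "z"]          # variable middle element


def build_stream(repetitions=2):
    """Concatenate every frame/filler combination into one continuous stream."""
    stream = []
    for _ in range(repetitions):
        for filler in FILLERS:
            for first, last in FRAMES:
                stream += [first, filler, last]
    return stream


def lagged_tp(stream, x, y, lag):
    """P(element at t + lag is y | element at t is x); lag=1 is the usual adjacent TP."""
    starts = [t for t in range(len(stream) - lag) if stream[t] == x]
    if not starts:
        raise ValueError(f"{x!r} never occurs with a successor at lag {lag}")
    return sum(stream[t + lag] == y for t in starts) / len(starts)


stream = build_stream()
print("P(x | a) at lag 1:", round(lagged_tp(stream, "a", "x", 1), 2))
print("P(b | x) at lag 1:", lagged_tp(stream, "x", "b", 1))
print("P(b | a) at lag 2:", lagged_tp(stream, "a", "b", 2))
print("P(d | a) at lag 2:", lagged_tp(stream, "a", "d", 2))
```

At lag 1, a predicts each filler with a probability of only 1/3, and a filler predicts the final element at chance (0.5), so a learner restricted to adjacent pairs finds no reliable dependency. At lag 2, P(b | a) = 1. The cost is that a learner willing to consider every lag must maintain a separate set of pair statistics for each lag, which is why adjacency is such a convenient constraint.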
10 Conway, C.M. and Christiansen, M.H. Statistical learning within and between modalities: Pitting abstract against stimulus-specific representations. Psychol. Sci. (in press)
11 Saffran, J.R. et al. (1999) Statistical learning of tone sequences by human infants and adults. Cognition 70, 27–52
12 Saffran, J.R. et al. (2005) Changing the tune: Absolute and relative pitch processing by adults and infants. Dev. Sci. 8, 1–7
13 Conway, C.M. and Christiansen, M.H. (2005) Modality-constrained statistical learning of tactile, visual, and auditory sequences. J. Exp. Psychol. Learn. Mem. Cogn. 31, 24–39
14 Saffran, J.R. (2001) The use of predictive dependencies in language learning. J. Mem. Lang. 44, 493–515
15 Saffran, J.R. and Wilson, D.P. (2003) From syllables to syntax: Multilevel statistical learning by 12-month-old infants. Infancy 4, 273–284
16 Hunt, R.H. and Aslin, R.N. (2001) Statistical learning in a serial reaction time task: Access to separable statistical cues by individual learners. J. Exp. Psychol. Gen. 130, 658–680
17 Kirkham, N.Z. et al. (2002) Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition 83, B35–B42
18 Shanks, D.R. et al. (2005) Attentional load and implicit sequence learning. Psychol. Res. 69, 369–382
19 Remillard, G. (2003) Pure perceptual-based sequence learning. J. Exp. Psychol. Learn. Mem. Cogn. 29, 581–597
20 Jiang, Y. and Chun, M.M. (2001) Selective attention modulates implicit learning. Q. J. Exp. Psychol. 54A, 1105–1124
21 Jiang, Y. and Leung, A-W. (2005) Implicit learning of ignored visual context. Psychon. Bull. Rev. 12, 100–106
22 Hoffmann, J. and Sebald, A. (2005) When obvious covariations are not even learned implicitly. Eur. J. Cogn. Psychol. 17, 449–480
23 Shanks, D.R. (2003) Attention and awareness in ‘implicit’ sequence learning. In Attention and Implicit Learning (Jiménez, L., ed.), pp. 11–42, John Benjamins
24 Hsiao, A.T. and Reber, A.S. (1998) The role of attention on implicit sequence learning. In Handbook of Implicit Learning (Stadler, M.A. and Frensch, P., eds), pp. 471–494, Sage Publications
25 Toro, J.M. et al. (2005) Speech segmentation by statistical learning depends on attention. Cognition 97, B25–B34
26 Baker, C.I. et al. (2004) Role of attention and perceptual grouping in visual statistical learning. Psychol. Sci. 15, 460–466
27 Pothos, E.M. and Bailey, T.M. (2000) The role of similarity in artificial grammar learning. J. Exp. Psychol. Learn. Mem. Cogn. 26, 847–862
28 Buchner, A. et al. (1998) On the role of fragmentary knowledge in a sequence learning task. Q. J. Exp. Psychol. 51A, 251–281
29 Christiansen, M.H. et al. (1998) Learning to segment speech using multiple cues: A connectionist model. Lang. Cogn. Processes 13, 221–268
30 Perruchet, P. and Vinter, A. (1998) PARSER: A model for word segmentation. J. Mem. Lang. 39, 246–263
31 Cleeremans, A. and McClelland, J.L. (1991) Learning the structure of event sequences. J. Exp. Psychol. Gen. 120, 235–253
32 Cleeremans, A. (1993) Mechanisms of Implicit Learning: A Connectionist Model of Sequence Processing, MIT Press
33 Kinder, A. and Shanks, D.R. (2003) Neuropsychological dissociations between priming and recognition: a single-system connectionist account. Psychol. Rev. 110, 728–744
34 Tillmann, B. et al. (2000) Implicit learning of tonality: A self-organizing approach. Psychol. Rev. 107, 885–913
35 Saffran, J.R. (2001) Words in a sea of sounds: The output of statistical learning. Cognition 81, 149–169
36 Meulemans, T. and Van der Linden, M. (2003) Implicit learning of complex information in amnesia. Brain Cogn. 52, 250–257
37 Jiménez, L. (2005) Chunk structure in implicit and explicit sequence learning. 2nd European Workshop on Movement Science (Vienna), Abstract No. 2.1.6
38 Anderson, J.R. and Lebiere, C. (1998) The Atomic Components of Thought, Erlbaum
39 Shanks, D.R. et al. Disruption of sequential priming in organic and pharmacological amnesia: A role for the medial temporal lobes in implicit contextual learning. Neuropsychopharmacology (in press)
40 Shanks, D.R. et al. (2002) Modularity and artificial grammar learning. In Implicit Learning and Consciousness (French, R. and Cleeremans, A., eds), pp. 93–120, Psychology Press
41 Servan-Schreiber, D. and Anderson, J.R. (1990) Learning artificial grammars with competitive chunking. J. Exp. Psychol. Learn. Mem. Cogn. 16, 592–608
42 Aslin, R.N. et al. (1998) Computation of conditional probability statistics by 8-month-old infants. Psychol. Sci. 9, 321–324
43 Chang, G.Y. and Knowlton, B.J. (2004) Visual feature learning in artificial grammar classification. J. Exp. Psychol. Learn. Mem. Cogn. 30, 714–722
44 Perruchet, P. and Peereman, R. (2004) The exploitation of distributional information in syllable processing. J. Neuroling. 17, 97–119
45 Boucher, L. and Dienes, Z. (2003) Two ways of learning associations. Cogn. Sci. 27, 807–842
46 Giroux, I. and Rey, A. (2005) Word and sub-word units in speech perception. Proceedings of the 46th Annual Meeting of the Psychonomic Society (Toronto), Abstract No. 3061
47 Kuhn, G. and Dienes, Z. Implicit learning of non-local musical rules. J. Exp. Psychol. Learn. Mem. Cogn. (in press)
48 Perruchet, P. (2005) Statistical approaches to language acquisition and the self-organizing consciousness: A reversal of perspective. Psychol. Res. 69, 316–329
49 Perruchet, P. and Vinter, A. (2002) The self-organizing consciousness. Behav. Brain Sci. 25, 297–388
50 Marcus, G.F. et al. (1999) Rule learning by seven-month-old infants. Science 283, 77–80
51 Dienes, Z. and Altmann, G. (1997) Transfer of implicit knowledge across domains: How implicit and how abstract? In How Implicit is Implicit Learning? (Berry, D., ed.), pp. 107–123, Oxford University Press
52 Pacton, S. et al. (2001) Implicit learning out of the lab: The case of orthographic regularities. J. Exp. Psychol. Gen. 130, 401–426
53 Pacton, S. et al. (2005) Children’s implicit learning of graphotactic and morphological regularities. Child Dev. 76, 324–339
54 Vokey, J.R. and Higham, P.A. (2005) Abstract analogies and positive transfer in artificial grammar learning. Can. J. Exp. Psychol. 59, 54–61
55 Gomez, R.L. (1997) Transfer and complexity in artificial grammar learning. Cogn. Psychol. 33, 154–207
56 Redington, M. and Chater, N. (2002) Knowledge representation and transfer in artificial grammar learning. In Implicit Learning and Consciousness (French, R. and Cleeremans, A., eds), pp. 121–143, Psychology Press
57 Gomez, R.L. et al. (2000) The basis of transfer in artificial grammar learning. Mem. Cogn. 28, 253–263
58 Tunney, R.J. and Altmann, G.T.M. (2001) Two modes of transfer in artificial grammar learning. J. Exp. Psychol. Learn. Mem. Cogn. 27, 614–639
59 Perruchet, P. et al. (2002) The formation of structurally relevant units in artificial grammar learning. Q. J. Exp. Psychol. 55A, 485–503
60 Gomez, R.L. (2002) Variability and detection of invariant structure. Psychol. Sci. 13, 431–436
61 Newport, E.L. and Aslin, R.N. (2004) Learning at a distance: I. Statistical learning of non-adjacent dependencies. Cogn. Psychol. 48, 127–162
62 Onnis, L. et al. (2005) Phonology impacts segmentation in online speech processing. J. Mem. Lang. 53, 225–237
63 Perruchet, P. et al. (2004) Learning non-adjacent dependencies: No need for algebraic-like computations. J. Exp. Psychol. Gen. 133, 573–583
64 Creel, S.C. et al. (2004) Distant melodies: Statistical learning of nonadjacent dependencies in tone sequences. J. Exp. Psychol. Learn. Mem. Cogn. 30, 1119–1130
65 Dienes, Z. and Longuet-Higgins, C. (2004) Can musical transformations be implicitly learned? Cogn. Sci. 28, 531–558
66 Gomez, R.L. and Maye, J. (2005) The developmental trajectory of nonadjacent dependency learning. Infancy 7, 183–206
67 Poletiek, F.H. (2002) Implicit learning of a recursive rule in an artificial grammar. Acta Psychol. (Amst.) 111, 323–335
68 Perruchet, P. and Rey, A. (2005) Does the mastery of center-embedded linguistic structures distinguish humans from nonhuman primates? Psychon. Bull. Rev. 12, 307–313