Review TRENDS in Cognitive Sciences Vol.10 No.5 May 2006

Implicit learning and statistical learning: one phenomenon, two approaches
Pierre Perruchet and Sebastien Pacton
Université de Bourgogne, LEAD/CNRS, Pôle AAFE, Esplanade Erasme, 21000 Dijon, France

Corresponding author: Perruchet, P. ([email protected]).
www.sciencedirect.com 1364-6613/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2006.03.006

The domain-general learning mechanisms elicited in incidental learning situations are of potential interest in many research fields, including language acquisition, object knowledge formation and motor learning. They have been the focus of studies on implicit learning for nearly 40 years. Stemming from a different research tradition, studies on statistical learning, carried out in the past 10 years following the seminal studies by Saffran and collaborators, appear to be closely related, and the similarity between the two approaches is strengthened further by their recent evolution. However, implicit learning and statistical learning research favor different interpretations, focusing on the formation of chunks and statistical computations, respectively. We examine these differing approaches and suggest that this divergence opens up a major theoretical challenge for future studies.

Introduction
There is no doubt that many of our most fundamental abilities, whether they concern language, perception, motor skill, or social behavior, reflect some kind of adaptation to the regularities of the world that evolves without intention to learn, and without a clear awareness of what we know. This ubiquitous phenomenon was called 'implicit learning' (IL) by Reber [1,2] 40 years ago. Since then, several studies have explored this form of learning using a range of experimental paradigms (mainly finite-state grammars and serial reaction time tasks; for reviews, see [3,4]).

Originating from a different research tradition, the term 'statistical learning' (SL) was proposed 10 years ago by Saffran and collaborators [5] to designate the ability of infants to discover the words embedded in a continuous artificial language, and this field of research is now growing exponentially. There are obvious similarities between SL and IL. As in IL, participants in SL experiments are faced with structured material without being instructed to learn. They learn merely from exposure to positive instances, without engaging in analytical processes or hypothesis-testing strategies. Researchers have pointed out that SL proceeds automatically [5–8], incidentally [9], spontaneously [6], or by simple observation [9], and that participants in SL settings were unaware of the statistical structure of the material [7].

This article first describes how the recent evolution of the IL and SL research fields has brought them closer to one another, leading to a growing number of cross-references and to the occasional use of the two expressions as synonyms. Conway and Christiansen [10] even now propose the term 'implicit statistical learning' to cover the two domains. However, we then go on to show that beyond the similarity of paradigms and results, the two domains emphasize different interpretations of the data. We suggest that this divergence, which has not been highlighted as yet, poses a deep challenge for future studies.

The recent evolution of IL and SL studies
Ten years ago, it seemed possible to contrast IL and SL on their main issues of interest, namely syntax acquisition and lexicon formation, respectively. Indeed, the to-be-learned material used in artificial grammar learning research is typically governed by rules, that is, by organizing principles which are independent of the specific material used in a given instance. If participants learned the rules, then this form of learning would be out of the scope of SL studies, in which the notion of rules is a priori irrelevant. However, research from the past few years has made it increasingly clear that participants in artificial grammar learning experiments do not need to extract the rules to perform well, even in situations involving transfer across surface forms (Box 1). In addition, the artificial grammar learning paradigms now tend to be supplanted by other paradigms, such as the serial reaction-time tasks, in which a description of the materials in terms of rules appears less appropriate.

Another initial difference between the two domains was that IL research used a large variety of situations involving different sensory modalities and response systems, whereas SL originally focused on the early stage of language acquisition. More recently, however, research on SL has progressively broadened its scope of investigation. The syllables used in the first studies have been replaced by tones with the same results [11,12]. A parallel literature has evolved with visual shapes [6–8], or even tactile stimuli [13]. Perhaps even more importantly,

recent SL studies are no longer limited to the segmentation of a continuous display into word-like units, but they also explore other, more complex structures [14]. For instance, Saffran and Wilson [15] have used a finite-state grammar to generate their artificial language, and Hunt and Aslin [16] used a serial reaction time task, hence borrowing the prototypical situations of IL to investigate the properties of SL.

A recent set of results on the role of attention further strengthens the similarity between IL and SL. Although a few earlier IL studies claimed that at least some forms of learning do not require attention, the bulk of recent evidence supports the opposite conclusion. For instance, Shanks and collaborators (e.g. [18]) showed that performance in serial reaction time tasks is degraded under dual-task conditions (see also [19]). Likewise, Chun and Jiang (e.g. [20]) showed that implicit learning in the contextual cuing paradigm is robust only when relevant, predictive information is selectively attended to (see also [21]). In covariation learning, Hoffman and Sebald [22] showed that no learning occurs without attention, even when the to-be-learned covariations are highly salient (for reviews on earlier studies, see [23,24]). The same conclusion emerges from studies in SL. When the performance of participants in a dual-task setting is compared with that of participants attending to the to-be-learned materials, the former is consistently degraded relative to the latter. This has been observed in standard word segmentation tasks [25] as well as in paradigms using visual shapes [8,26].

Thus, arguably, IL and SL studies now pursue essentially the same objective, namely the study of domain-general learning mechanisms acting on attended information in incidental, unsupervised learning situations (e.g. [17]).

Box 1. Fading out of the rule vs. no-rule

Transfer tasks, in which the form of the material presented during the training phase is changed, have been used in studies involving the learning of artificial materials by infants (e.g. [50]) and adults (e.g. [51]), and the learning of world-sized regularities by children and adults [52,53]. For instance, in the study by Marcus [50], infants previously exposed to exemplars of an ABB grammar (e.g. ga-ti-ti) subsequently listen more to the sentences generated by an ABA grammar (e.g. wo-fe-wo) than to sentences generated by the ABB grammar (e.g. wo-fe-fe), although new syllables were involved in both types of sentences.

Although the empirical evidence for the phenomenon is undisputed, the prevailing idea that positive transfer supports a rule-based interpretation has now been challenged. Some studies suggest that stimuli are not processed at an abstract level during incidental training [10]. The above-chance performance of participants could be due to analogical processes triggered by the transfer items during the test [54], and could be mediated by the participants' explicit knowledge of the structure [55].

It is also possible to acknowledge some form of abstract coding without surmising rule-knowledge. Indeed, as argued by Redington and Chater [56], 'surface independence and rule-based knowledge are orthogonal concepts'. It is worth stressing that the evidence of transfer in implicit training conditions, even in adults, is limited to simple and salient features of the stimuli, and especially to the structure of repetitions (e.g. [57,58]). In these conditions, transfer might be based on the direct coding of abstract relations at the perceptual level [12,47].

A new question: chunk formation versus statistical computation
Although the similarities between IL and SL are impressive, comparing the interpretations favored in both fields leads us to a thought-provoking observation. In the IL literature, several models have been developed as alternatives to the initial rule-based view. The first alternative idea in artificial grammar learning research was that participants memorized the displayed strings of letters, then performed their grammaticality judgments on the basis of the similarity between the test items and the study items. The role of similarity in grammaticality judgments has been shown in some studies (e.g. [27]). However, there is also significant evidence that participants memorize fragments of strings, and that grammaticality judgments rely, at least partly, on this form of knowledge. It has been argued that the fragments or chunks provide a highly efficient coding of the information, because learning makes their selection increasingly adapted to the structure of the material (Box 2). This kind of interpretation has also been applied to other IL paradigms such as serial reaction-time tasks [28].

By contrast, the interpretation proposed in the SL approach postulates that participants perform statistical computations. Evidence for segmentation is generally attributed to the ability of participants to compute some kind of conditional probabilities between successive or contiguous elements. This interpretation prevails for auditory artificial languages (e.g. [14,15]) as well as for visual scenes (e.g. [7,9]). At the computational level, this interpretation is generally implemented by connectionist networks, most often simple recurrent networks (SRNs) (e.g. [29]).

Note that the contrast we draw here is not as clear-cut as our presentation suggests. There have been a few attempts to account for word segmentation with chunking models (e.g. [30]), although they have been virtually ignored in the SL literature. More significantly, performance in IL paradigms has often been simulated with SRNs (e.g. [31–33]). Because SRNs, like any connectionist network, are sensitive to statistical regularities, this means that certain IL researchers have construed implicit learning as statistical computations [3,32,34]. However, the coexistence of chunk-based theories and connectionist models within the IL literature has not drawn much attention, largely because their common opposition to rule-based models overshadowed their differences. The joint consideration of IL and SL studies now brings the contrast between the two accounts to the forefront.

Combining chunks and statistics: three possible scenarios
Nobody denies the existence of chunk knowledge. The advocates of statistical approaches themselves claim that learning shapes some kind of psychological units. For instance, Saffran and collaborators [15,35] have shown that training with unsegmented speech results in the formation of word-like units, rather than in strings of sounds the probability of which varies on a continuous dimension. Likewise, Baker and collaborators [26] and Fiser and Aslin [9] emphasize that the end result of SL

with visual displays is the formation of objects. This leaves three possibilities to account for the available evidence:

The first possibility is that statistical computations and chunk formation are independent processes. Meulemans and Van Der Linden [36] have argued for this view, with the additional assumption that chunk formation is responsible for conscious knowledge, and statistical computation for improved performance in implicit tasks (for related hypotheses, see [37,38]). This hypothesis is grounded on the dissociation between performance and explicit knowledge observed in amnesic patients. However, alternative interpretations have been proposed for this dissociation, notably by Shanks and collaborators [33,39,40]. Shanks and collaborators assume a single knowledge basis, and hence, in addition to the advantage of better parsimony, their interpretation provides a natural account for the ubiquitous relationships observed in normal participants between conscious knowledge and performance.

The second possibility is that statistical computations and chunk formation are two successive steps in the learning process. Chunks would be inferred from prior statistical computations. Typically, chunk boundaries are defined as the points where the predictability of successive or spatially contiguous elements is the lowest. This interpretation is largely prevalent in SL research, both for oral stimuli (e.g. [35]) and for visual scenes (e.g. [9]).

The third possibility is that the formation of chunks is the only effective process, with the sensitivity to statistical structure being a by-product of this process. At least two computationally implemented models illustrate this option, the Competitive Chunking model [41] and PARSER [30]. In PARSER, for instance, the chunks are formed from the outset on a random basis, as a natural consequence of the capacity-limited attentional processing of the incoming information. These chunks are then forgotten or strengthened according to the laws governing associative memory.

We will now focus on the last two possibilities, the first in which chunking is based on prior statistical analyses, and the second in which chunking is a primitive process the result of which amounts to simulating statistical computations.

Box 2. Artificial grammar meets word segmentation

Figure I shows a finite-state grammar that has been widely used in IL studies. After exposure to strings of letters generated by this grammar (e.g. a or b below), participants are able to discriminate new grammatical and ungrammatical strings of letters.

It is now generally accepted that artificial grammar learning relies, at least partly, on the formation of small chunks. If those chunks were selected randomly, as illustrated in line (a) below, memorizing chunks would be difficult and inefficient: there would be many chunks, infrequently repeated across strings, and their recombination would have a low probability of generating a new grammatical string. Instead, learning consists in forming chunks that recur as often as possible in different strings, and whose recombination has a high chance of generating a new grammatical string [59]. Line (b) below shows how the strings of letters displayed in line (a) can be segmented into more relevant units. For instance, the chunks now implicitly encode the recursive loop RFV, shown in red in the grammar diagram.

Note that the issue of artificial grammar learning framed in this way appears very close to that investigated in word segmentation studies. In both cases, learning consists of finding the most relevant units to encode information.

Figure I. A typical finite-state grammar, with a recursive loop of letters, RFV, highlighted.

Does efficient chunking need prior statistical computations?
We are not aware of empirical arguments from proponents of chunk-based theories against the models assuming statistical computations, except that this assumption could be unnecessary. By contrast, SL researchers have occasionally argued that chunk models are only sensitive to the raw frequency of co-occurrences [14], whereas studies in SL have shown that participants were sensitive to more subtle statistics, such as conditional (or transitional) probabilities (e.g. [14,42]). Indeed, most chunk-based learning models, such as the competitive chunking model [41] or the measures of chunk strength used in the influential studies by Knowlton and collaborators (e.g. [43]), exclusively exploit frequency information, certainly because this initially appeared to be sufficient to account for a large part of the available evidence.

However, the exclusive focus on frequency of many chunk-based models is somewhat surprising in itself. Although chunk-based models are claimed to implement associative learning principles, assuming that memory for chunks depends only on their frequency amounts to neglecting some of the most basic laws governing the formation of associations. Indeed, it has long been known that the strength of memory traces does not depend only on the number of repetitions of the study pairs. In particular, forgetting is due in large part to the interference generated by the prior or subsequent events that are related in some way to the target event. The sequential material used in both IL and SL studies is certainly prone to generate strong and pervasive interference, because it is typically generated by recombining a small number of primitives. Now, and this is the crucial point, taking into account the effect of interference in evaluating chunk

strength amounts to considering other measures of association than the raw frequency of co-occurrences. Box 3 illustrates how implementing forward interference is sufficient to make chunk strength sensitive to transitional probabilities, which SL researchers consider so important. Moreover, Perruchet and Peereman [44] have shown that PARSER, thanks to the role ascribed to interference in chunk formation, was even sensitive to contingency, that is, to a measure of association more comprehensive than conditional probabilities.

The above remarks suggest that it might turn out to be difficult to decide between competing interpretations based on a simple consideration of their explanatory power. Because IL and SL have mainly evolved as separate fields of research, the challenge has not often been addressed. A few recent studies, however, have begun to explore situations designed in such a way that predictions drawn from chunk-based models and statistical approaches differ. In these studies, some version of an SRN is used to quantify the predictions of statistical approaches, whereas the chunking models are represented by the Competitive Chunking Model [45] or PARSER [37,44,46]. Although a detailed review of these studies is beyond the scope of this article, suffice it to say that, overall, their results do not clearly favor one or the other account.

These preliminary results suggest that the present accounts will need to be amended. New models should also be able to encompass data that neither the chunk-based models nor the statistical approaches in their current instantiations seem to be able to explain. In the past few years, several studies have shown the possibility of incidentally learning the relations between elements that are not contiguous in space and/or time (Box 4). Because the chunking process is usually construed as the clustering of adjacent events, these data confront the chunking models with a difficult challenge, as noted by Kuhn and Dienes [47]. In principle, they do not raise the same problem for statistical approaches, because the notion of statistical computations places no restriction on the nature of the data (e.g. contiguous or not) on which statistics may be computed. However, there is a consensus among researchers working on language and visual perception that models relying on statistical computations alone need to be constrained to avoid combinatorial explosion. The adjacency of the to-be-learned elements provides such a natural constraint (which is implemented, for instance, in SRNs). Thus, the possibility of learning non-adjacent dependencies entails either an in-depth revision of chunk-based models, or a significant departure from the most frequent computational implementation of statistical approaches.

Implications for the issue of consciousness
One of the major implications of the debate outlined above concerns the function of consciousness in the learning process. If the chunks are inferred from the results of statistical computations, then most of the learning process must be thought of as unconscious, because statistical computations are not performed consciously in the context of incidental learning paradigms. Of course, this does not mean that chunks, once formed, are functionally inert in further steps of conscious activities, but simply that their initial emergence is guided by unconscious computations. On the other hand, if the final chunks evolve from the progressive modification of primitive chunks, then the function of consciousness in chunk formation can be

Box 3. Statistical computations and chunk-based models: how do they converge towards the same predictions?

Above is a 20-letter sequence made up from 8 different letters. Let us assume that they stand for syllables (although they could equally stand for tones of different pitches, the consonant letters typically used in artificial grammar learning studies, the locations of a target on a screen typically involved in serial reaction-time studies, or any other events). The sequence can be viewed as the random succession of four bisyllabic words (they have been colored for ease of reading). How can the words be discovered?

One solution consists of considering the frequency of all the bisyllabic units. However, column (a) of Table I shows that, because AB and GH are more frequent than the other words, the 'part-word' BG turns out to be as frequent as the 'words' CD and EF. Aslin and collaborators [42,6,7,16] used a similar design to show that participants do not exploit co-occurrence frequencies, but rather the Transitional Probabilities (TP: Prob(y|x) = frequency of xy / frequency of x). Indeed, as indicated in column (b), considering TPs solves the problem (all word-internal TPs are stronger than TPs straddling word boundaries), hence the prevalent claim in the SL literature that participants compute TP.

However, as shown in column (c), the same result can emerge if one considers instead that participants memorize chunks, as in IL studies. If memory for chunks was dependent only on their frequency, values in (c) would be identical to values in (a). However, memory consolidation and forgetting also depend on interference. Classical studies on interference show that the memory for AB is impaired by the presentation of AC or AD. For the sake of illustration, we have assumed here that each occurrence of AB strengthens it by 1 unit, and each occurrence of another letter pair beginning with A decreases the AB strength by 0.5 unit. These parameters were selected arbitrarily, but the crucial outcome, namely that all the words have a stronger strength than any part-word, remains true whatever the parameters (the Pearson r between (b) and (c) is 0.95).

Table I. Analysis of the letter sequence shown above

Units   (a) Frequency     (b) TP   (c) Chunk strength
        xy      x         xy/x     xy − ((x − xy) × 0.5)
AB      3       3         1        3
CD      2       2         1        2
EF      2       2         1        2
GH      3       3         1        3
BE      1       3         0.33     0
BG      2       3         0.67     1.5
DE      1       2         0.5      0.5
DG      1       2         0.5      0.5
FA      1       2         0.5      0.5
FC      1       2         0.5      0.5
HA      1       2         0.5      0.5
HC      1       2         0.5      0.5


Box 4. Learning non-adjacent dependencies

Both IL and SL approaches have focused on the human ability to detect and exploit the relations between elements in close temporal or spatial proximity. However, linguistic structures, as well as other domains of high-level knowledge such as music, also include remote dependencies. That is to say, a relation exists between two events, A and C, irrespective of the intervening events (X), as illustrated in (a) below. In the past few years, this has given rise to a set of studies investigating the possibility of learning non-adjacent dependencies in artificial languages [60–63] and in music [64,65] in incidental conditions.

The results show a consensus that learning non-adjacent dependencies is possible, but under far more restrictive conditions than those required for learning the relations between contiguous events. Gomez [60,66] showed that the degree to which the AXC relationships are learned depends on the variability of the middle element (X). For Newport and Aslin (e.g. [61]) and Onnis et al. [62], the crucial factor is the similarity between A and C. Another facilitating condition is that the AXC units are displayed as individualized, pre-segmented units, rather than embedded within a continuous stream of stimuli.

Closely related are studies in which the to-be-learned stimuli are generated by a bi-conditional grammar [40,47], as in (b), or in which the dependencies are self-embedded [67,68,4], as in (c). In these more complex cases, the possibility of obtaining evidence of learning in truly incidental conditions does not seem clearly established yet.

(a) Non-adjacent dependency: The relation is between A and C, whatever X
(b) Biconditional grammar: The relations are between A and X, B and Y, etc.
(c) Self-embedded dependencies: The relations are between A and X, B and Y, etc.

Box 5. Questions for future research

• Bringing together IL and SL domains of research suggests many directions for further studies, because each domain has explored a limited set of issues, and it appears necessary to confirm that the conclusions reached in one domain may be generalized to the experimental paradigms typical of the other one. The following two questions instantiate this research strategy.
• A large part of the recent IL literature is concerned with the relative invariance of implicit learning mechanisms with age, in children and in elderly people. Furthermore, several studies have investigated whether these mechanisms are preserved in people with mental retardation, and in patients with psychiatric and neurological disorders. Are results the same with the paradigms generally used in SL studies? Likewise, there have been some brain imaging studies of IL, and investigating whether the same neural regions would mediate SL would be of interest.
• Although both IL and SL studies involve arbitrary materials, it turns out that these materials can never be considered to be completely neutral with regard to learning. For instance, participants may be guided towards (or diverted from) the discovery of their underlying structure by prior knowledge of related real situations, the acoustic or visual features of the materials, and so on. In SL research, the interactions between the statistical structure and the other features of the situations that might affect learning have been treated as a valuable problem in its own right, especially in the context of lexicon formation in young children. Similar problems should be explored with the paradigms used in IL research.
• Are conditional probabilities the main statistic to which human behavior is sensitive, as the literature on SL assumes? Research on conditioning has suggested that conditioned performances depend on more elaborate measures of association, such as Delta P or contingency. IL/SL research has hardly scratched the surface of this question.
• Irrespective of the importance given to chunks in the dynamics of learning, it seems essential to investigate this notion further. For instance, can chunks be thought of as the content of the attentional focus, or is it possible to assume the existence of functional chunks that participants would not be aware of?
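
To make the contrast between conditional probabilities and Delta P concrete, here is a minimal sketch. It assumes the usual definition from the contingency-learning literature, Delta P = P(y|x) − P(y|not x), which the text does not spell out. The sequence is a hypothetical 20-letter string built from four two-letter words, and the function name `association` is illustrative only.

```python
# ASSUMPTION: hypothetical sequence built from the words AB, CD, EF and GH.
SEQUENCE = "ABGHABEFCDGHCDEFABGH"


def association(sequence, x, y):
    """Return (TP, Delta P) for the adjacent pair x -> y.

    TP      = P(y | x)      : share of pairs starting with x that end in y
    Delta P = TP - P(y | ~x): TP minus the share of pairs NOT starting
                              with x that nevertheless end in y
    """
    pairs = [(sequence[i], sequence[i + 1]) for i in range(len(sequence) - 1)]
    x_total = sum(1 for first, _ in pairs if first == x)
    other_total = len(pairs) - x_total
    if x_total == 0 or other_total == 0:
        raise ValueError("need pairs both with and without x as first element")
    xy = sum(1 for first, second in pairs if first == x and second == y)
    other_y = sum(1 for first, second in pairs if first != x and second == y)
    tp = xy / x_total
    delta_p = tp - other_y / other_total
    return tp, delta_p


for x, y in [("A", "B"), ("D", "G"), ("D", "E")]:
    tp, delta_p = association(SEQUENCE, x, y)
    print(f"{x}{y}: TP={tp:.2f}  DeltaP={delta_p:.2f}")
```

In this toy sequence the pairs DG and DE have the same conditional probability, but DG has the lower Delta P because G also follows other letters more often than E does. A learner sensitive to contingency rather than to conditional probability alone would therefore treat the two pairs differently.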

construed differently. Starting from the postulate that chunks are the actual content of phenomenal experience, Perruchet and Vinter [48,49] outlined a view of the human mind in which consciousness is thought of as self-organized. In this model, the optimal coding of the incoming information occurs as a natural by-product of the evolution of conscious percepts and representations, under the action of simple associative learning and memory processes.

Conclusion
Recent evolution of research on both IL, initially aimed at studying rule abstraction in complex situations, and SL, initially focused on word segmentation, suggests that the two lines of research explore the same domain-general incidental learning processes. Bringing together these two domains of research, however, reveals a divergence between the interpretation favored in IL, which focuses on the formation of chunks, and the interpretation favored in SL, which relies on statistical computations. One possibility is that chunks are inferred from the results of (unconscious) statistical computations. Another possibility is that (perhaps conscious) chunks are formed from the outset and then evolve as a result of basic associative learning principles. It is clear that there are many challenges for future research in these two areas (see Box 5).

Acknowledgements
This work was supported by grants from the Centre National de la Recherche Scientifique (CNRS, UMR 5022 and FRE 2987), from the Université de Bourgogne, from the Région de Bourgogne (AAFE), and from the Université Paris V. The authors thank Stephanie Chambaron, Suzanne Filipic, Bob French, Barbara Tillmann, and the anonymous reviewers of a first draft for their help at various stages of elaboration.

References
1 Reber, A.S. (1967) Implicit learning of artificial grammars. J. Verbal Learn. Verbal Behav. 6, 855–863
2 Reber, A.S. (1993) Implicit Learning and Tacit Knowledge: An Essay on the Cognitive Unconscious, Oxford University Press
3 Cleeremans, A. et al. (1998) Implicit learning: News from the front. Trends Cogn. Sci. 2, 406–416
4 Shanks, D.R. (2005) Implicit learning. In Handbook of Cognition (Lamberts, K. and Goldstone, R., eds), pp. 202–220, Sage Publications
5 Saffran, J.R. et al. (1996) Statistical learning by 8-month-old infants. Science 274, 1926–1928
6 Fiser, J. and Aslin, R.N. (2001) Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychol. Sci. 12, 499–504
7 Fiser, J. and Aslin, R.N. (2002) Statistical learning of higher-order temporal structure from visual shape sequences. J. Exp. Psychol. Learn. Mem. Cogn. 28, 458–467
8 Turk-Browne, N.B. et al. (2005) The automaticity of visual statistical learning. J. Exp. Psychol. Gen. 134, 552–564
9 Fiser, J. and Aslin, R.N. (2005) Encoding multielement scenes: Statistical learning of visual features hierarchies. J. Exp. Psychol. Gen. 134, 521–537

