1999
Contrary to a recent claim that neural network models are unable to account for data on infant habituation to artificial language sentences, the present simulations show successful coverage with cascade-correlation networks using analog encoding. The results demonstrate that a symbolic rule-based account is not required by the infant data. One of the fundamental issues of cognitive science continues to revolve around which type of theoretical model better accounts for human cognition -- a symbolic rule-based account or a sub-symbolic neural network account. A recent study of infant habituation to expressions in an artificial language claims to have struck a damaging blow to the neural network approach (Marcus, Vijayan, Rao, & Vishton, 1999). The results of their study show that 7-month-old infants attend longer to sentences with unfamiliar structures than to sentences with familiar structures. Because of certain features of their experimental design and their own unsuccessful neural n...
Infancy, 2001
A fundamental issue in cognitive science is whether human cognitive processing is better explained by symbolic rules or by subsymbolic neural networks. A recent study of infant familiarization to sentences in an artificial language seems to have produced data that can only be explained by symbolic rule learning and not by unstructured neural networks (Marcus, Vijayan, Bandi Rao, & Vishton, 1999). Here we present successful unstructured neural network simulations of the infant data, showing that these data do not uniquely support a rule-based account. In contrast to other simulations of these data, these simulations cover more aspects of the data with fewer assumptions about prior knowledge and training, using a more realistic coding scheme based on sonority of phonemes. The networks show exponential decreases in attention to a repeated sentence pattern, more recovery to novel sentences inconsistent with the familiar pattern than to novel sentences consistent with the familiar pattern, occasional familiarity preferences, more recovery to consistent novel sentences than to familiarized sentences, and extrapolative generalization outside the range of the training patterns. A variety of predictions suggest the utility of the model in guiding future research.
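The evaluation logic described in this abstract, in which a network's error on a test sentence stands in for infant looking time, can be illustrated with a toy sketch. The code below is not the authors' cascade-correlation model: it uses a small fixed-architecture autoencoder trained by gradient descent, and the sonority-style analog codes for each syllable are placeholder values rather than the authors' coding scheme. Whether such a toy network reproduces the full recovery pattern is an empirical question; the sketch only shows how reconstruction error can serve as an attention measure for consistent versus inconsistent novel sentences.

```python
# Toy illustration (not Shultz & Bale's cascade-correlation model): syllables
# are encoded as analog sonority-like values, an autoencoder is familiarized
# on ABA sentences, and reconstruction error is a proxy for infant attention.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical analog codes (consonant, vowel) for each syllable; placeholder values.
syllables = {
    "ga": [0.2, 0.9], "ti": [0.4, 0.7], "na": [0.6, 0.9], "la": [0.7, 0.9],
    "wo": [0.8, 0.8], "fe": [0.3, 0.7],  # novel syllables used only at test
}

def encode(sentence):
    """Concatenate the analog codes of a three-syllable sentence."""
    return np.concatenate([syllables[s] for s in sentence])

# Familiarization sentences follow an ABA pattern (first and last syllable identical).
familiar = [("ga", "ti", "ga"), ("na", "ti", "na"), ("ga", "la", "ga"), ("na", "la", "na")]
X = np.array([encode(s) for s in familiar])

# One-hidden-layer autoencoder trained by plain gradient descent on squared error.
n_in, n_hid = X.shape[1], 4
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)
lr = 0.1

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

for epoch in range(2000):
    h, y = forward(X)
    err = y - X                                 # reconstruction error on the batch
    gW2 = h.T @ err; gb2 = err.sum(0)           # output-layer gradients
    dh = (err @ W2.T) * (1 - h ** 2)            # backpropagated through tanh
    gW1 = X.T @ dh; gb1 = dh.sum(0)
    W1 -= lr * gW1 / len(X); b1 -= lr * gb1 / len(X)
    W2 -= lr * gW2 / len(X); b2 -= lr * gb2 / len(X)

def attention(sentence):
    """Reconstruction error for one sentence: higher error ~ longer looking."""
    x = encode(sentence)
    _, y = forward(x)
    return float(np.mean((y - x) ** 2))

# Novel syllables in a consistent (ABA) versus an inconsistent (ABB) test sentence.
print("consistent   wo-fe-wo:", attention(("wo", "fe", "wo")))
print("inconsistent wo-fe-fe:", attention(("wo", "fe", "fe")))
```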
Developmental Science, 2000
This paper reviews a recent article suggesting that infants use a system of algebraic rules to learn an artificial grammar (Marcus, Vijayan, Bandi Rao & Vishton, Rule learning by seven-month-old infants. Science, 283 (1999), 77-80). In three reported experiments, infants exhibited increased responding to auditory strings that violated the pattern of elements they were habituated to. We argue that a perceptual interpretation is more parsimonious, as well as more consistent with a broad array of habituation data, and we report successful neural network simulations that implement this lower-level interpretation. In the discussion, we consider how our model relates to other habituation research, and how it compares to other neural network models of habituation in general and to models of the Marcus et al. (1999) task specifically.
Proceedings of the Twenty-First Annual Conference of the Cognitive Science Society, 1999
Recent studies have shown that infants have access to what would seem to be highly useful language acquisition skills. On the one hand, they can segment a stream of unmarked syllables into words, based only on the statistical regularities present in it. On the other, they are able to abstract beyond these input-specific regularities and generalize to rules. It has been argued that these are two separate learning mechanisms, that the former is simply associationist whereas the latter requires variables. In this paper we present a neural network model, demonstrating that when a network is made out of the right stuff, specifically, when it has the ability to represent sameness and the ability to represent relations, a simple associationist learning mechanism suffices to perform both of these tasks.
Science, 1999
Early Word Learning, 2017
Computational models are a means to develop explanations for the mechanisms underlying human behavior and behavioral change. Specifically, one type of computational model, artificial neural networks, has been used widely in modeling children's language and cognitive development. These models can learn from their experience with an environment and are sensitive to the environment's statistical structure, making them ideally suited to investigating how statistical learning can account for aspects of word learning across all levels, from the earliest phoneme acquisition to the development of the bilingual lexicon. Here we first describe the general principles underlying artificial neural network models with the goal of making them accessible to readers without experience with computational modeling. We then review the most common model architectures that have been used in simulating children's word learning in the broad context established in the other chapters of this volume. Finally, we review a number of specific models of word learning and discuss their contributions to our understanding of the mechanisms underlying early word learning, and the factors that shape this process in infants and toddlers.
2019
Artificial neural network models (also known as Parallel Distributed Processing or Connectionist models) have been highly influential in cognitive science since the mid-1980s. The original inspiration for these systems comes from information processing in the brain, which emerges from a large number of (nearly) identical, simple processing units (neurons) that are interconnected into a network. Each unit receives activation from other units or by stimulation from the external world, and generates an output activation that is a function of the total input activation received. The unit then feeds the output activation onward to the units to which it is connected. Information processing is thus implemented in terms of activation flowing through this network. Each connection between two units has a weight that determines how strongly the first unit affects the second. These weights can be adapted, which constitutes learning, or "training" as it is commonly known in the neural network literature.

Algorithms for network training can be roughly divided into supervised and unsupervised methods. Supervised training is applied when a specific and known input-to-output mapping is required (e.g., learning to transform orthographic to phonological representations). To accomplish this, the network is provided with a representative set of "training examples" of inputs and the corresponding target outputs. It then processes each example, and the difference between the network's actual output and the target output leads to an update of the connection weights such that, next time, the output error will be smaller. By far the best known and most used method for supervised training is the Backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986), which makes the network's output activations for the training examples gradually converge toward the target outputs.

Unsupervised training, in contrast, makes the network adapt to (aspects of) the statistical structure of input examples without mapping to target outputs (e.g., discovery of regularities in the phonological structure of language). These networks are well suited to uncovering statistical structure present in the environment without requiring the modeller to be aware of what that structure is. One well-known example of an unsupervised training method is the learning rule proposed by Hebb (1949): strengthen the connection between two units that are simultaneously active, and weaken it if only one of them is active.

In spite of the superficial similarities between artificial and biological neural networks (i.e., interconnectivity and stimulation passing between neurons to determine their activation, and learning by adaptation of connection strengths), these cognitive models are not usually claimed to simulate processing at the level of biological neurons. Rather, neural network models form a description at Marr's (1982) algorithmic level; that is, they specify cognitive representations and operations while ignoring the biological implementation.

Neural networks underwent a surge of popularity in the 1990s, but from the early 21st century they were somewhat overshadowed by symbolic probabilistic models. However, neural networks have enjoyed a recent revival, partly due to the success of "deep learning" models, which display state-of-the-art performance on a wide range of artificial intelligence tasks (LeCun, Bengio, & Hinton, 2015). For the most part, the field of cognitive modelling has yet to catch up with these novel developments.
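The two training regimes described in this excerpt can be made concrete with a short sketch. The code below is illustrative and not taken from the chapter: it shows a supervised error-correcting update (the delta rule, of which backpropagation is the multi-layer generalization) and an unsupervised Hebbian update, both for a single layer of logistic units; the layer sizes, learning rates, and example inputs are arbitrary assumptions.

```python
# Minimal sketch (illustrative, not from the chapter) of supervised versus
# unsupervised weight updates for one layer of units whose output is a
# weighted sum of inputs passed through a logistic activation.
import numpy as np

rng = np.random.default_rng(1)

def activation(net):
    return 1.0 / (1.0 + np.exp(-net))   # output as a function of total input

# Supervised learning: adjust weights to reduce the error between actual and
# target outputs (the delta rule; backpropagation extends this idea to
# multi-layer networks).
def supervised_step(W, x, target, lr=0.5):
    y = activation(W @ x)
    error = target - y
    W += lr * np.outer(error * y * (1 - y), x)   # gradient of squared error
    return W

# Unsupervised Hebbian learning: strengthen connections between units that
# are active together (with simple weight decay to keep weights bounded).
def hebbian_step(W, x, lr=0.1, decay=0.01):
    y = activation(W @ x)
    W += lr * np.outer(y, x) - decay * W
    return W

# Usage: a 3-input, 2-output layer trained on one example of each kind.
W = rng.normal(0, 0.1, (2, 3))
W = supervised_step(W, x=np.array([1.0, 0.0, 1.0]), target=np.array([1.0, 0.0]))
W = hebbian_step(W, x=np.array([1.0, 0.0, 1.0]))
```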
Consequently, the currently most influential connectionist cognitive models are of the more traditional variety. We return to this issue in the Conclusion.

1.1. Feedforward and recurrent networks

Connectionist models are not amorphous networks in which everything is connected to everything else. Rather, a particular structure is imposed, for example by grouping units into a number of layers and allowing activation to flow only from each layer to the next. The first layer receives inputs from the environment, the final layer produces the corresponding output, and any intermediate layer is known as "hidden". Although this so-called "feedforward" architecture can (at least in theory) approximate any computable input-to-output function, it is unable to handle input that comes in over time. This is because the network has no working memory: each input is immediately overwritten by the next. Hence, the feedforward network is not the most appropriate model for simulating language processing, which is a fundamentally temporal phenomenon.

Elman (1990), in his seminal paper "Discovering structure in time", proposed a solution: include a set of recurrent connections with trainable weights that link each unit of the single hidden layer to all hidden-layer units. Consequently, the hidden layer receives both the current environmental input and its own previous activation state which, in turn, depends on the state before that, and so on. In this manner, the model is equipped with a working memory and can therefore encode sequential information, or "structure in time", making it well suited to processing language as it unfolds over time. This particular architecture became known as the Simple Recurrent Network (SRN) but forms part of a larger class of Recurrent Neural Networks (RNNs) that have connections through which (part of) the network's current activation feeds back to the network itself (a minimal sketch of one SRN time step appears at the end of this excerpt).

1.2. Neural network models and linguistic theory

Connectionist models of language acquisition and processing offer a view of the human language system that is very different from traditional, symbolic models in cognition. For one, neural networks do not distinguish competence (i.e., language knowledge) from performance (i.e., language behaviour). Instead, knowledge becomes instantiated in network connection weights in order for the network to display particular performance. In a sense, it forms procedural rather than declarative knowledge: it is know-how, not know-that. Hence, there is no way for the network to assess its own knowledge. As Clark and Karmiloff-Smith (1993, p. 495) put it: "it is knowledge in the system, but it is not yet knowledge to the system."

Second, language researchers from the nativist tradition have famously argued that infants must possess innate, language-specific knowledge or learning mechanisms, because otherwise language acquisition in the absence of negative evidence would be impossible (e.g., Chomsky, 1965; Gold, 1967; Pinker, 1989; among many others). In contrast, empiricists claim that language acquisition requires only domain-general mechanisms. Connectionism falls squarely into the empiricist camp because the representations and learning mechanisms built into neural networks are not specific to language and the networks receive no negative evidence during training. Hence, successful neural network learning of (relevant aspects of) syntax would undermine the nativist position.
A third major difference with traditional linguistic thinking is that neural networks do not represent discrete categories (be it phonemes, words, parts-of-speech, or any other category) unless these are explicitly assigned to the network's units a priori. However, in most (and, arguably, the most insightful) models, representations are learned in the hidden layer(s) rather than assigned,
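Referring back to the Simple Recurrent Network described in section 1.1 of the excerpt above, the following sketch shows a single SRN time step in which the hidden layer receives the current input together with a copy of its own previous state. The layer sizes and the one-hot input sequence are illustrative assumptions, and training of the weights is omitted.

```python
# Minimal sketch of an Elman-style Simple Recurrent Network: at each time step
# the hidden layer receives the current input plus its own previous state (the
# "context"), giving the network a working memory for sequences.
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid, n_out = 5, 8, 5

W_in = rng.normal(0, 0.1, (n_hid, n_in))     # input -> hidden
W_rec = rng.normal(0, 0.1, (n_hid, n_hid))   # hidden(t-1) -> hidden(t), trainable
W_out = rng.normal(0, 0.1, (n_out, n_hid))   # hidden -> output

def srn_forward(sequence):
    """Run one sequence through the SRN, returning the output at each step."""
    context = np.zeros(n_hid)                # previous hidden state, initially empty
    outputs = []
    for x in sequence:
        hidden = np.tanh(W_in @ x + W_rec @ context)
        outputs.append(W_out @ hidden)
        context = hidden                     # copy hidden state into the context
    return outputs

# Usage: a sequence of four one-hot "word" vectors.
sequence = [np.eye(n_in)[i] for i in (0, 2, 1, 3)]
outputs = srn_forward(sequence)
```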
2006
In language acquisition theory a crucial question centers on the degree of innate specialization for language learning. Over the past decade, the importance of the ability to extract statistical information in both linguistic and non-linguistic domains has received considerable attention among linguists and cognitive scientists. It is also well known that language acquisition must involve more than just extracting co-occurrence frequencies between items. Marcus et al. (1999) propose that there is also a mechanism designed to extract abstract, "algebraic" rules from linguistic data, though to date there have been no published studies examining this mechanism in non-linguistic domains. This study sought to replicate the findings of Marcus et al. with non-linguistic auditory and visual input. Results from three experiments show that 8-month-old infants are able to learn such rules from both linguistic and non-linguistic stimuli. This is taken as evidence that a rule abstraction mechanism of the kind proposed by Marcus et al. is part of the larger repertoire of domain-general learning mechanisms.
2005
Computer simulations show that an unstructured neural-network model (Shultz & Bale, 2001) covers the essential features of infant differentiation of simple grammars in an artificial language, and generalizes by both extrapolation and interpolation. Other simulations (Vilcu & Hadley, 2003) claiming to show that this model did not really learn these grammars were flawed by confounding syntactic patterns with other factors and by lack of statistical significance testing. Thus, this model remains a viable account of infant ability to learn and discriminate simple syntactic structures. One of the enduring debates in cognitive science concerns the proper theoretical account for human cognition. Should cognition be interpreted in terms of symbolic rules or subsymbolic neural networks? It has been argued that infants' ability to distinguish one syntactic pattern from another could only be explained by a symbolic rule-based account (Marcus, Vijayan, Rao, & Vishton, 1999). After being familiar...
2000
Well before their first birthday, babies can acquire knowledge of serial order relations (Saffran et al., 1996a), as well as knowledge of more abstract rule-based structural relations (Marcus et al., 1999) between neighbouring speech sounds within 2 minutes of exposure. These early learners can likewise acquire knowledge of rhythmic or temporal structure of a new language within 5-10 minutes of
Cognitive Science, 2011
Some empirical evidence in the artificial language acquisition literature has been taken to suggest that statistical learning mechanisms are insufficient for extracting structural information from an artificial language. According to the more than one mechanism (MOM) hypothesis, at least two mechanisms are required in order to acquire language from speech: (a) a statistical mechanism for speech segmentation; and (b) an additional rule-following mechanism in order to induce grammatical regularities. In this article, we present a set of neural network studies demonstrating that a single statistical mechanism can mimic the apparent discovery of structural regularities, beyond the segmentation of speech. We argue that our results undermine one argument for the MOM hypothesis.
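The statistical segmentation mechanism this abstract refers to is commonly formulated in terms of transitional probabilities between adjacent syllables, as in Saffran-style segmentation studies. The sketch below is not the authors' network model; it is a minimal illustration, using a made-up three-word toy language, of how dips in transitional probability can mark candidate word boundaries.

```python
# Illustration of segmentation by transitional probabilities (not the authors'
# model): TP(y | x) = count(x immediately followed by y) / count(x); word
# boundaries are posited where the TP between adjacent syllables dips.
from collections import Counter
import random

random.seed(0)

# A made-up toy language; the "speech stream" is an unsegmented sequence of
# syllables produced by stringing randomly chosen words together.
words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["da", "ki", "me"]]
stream = [syll for _ in range(300) for syll in random.choice(words)]

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])
tp = {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

# Within-word transitions are consistent (high TP); between-word transitions
# are variable (low TP), so a low TP suggests a word boundary.
threshold = 0.5
segments, current = [], [stream[0]]
for x, y in zip(stream, stream[1:]):
    if tp[(x, y)] < threshold:
        segments.append(current)
        current = []
    current.append(y)
segments.append(current)

print(segments[:5])   # candidate "words" recovered purely from the statistics
```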
Trends in Cognitive Sciences, 2000
Recent studies have shown that infants have access to highly useful language acquisition skills. On the one hand, they can segment a stream of unmarked syllables into words, based only on the statistical regularities present in it. On the other, they can abstract beyond these input-specific regularities and generalize to rules. It has been argued that these are two separate learning mechanisms, that the former is simply associationist whereas the latter requires variables. In this paper we present a correlational approach to the learning of sequential regularities, and its implementation in a connectionist model, which accommodates both types of learning. We show that when a network is made out of the right stuff, specifically, when it has the ability to represent sameness and the ability to represent relations, a simple correlational learning mechanism suffices to perform both of these tasks. Crucially, the model makes different predictions than the variable-based account.
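The central idea of this abstract, that explicitly representing the relation of sameness lets a simple associative learner handle rule-like generalization, can be illustrated with a minimal sketch. The code below is not the authors' connectionist model: it recodes each three-syllable sentence as sameness features between positions and trains a perceptron-style associator, which is enough to separate ABA from ABB sentences and to generalize to syllables never seen in training. The syllables, labels, and learning rule are illustrative assumptions.

```python
# Minimal sketch (not the authors' model): if the input explicitly represents
# the relation of sameness between syllable positions, a simple associative
# learner can separate ABA from ABB and generalize to novel syllables.
import numpy as np

def sameness_features(sentence):
    """1.0 where two positions hold the same syllable, else 0.0."""
    a, b, c = sentence
    return np.array([float(a == b), float(b == c), float(a == c)])

# Familiarization: ABA sentences labelled 1, ABB sentences labelled 0.
train = [(("ga", "ti", "ga"), 1), (("na", "la", "na"), 1),
         (("ga", "ti", "ti"), 0), (("na", "la", "la"), 0)]

# A perceptron-style associator over the sameness features.
w = np.zeros(3)
b = 0.0
for _ in range(20):
    for sentence, label in train:
        x = sameness_features(sentence)
        pred = 1 if w @ x + b > 0 else 0
        w += (label - pred) * x          # simple error-correcting update
        b += (label - pred)

# Generalization to entirely novel syllables, as in the infant experiments.
for test in [("wo", "fe", "wo"), ("wo", "fe", "fe")]:
    x = sameness_features(test)
    print(test, "ABA-consistent" if w @ x + b > 0 else "ABB-consistent")
```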
Developmental Science, 2022
We conducted a close replication of the seminal work by Marcus and colleagues from 1999, which showed that after a brief auditory exposure phase, 7‐month‐old infants were able to learn and generalize a rule to novel syllables not previously present in the exposure phase. This work became the foundation for the theoretical framework by which we assume that infants are able to learn abstract representations and generalize linguistic rules. While some extensions on the original work have shown evidence of rule learning, the outcomes are mixed, and an exact replication of Marcus et al.'s study has thus far not been reported. A recent meta‐analysis by Rabagliati and colleagues brings to light that the rule‐learning effect depends on stimulus type (e.g., meaningfulness, speech vs. nonspeech) and is not as robust as often assumed. In light of the theoretical importance of the issue at stake, it is appropriate and necessary to assess the replicability and robustness of Marcus et al.'...
One of the most controversial issues in cognitive science pertains to whether rules are necessary to explain complex behavior. Nowhere has the debate over rules been more heated than within the field of language acquisition. Most researchers agree on the need for statistical learning mechanisms in language acquisition, but disagree on whether rule-learning components are also needed. Marcus, Vijayan, Rao, & Vishton (1999) have provided evidence of rule-like behavior which they claim can only be explained by a dual-mechanism account. In this paper, we show that a connectionist single-mechanism approach provides a more parsimonious account of rule-like behavior in infancy than the dual-mechanism approach. Specifically, we present simulation results from an existing connectionist model of infant speech segmentation, fitting the behavioral data under naturalistic circumstances without invoking rules. We further investigate diverging predictions from the single- and dual-mechanism accounts through additional simulations and artificial language learning experiments. The results support a connectionist single-mechanism account, while undermining the dual-mechanism account.
Neural Computation, 2002
A simple associationist neural network learns to factor abstract rules (i.e., grammars) from sequences of arbitrary input symbols by inventing abstract representations that accommodate unseen symbol sets as well as unseen but similar grammars. The neural network is shown to have the ability to transfer grammatical knowledge to both new symbol vocabularies and new grammars. Analysis of the state-space shows that the network learns generalized abstract structures of the input and is not simply memorizing the input strings. These representations are context sensitive, hierarchical, and based on the state variable of the finite-state machines that the neural network has learned. Generalization to new symbol sets or grammars arises from the spatial nature of the internal representations used by the network, allowing new symbol sets to be encoded close to symbol sets that have already been learned in the hidden unit space of the network. The results are counter to the arguments that learn...
2009
According to the "dual-mechanism" hypothesis, the induction of structural information correlates negatively with familiarization length, in such a way that two mechanisms are required in order to analyze speech: (1) statistical computations based on nonadjacent transitional probabilities, and (2) an additional rule-following mechanism. The argument is that, although statistical mechanisms may suffice for speech segmentation, an additional rule-following mechanism is required in order to quickly extract structural information. We present a set of neural network studies that shows how a single statistical mechanism can mimic the apparent discovery of structural regularities, beyond the segmentation of speech. We argue that our results undermine one argument for the dual-mechanism hypothesis.
Scientific Reports
Infants readily extract linguistic rules from speech. Here, we ask whether this advantage extends to linguistic stimuli that do not rely on the spoken modality. To address this question, we first examine whether infants can differentially learn rules from linguistic signs. We show that, despite having no previous experience with a sign language, six-month-old infants can extract the reduplicative rule (AA) from dynamic linguistic signs, and the neural response to reduplicative linguistic signs differs from reduplicative visual controls, matched for the dynamic spatiotemporal properties of signs. We next demonstrate that the brain response for reduplicative signs is similar to the response to reduplicative speech stimuli. Rule learning, then, apparently depends on the linguistic status of the stimulus, not its sensory modality. These results suggest that infants are language-ready. They possess a powerful rule system that is differentially engaged by all linguistic stimuli, speech or...
Proceedings of CogSci, 2009
We present a biologically inspired computational framework for language processing and grammar acquisition, called the hierarchical prediction network (HPN). HPN fits in the tradition of connectionist models, but it extends their power by allowing for a substitution operation between the ...