Neural network models of language acquisition and processing

2019

Abstract

Artificial neural network models (also known as Parallel Distributed Processing or Connectionist models) have been highly influential in cognitive science since the mid-1980s. The original inspiration for these systems comes from information processing in the brain, which emerges from a large number of (nearly) identical, simple processing units (neurons) that are interconnected into a network. Each unit receives activation from other units or through stimulation from the external world, and generates an output activation that is a function of the total input activation received. The unit then feeds the output activation onward to the units to which it is connected. Information processing is thus implemented in terms of activation flowing through this network. Each connection between two units has a weight that determines how strongly the first unit affects the second. These weights can be adapted, which constitutes learning, or "training" as it is commonly known in the neural network literature.

Algorithms for network training can be roughly divided into supervised and unsupervised methods. Supervised training is applied when a specific and known input-to-output mapping is required (e.g., learning to transform orthographic to phonological representations). To accomplish this, the network is provided with a representative set of "training examples" of inputs and the corresponding target outputs. It then processes each example, and the difference between the network's actual output and the target output leads to an update of the connection weights such that, next time, the output error will be smaller. By far the best known and most used method for supervised training is the Backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986), which makes the network's output activations for the training examples gradually converge toward the target outputs. Unsupervised training, in contrast, makes the network adapt to (aspects of) the statistical structure of input examples without mapping to target outputs (e.g., discovery of regularities in the phonological structure of language). These networks are well-suited to uncovering statistical structure present in the environment without requiring the modeller to be aware of what that structure is. One well-known example of an unsupervised training method is the learning rule proposed by Hebb (1949): Strengthen the connection between two units that are simultaneously active, and weaken it if only one of the two is active.

In spite of the superficial similarities between artificial and biological neural networks (i.e., interconnectivity and stimulation passing between neurons to determine their activation, and learning by adaptation of connection strengths), these cognitive models are not usually claimed to simulate processing at the level of biological neurons. Rather, neural network models form a description at Marr's (1982) algorithmic level; that is, they specify cognitive representations and operations while ignoring the biological implementation.

Neural networks underwent a surge of popularity in the 1990s, but from the early 21st century they were somewhat overshadowed by symbolic probabilistic models. However, neural networks have enjoyed a recent revival, partly due to the success of "deep learning" models, which display state-of-the-art performance on a wide range of artificial intelligence tasks (LeCun, Bengio, & Hinton, 2015). For the most part, the field of cognitive modelling has yet to catch up with these novel developments. Consequently, the currently most influential connectionist cognitive models are of the more traditional variety. We return to this issue in the Conclusion.
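To make the supervised regime concrete, the sketch below (in Python with NumPy) trains a tiny feedforward network by backpropagation on an illustrative input-to-output mapping. The task, network size, and learning rate are assumptions chosen for the example, not taken from any of the models discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative supervised task: map 2-bit inputs to their XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer; the weights are the network's only adaptable knowledge.
W_ih = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden
W_ho = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output
lr = 0.5                                    # illustrative learning rate

for epoch in range(10000):
    # Forward pass: activation flows from layer to layer.
    hidden = sigmoid(X @ W_ih)
    output = sigmoid(hidden @ W_ho)

    # The output error drives the weight update (gradient descent).
    error = T - output
    delta_out = error * output * (1 - output)                  # output layer
    delta_hid = (delta_out @ W_ho.T) * hidden * (1 - hidden)   # backpropagated

    W_ho += lr * hidden.T @ delta_out
    W_ih += lr * X.T @ delta_hid

print(np.round(output, 2))  # gradually converges toward the targets [0, 1, 1, 0]
```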
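The unsupervised Hebbian rule admits an equally minimal sketch: weights change as a function of unit co-activation alone, with no target outputs involved. The activation patterns and learning rate below are again illustrative assumptions.

```python
import numpy as np

# Illustrative "environment": binary activation patterns over 4 units.
patterns = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)

W = np.zeros((4, 4))
lr = 0.1

for x in patterns:
    co_active = np.outer(x, x)                            # both units active
    one_active = np.outer(x, 1 - x) + np.outer(1 - x, x)  # exactly one active
    # Hebb (1949): strengthen connections between co-active units,
    # weaken them when only one of the two units is active.
    W += lr * (co_active - one_active)

np.fill_diagonal(W, 0)  # ignore self-connections
print(W)  # units 0/1 and 2/3 end up positively connected
```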
1.1. Feedforward and recurrent networks

Connectionist models are not amorphous networks in which everything is connected to everything else. Rather, a particular structure is imposed, for example by grouping units into a number of layers and allowing activation to flow only from each layer to the next. The first layer receives input from the environment, the final layer produces the corresponding output, and any intermediate layer is known as "hidden". Although this so-called "feedforward" architecture can (at least in theory) approximate any computable input-to-output function, it is unable to handle input that comes in over time. This is because the network has no working memory: each input is immediately overwritten by the next. Hence, the feedforward network is not the most appropriate model for simulating language processing, which is a fundamentally temporal phenomenon.

Elman (1990), in his seminal paper "Finding structure in time", proposed a solution: include a set of recurrent connections with trainable weights that link each unit of the single hidden layer to all hidden-layer units. Consequently, the hidden layer receives both the current environmental input and its own previous activation state which, in turn, depends on the state before that, and so on. In this manner, the model is equipped with a working memory and can therefore encode sequential information, or "structure in time", making it well-suited to processing language as it unfolds over time. This particular architecture became known as the Simple Recurrent Network (SRN), but it forms part of a larger class of Recurrent Neural Networks (RNNs) that have connections through which (part of) the network's current activation feeds back into the network itself.
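A minimal sketch of the SRN's forward pass (again illustrative Python/NumPy, not Elman's original implementation; the layer sizes are assumptions) shows how the hidden layer combines the current input with a copy of its own previous state. Training of the weights, e.g. by backpropagation as sketched above, is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid, n_out = 5, 8, 5   # illustrative sizes (e.g., one-hot word codes)
W_ih = rng.normal(scale=0.5, size=(n_in, n_hid))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hid, n_hid))  # recurrent hidden -> hidden
W_ho = rng.normal(scale=0.5, size=(n_hid, n_out))  # hidden -> output

def srn_forward(sequence):
    """Process a sequence one element at a time, as in Elman (1990)."""
    context = np.zeros(n_hid)   # copy of the previous hidden state
    outputs = []
    for x in sequence:
        # Working memory: current input plus the previous hidden state.
        hidden = sigmoid(x @ W_ih + context @ W_hh)
        outputs.append(sigmoid(hidden @ W_ho))
        context = hidden        # carry the state forward in time
    return outputs

# Example: a sequence of three one-hot "word" vectors.
seq = np.eye(n_in)[[0, 2, 4]]
for t, y in enumerate(srn_forward(seq)):
    print(t, np.round(y, 2))
```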
1.2. Neural network models and linguistic theory

Connectionist models of language acquisition and processing offer a view of the human language system that is very different from that of traditional, symbolic models in cognition. For one, neural networks do not distinguish competence (i.e., language knowledge) from performance (i.e., language behaviour). Instead, knowledge becomes instantiated in the network's connection weights in order for the network to display particular performance. In a sense, it forms procedural rather than declarative knowledge: it is know-how, not know-that. Hence, there is no way for the network to assess its own knowledge. As Clark and Karmiloff-Smith (1993, p. 495) put it: "it is knowledge in the system, but it is not yet knowledge to the system."

Second, language researchers from the nativist tradition have famously argued that infants must possess innate, language-specific knowledge or learning mechanisms, because otherwise language acquisition in the absence of negative evidence would be impossible (e.g., Chomsky, 1965; Gold, 1967; Pinker, 1989; among many others). In contrast, empiricists claim that language acquisition requires only domain-general mechanisms. Connectionism falls squarely into the empiricist camp, because the representations and learning mechanisms built into neural networks are not specific to language and the networks receive no negative evidence during training. Hence, successful neural network learning of (relevant aspects of) syntax would undermine the nativist position.

A third major difference with traditional linguistic thinking is that neural networks do not represent discrete categories (be they phonemes, words, parts of speech, or any other category) unless these are explicitly assigned to the network's units a priori. However, in most (and, arguably, the most insightful) models, representations are learned in the hidden layer(s) rather than assigned,