2011, The 2011 International Joint Conference on Neural Networks
Deep belief networks (DBNs) are popular for learning compact representations of high-dimensional data. However, most approaches so far rely on having a single, complete training set. If the distribution of relevant features changes during subsequent training stages, the features learned in earlier stages are gradually forgotten. Often it is desirable for learning algorithms to retain what they have previously learned, even if the input distribution temporarily changes. This paper introduces the M-DBN, an unsupervised modular DBN that addresses the forgetting problem. M-DBNs are composed of a number of modules that are trained only on samples they best reconstruct. While modularization by itself does not prevent forgetting, the M-DBN additionally uses a learning method that adjusts each module's learning rate proportionally to the fraction of best reconstructed samples. On the MNIST handwritten digit dataset, module specialization largely corresponds to the digits discerned by humans. Furthermore, in several learning tasks with changing MNIST digits, M-DBNs retain learned features even after those features are removed from the training data, while monolithic DBNs of comparable size forget feature mappings learned before.
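The module-selection rule described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `assign_and_scale`, the `base_lr` parameter, and the exact rule `learning_rate = base_lr * fraction` are hypothetical, since the abstract says only that the learning rate is adjusted "proportionally" to the fraction of best reconstructed samples.

```python
import numpy as np


def assign_and_scale(recon_errors, base_lr=0.1):
    """Assign each sample to its best-reconstructing module and scale learning rates.

    recon_errors: array of shape (n_modules, n_samples) holding each module's
    reconstruction error on each sample (how the error is measured is left open).
    base_lr is a hypothetical global learning rate.
    """
    recon_errors = np.asarray(recon_errors, dtype=float)
    n_modules, n_samples = recon_errors.shape

    # Each sample is "won" by the module with the lowest reconstruction error.
    winners = recon_errors.argmin(axis=0)

    # Fraction of samples that each module reconstructs best.
    counts = np.bincount(winners, minlength=n_modules)
    fractions = counts / n_samples

    # Assumed proportional rule: a module that wins no samples gets a zero
    # learning rate, so its weights are left untouched.
    learning_rates = base_lr * fractions
    return winners, fractions, learning_rates
```

Under this assumed rule, a module whose digits disappear from the training stream wins no samples and stops updating, which is one plausible reading of how the features it learned earlier could be retained.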
We show how to use "complementary priors" to eliminate the explaining away effects that make inference difficult in densely-connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
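The "one layer at a time" idea can be illustrated with a small stack of binary restricted Boltzmann machines. The sketch below is an assumption-laden simplification: it uses full-batch one-step contrastive divergence (CD-1), passes hidden probabilities upward as the next layer's input, and omits both the complementary-priors derivation and the contrastive wake-sleep fine-tuning mentioned in the abstract. The names `train_rbm_cd1` and `greedy_pretrain` and all hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm_cd1(data, n_hidden, epochs=5, lr=0.1, seed=0):
    """Train one binary RBM with full-batch CD-1 (illustrative settings)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)  # visible biases
    c = np.zeros(n_hidden)   # hidden biases
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ W + c)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one reconstruction step.
        v_recon = sigmoid(h_sample @ W.T + b)
        h_recon = sigmoid(v_recon @ W + c)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b += lr * (data - v_recon).mean(axis=0)
        c += lr * (h_prob - h_recon).mean(axis=0)
    return W, b, c


def greedy_pretrain(data, layer_sizes, **rbm_kwargs):
    """Train a stack of RBMs greedily, each on the layer below's hidden activity."""
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm_cd1(x, n_hidden, **rbm_kwargs)
        layers.append((W, b, c))
        x = sigmoid(x @ W + c)  # assumed choice: propagate probabilities upward
    return layers
```

Calling `greedy_pretrain(binary_data, [500, 500, 2000])` would return one `(W, b, c)` triple per layer; the layer sizes here are placeholders, not values taken from the abstract.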
2020
With the advent of deep learning, the number of works proposing new methods or improving existing ones has grown exponentially in recent years. In this scenario, “very deep” models emerged, since they were expected to extract more intrinsic and abstract features and deliver better performance. However, such models suffer from the vanishing gradient problem, i.e., backpropagated values become too close to zero in their shallower layers, ultimately causing learning to stagnate. This issue was overcome in the context of convolutional neural networks by creating “shortcut connections” between layers, in a so-called deep residual learning framework. Nonetheless, the Deep Belief Network, a very popular deep learning technique, still suffers from vanishing gradients when dealing with discriminative tasks. Therefore, this paper proposes the Residual Deep Belief Network, which considers the information reinforcement layer-by-layer to improve the feature extraction and knowled...
Lecture Notes in Computer Science, 2011
A deep belief network (DBN) is a probabilistic generative model with multiple layers of hidden nodes and a layer of visible nodes, where parameterizations between layers obey harmonium or restricted Boltzmann machines (RBMs). In this paper we present the restricted deep belief network (RDBN) for multi-view learning, where each layer of hidden nodes is composed of view-specific and shared hidden nodes, in order to learn individual and shared hidden spaces from multiple views of data. View-specific hidden nodes are connected to the corresponding view-specific hidden nodes in the lower layer or to the visible nodes of a specific view, whereas shared hidden nodes follow inter-layer connections without restrictions, as in standard DBNs. The RDBN is trained using layer-wise contrastive divergence learning. Numerical experiments on synthetic and real-world datasets demonstrate the useful behavior of the RDBN compared to the multi-wing harmonium (MWH), which is a two-layer undirected model.
Applied Sciences
Learning to recognize a new object after having learned to recognize other objects may be a simple task for a human, but not for machines. The present go-to approaches for teaching a machine to recognize a set of objects are based on deep neural networks (DNNs). So, intuitively, the solution for teaching new objects on the fly to a machine should be a DNN. The problem is that the trained DNN weights used to classify the initial set of objects are extremely fragile, meaning that any change to those weights can severely damage the capacity to perform the initial recognitions; this phenomenon is known as catastrophic forgetting (CF). This paper presents a new DNN continual learning (CL) architecture that can deal with CF, the modular dynamic neural network (MDNN). The presented architecture consists of two main components: (a) the ResNet50-based feature extraction component as the backbone; and (b) the modular dynamic classification component, which consists of multiple sub-n...
2016
Object detection and recognition are important problems in the computer vision and pattern recognition domain. Human beings are able to detect and classify objects effortlessly, but replicating this ability in computer-based systems has proved to be a non-trivial task. In particular, despite significant research efforts focused on meta-heuristic object detection and recognition, robust and reliable real-time object recognition systems remain elusive. Here we present a survey of one particular approach that has proved very promising for invariant feature recognition and that is a key initial stage of multi-stage network architecture methods for the high-level task of object recognition.
arXiv: Learning, 2019
We present a novel adversarial framework for training deep belief networks (DBNs), which includes replacing the generator network in the methodology of generative adversarial networks (GANs) with a DBN and developing a highly parallelizable numerical algorithm for training the resulting architecture in a stochastic manner. Unlike the existing techniques, this framework can be applied to the most general form of DBNs with no requirement for back propagation. As such, it lays a new foundation for developing DBNs on a par with GANs with various regularization units, such as pooling and normalization. Foregoing back-propagation, our framework also exhibits superior scalability as compared to other DBN and GAN learning techniques. We present a number of numerical experiments in computer vision as well as neurosciences to illustrate the main advantages of our approach.
IEEE Transactions on Audio, Speech, and Language Processing, 2000
Gaussian mixture models are currently the dominant technique for modeling the emission distribution of hidden Markov models for speech recognition. We show that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters. These networks are first pre-trained as a multilayer generative model of a window of spectral feature vectors without making use of any discriminative information. Once the generative pre-training has designed the features, we perform discriminative fine-tuning using backpropagation to adjust the features slightly to make them better at predicting a probability distribution over the states of monophone hidden Markov models.
NIPS Workshop on Deep Learning for …, 2009
Hidden Markov Models (HMMs) have been the state-of-the-art techniques for acoustic modeling despite their unrealistic independence assumptions and the very limited representational capacity of their hidden states. There are many proposals in the research community for deeper models that are capable of modeling the many types of variability present in the speech generation process. Deep Belief Networks (DBNs) have recently proved to be very effective for a variety of machine learning problems and this paper applies DBNs to acoustic modeling. On the standard TIMIT corpus, DBNs consistently outperform other techniques and the best DBN achieves a phone error rate (PER) of 23.0% on the TIMIT core test set.
PLoS ONE, 2014
With the goal of understanding behavioral mechanisms of generalization, we analyzed the ability of neural networks to generalize across context. We modeled a behavioral task where the correct responses to a set of specific sensory stimuli varied systematically across different contexts. The correct response depended on the stimulus (A,B,C,D) and context quadrant (1,2,3,4). The possible 16 stimulus-context combinations were associated with one of two responses (X,Y), one of which was correct for half of the combinations. The correct responses varied symmetrically across contexts. This allowed responses to previously unseen stimuli (probe stimuli) to be generalized from stimuli that had been presented previously. By testing the simulation on two or more stimuli that the network had never seen in a particular context, we could test whether the correct response on the novel stimuli could be generated based on knowledge of the correct responses in other contexts. We tested this generalization capability with a Deep Belief Network (DBN), Multi-Layer Perceptron (MLP) network, and the combination of a DBN with a linear perceptron (LP). Overall, the combination of the DBN and LP had the highest success rate for generalization.
The deep belief network (DBN) has become one of the most important models in deep learning; however, its unoptimized structure wastes training resources. To solve this problem and to investigate the relationship between the depth and accuracy of a DBN, a two-step optimization training method is proposed. First, using mathematical and biological tools, the significance of supervised training is analyzed, and a theorem on reconstruction error and network energy is proved. Second, based on the conclusions of the first step, this paper proposes to optimize the structure of the DBN (especially the number of hidden layers). Finally, the method is applied in two image recognition experiments, and the results show increased computing efficiency and accuracy in both tasks.
Neural Computation, 2008
Deep Belief Networks (DBN) are generative neural network models with many layers of hidden explanatory factors, recently introduced by Hinton et al., along with a greedy layer-wise unsupervised learning algorithm. The building block of a DBN is a probabilistic model called a Restricted Boltzmann Machine (RBM), used to represent one layer of the model. Restricted Boltzmann Machines are interesting because inference is easy in them, and because they have been successfully used as building blocks for training deeper models. We first prove that adding hidden units yields strictly improved modeling power, while a second theorem shows that RBMs are universal approximators of discrete distributions. We then study the question of whether DBNs with more layers are strictly more powerful in terms of representational power. This suggests a new and less greedy criterion for training RBMs within DBNs.
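The remark that inference is easy in an RBM can be made concrete: because there are no hidden-to-hidden connections, the posterior over hidden units factorizes, so each p(h_j = 1 | v) is a single sigmoid of a linear function of v. The sketch below assumes the standard binary RBM energy E(v, h) = -b·v - c·h - vᵀWh (a well-known form, not quoted from the abstract) and checks the closed-form posterior against brute-force enumeration of all hidden states. The function names are hypothetical.

```python
import itertools

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def energy(v, h, W, b, c):
    """Standard binary RBM energy E(v, h) = -b.v - c.h - v^T W h."""
    return -(v @ b) - (h @ c) - (v @ W @ h)


def hidden_posterior(v, W, c):
    """Closed-form p(h_j = 1 | v): one matrix-vector product, no sampling."""
    return sigmoid(c + v @ W)


def hidden_posterior_bruteforce(v, W, b, c):
    """Same marginals by enumerating all 2^n hidden states (tiny models only)."""
    n_hidden = W.shape[1]
    states = np.array(
        list(itertools.product([0, 1], repeat=n_hidden)), dtype=float
    )
    weights = np.array([np.exp(-energy(v, h, W, b, c)) for h in states])
    weights /= weights.sum()  # normalize to get p(h | v) over joint states
    return weights @ states   # marginal probability that each unit is on
```

The brute-force version costs time exponential in the number of hidden units, while the closed form is linear in the number of weights; that gap is what makes RBMs practical as building blocks.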
ArXiv, 2019
Lifelong learning is a very important step toward realizing robust autonomous artificial agents. Neural networks are the main engine of deep learning, which is the current state-of-the-art technique in formulating adaptive artificial intelligent systems. However, neural networks suffer from catastrophic forgetting when stressed with the challenge of continual learning. We investigate how to exploit modular topology in neural networks in order to dynamically balance the information load between different modules by routing inputs based on the information content in each module so that information interference is minimized. Our dynamic information balancing (DIB) technique adapts a reinforcement learning technique to guide the routing of different inputs based on a reward signal derived from a measure of the information load in each module. Our empirical results show that DIB combined with elastic weight consolidation (EWC) regularization outperforms models with similar capacity and E...
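For readers unfamiliar with the elastic weight consolidation (EWC) regularizer named in this abstract, a minimal sketch of its quadratic penalty follows. It uses the commonly cited form (λ/2) Σ_i F_i (θ_i − θ*_i)², where θ* are the parameters after the earlier task and F is a diagonal Fisher information estimate. How DIB combines this penalty with its routing mechanism is not described in the truncated abstract and is not modeled here; `ewc_penalty` and its argument names are hypothetical.

```python
import numpy as np


def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    theta:      current parameters (flattened)
    theta_star: parameters saved after training on the earlier task
    fisher:     diagonal Fisher information estimate, same shape as theta
    lam:        strength of the penalty (hypothetical default)
    """
    theta = np.asarray(theta, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    fisher = np.asarray(fisher, dtype=float)
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
```

The penalty would be added to the loss of the new task, so parameters with large Fisher values (those the earlier task depended on most) are discouraged from drifting.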
2018
Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number of parameters with a relatively small increase in resources. We propose a training algorithm that flexibly chooses neural modules based on the data to be processed. Both the decomposition and modules are learned end-to-end. In contrast to existing approaches, training does not rely on regularization to enforce diversity in module use. We apply modular networks both to image recognition and language modeling tasks, where we achieve superior performance compared to several baselines. Introspection reveals that modules specialize in interpretable contexts.
2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009
In this paper we present a method for learning class-specific features for recognition. Recently a greedy layer-wise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (C-RBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four-layer hierarchy of alternating filtering and maximum subsampling. We learn feature parameters of the first and third layers viewing them as separate C-RBMs. The outputs of our feature extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state-of-the-art on handwritten digit recognition and pedestrian detection.
Entropy, 2016
Conventionally, the maximum likelihood (ML) criterion is applied to train a deep belief network (DBN). We present a maximum entropy (ME) learning algorithm for DBNs, designed specifically to handle limited training data. Maximizing only the entropy of parameters in the DBN allows more effective generalization capability, less bias towards data distributions, and robustness to over-fitting compared to ML learning. Results of text classification and object recognition tasks demonstrate ME-trained DBN outperforms ML-trained DBN when training data is limited.
Recently, restricted Boltzmann machines have been considered for the construction of deep neural networks. One reason is the feature engineering capability of the restricted Boltzmann machine. One of the issues facing deep neural networks is weight training; because of the complexity of the training process, this topic is of utmost importance in deep networks. In this paper we attempt to modify the initial weights of the restricted Boltzmann machine based on the difference between the mean of each feature and the mean of all feature values of the training vectors. As a result, the probability that the model reconstructs a training vector is increased at the beginning of training, and the error of the deep belief network during training is subsequently reduced. The rationale for this approach is to consider the typical values of a feature relative to the values of all features. Empirical experiments show appropriate properties in the field.
2013
Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results.
Recent theoretical advances in the learning of deep artificial neural networks have made it possible to overcome the vanishing gradient problem. This limitation has been overcome using a pre-training step, in which deep belief networks formed by stacked Restricted Boltzmann Machines perform unsupervised learning. Once the pre-training step is done, the network weights are fine-tuned using regular error backpropagation while treating the network as a feed-forward net. In the current paper we compare the described approach with commonly used classification approaches on some well-known classification data sets from the UCI repository as well as on one mid-sized proprietary data set.
2007
Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g., low dimension, sparsity, etc.). Others are based on approximating density by stochastically reconstructing the input from the representation.