Modular deep belief networks that do not forget

Mark Ring

Modular deep belief networks that do not forget

Mark Ring

2011, The 2011 International Joint Conference on Neural Networks

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Abstract

Deep belief networks (DBNs) are popular for learning compact representations of highdimensional data. However, most approaches so far rely on having a single, complete training set. If the distribution of relevant features changes during subsequent training stages, the features learned in earlier stages are gradually forgotten. Often it is desirable for learning algorithms to retain what they have previously learned, even if the input distribution temporarily changes. This paper introduces the M-DBN, an unsupervised modular DBN that addresses the forgetting problem. M-DBNs are composed of a number of modules that are trained only on samples they best reconstruct. While modularization by itself does not prevent forgetting, the M-DBN additionally uses a learning method that adjusts each module's learning rate proportionally to the fraction of best reconstructed samples. On the MNIST handwritten digit dataset module specialization largely corresponds to the digits discerned by humans. Furthermore, in several learning tasks with changing MNIST digits, M-DBNs retain learned features even after those features are removed from the training data, while monolithic DBNs of comparable size forget feature mappings learned before.

JiaoKing Du

We show how to use "complementary priors" to eliminate the explaining away effects that make inference difficult in densely-connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.

Log In

Modular deep belief networks that do not forget

Sign up for access to the world's latest research

Abstract

Related papers

Related topics

Related papers