We propose a Multi-Layer Network based on the Bayesian framework of the Factor Graphs in Reduced Normal Form (FGrn) applied to a two-dimensional lattice. The Latent Variable Model (LVM) is the basic building block of a quadtree hierarchy built on top of a bottom layer of random variables that represent pixels of an image, a feature map, or more generally a collection of spatially distributed discrete variables. The multi-layer architecture implements a hierarchical data representation that, via belief propagation, can be used for learning and inference. Typical uses are pattern completion, correction and classification. The FGrn paradigm provides great flexibility and modularity and appears as a promising candidate for building deep networks: the system can be easily extended by introducing new and different (in cardinality and in type) variables. Prior knowledge, or supervised information, can be introduced at different scales. The FGrn paradigm provides a handy way for building all kinds of architectures by interconnecting only three types of units: Single Input Single Output (SISO) blocks, Sources and Replicators. The network is designed like a circuit diagram and the belief messages flow bidirectionally in the whole system. The learning algorithms operate only locally within each block. The framework is demonstrated in this paper in a three-layer structure applied to images extracted from a standard data set.
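As a hedged illustration of the kind of belief propagation such an architecture relies on (not the authors' implementation or their FGrn library), the sketch below shows forward and backward messages through a single SISO block holding a row-stochastic conditional probability table; the class and variable names (`SISOBlock`, `forward`, `backward`) are illustrative assumptions.

```python
import numpy as np

def normalize(m):
    """Rescale a belief message so its entries sum to one (only proportionality matters)."""
    s = m.sum()
    return m / s if s > 0 else np.full_like(m, 1.0 / m.size)

class SISOBlock:
    """Single Input Single Output block: a conditional probability table P(out | in)."""
    def __init__(self, in_card, out_card, rng=None):
        rng = rng or np.random.default_rng(0)
        P = rng.random((in_card, out_card))
        self.P = P / P.sum(axis=1, keepdims=True)  # row-stochastic matrix

    def forward(self, msg_in):
        """Propagate a belief message from the input variable to the output variable."""
        return normalize(self.P.T @ msg_in)

    def backward(self, msg_out):
        """Propagate a belief message from the output variable back to the input variable."""
        return normalize(self.P @ msg_out)

# toy usage: a 4-state pixel variable feeding an 8-state latent variable
block = SISOBlock(in_card=4, out_card=8)
pixel_belief = normalize(np.array([0.7, 0.1, 0.1, 0.1]))
latent_belief = block.forward(pixel_belief)
reconstructed = block.backward(np.ones(8) / 8)   # uninformative message from above
print(latent_belief, reconstructed)
```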
2014 4th International Workshop on Cognitive Information Processing (CIP), 2014
In modeling time series, convolution multi-layer graphs are able to capture long-term dependence at a gradually increasing scale. We present an approach to learning a layered factor graph architecture, starting from a stationary latent model for each layer. Simulations of belief propagation are reported for a three-layer graph on a small data set of characters.
We propose rectified factor networks (RFNs) as generative unsupervised models, which learn robust, very sparse, and non-linear codes with many code units. RFN learning is a variational expectation maximization (EM) algorithm with unknown prior which includes (i) rectified posterior means, (ii) normalized signals of hidden units, and (iii) dropout. Like factor analysis, RFNs explain the data variance by their parameters. For pretraining of deep networks on MNIST, rectangle data, convex shapes, NORB, and CIFAR, RFNs were superior to restricted Boltzmann machines (RBMs) and denoising autoencoders. On CIFAR-10 and CIFAR-100, RFN pretraining always improved the results of deep networks for different architectures like AlexNet, deep supervised net (DSN), and a simple "Network In Network" architecture. In these experiments, RFN pretraining reliably improved downstream performance.
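A minimal sketch of the kind of E-step the abstract alludes to — a factor-analysis posterior mean that is then rectified and normalized across hidden units — is given below. The dropout step and the exact RFN update rules are omitted, and all names (`rfn_estep`, `W`, `psi`) are assumptions for illustration, not the paper's code.

```python
import numpy as np

def rfn_estep(X, W, psi):
    """Illustrative E-step: factor-analysis posterior means of hidden codes,
    followed by rectification and per-unit normalization.

    X   : (n, d) data matrix (assumed centred)
    W   : (d, k) loading matrix
    psi : (d,)   diagonal noise variances
    """
    Psi_inv = 1.0 / psi                                    # inverse of diagonal noise covariance
    M = np.eye(W.shape[1]) + (W.T * Psi_inv) @ W           # posterior precision, k x k
    H = np.linalg.solve(M, (W.T * Psi_inv) @ X.T).T        # posterior means E[h|x], (n, k)
    H = np.maximum(H, 0.0)                                 # (i) rectified posterior means
    H /= H.std(axis=0, keepdims=True) + 1e-8               # (ii) normalized hidden signals
    return H
```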
We show how to use "complementary priors" to eliminate the explaining away effects that make inference difficult in densely-connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
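The greedy layer-by-layer scheme can be sketched as stacking restricted Boltzmann machines, each trained on the hidden activities of the layer below. The contrastive-divergence update shown here (CD-1) is a standard stand-in and the hyper-parameters are placeholders, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_vis, n_hid, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b_v = np.zeros(n_vis)
        self.b_h = np.zeros(n_hid)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def cd1(self, v0):
        """One CD-1 update on a batch v0 of binary visible vectors."""
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)        # one-step reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W   += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

def greedy_pretrain(data, layer_sizes, epochs=5):
    """Train a stack of RBMs one layer at a time; each new layer is trained
    on the hidden activities produced by the previously trained layer."""
    rbms, x = [], data
    for n_hid in layer_sizes:
        rbm = RBM(x.shape[1], n_hid)
        for _ in range(epochs):
            rbm.cd1(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)          # feed activities upward to the next layer
    return rbms

# toy usage on random binary "images"
layers = greedy_pretrain((rng.random((64, 784)) < 0.3).astype(float), [256, 128, 64])
```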
ArXiv, 2019
Thanks to the availability of large scale digital datasets and massive amounts of computational power, deep learning algorithms can learn representations of data by exploiting multiple levels of abstraction. These machine learning methods have greatly improved the state-of-the-art in many challenging cognitive tasks, such as visual object recognition, speech processing, natural language understanding and automatic translation. In particular, one class of deep learning models, known as deep belief networks, can discover intricate statistical structure in large data sets in a completely unsupervised fashion, by learning a generative model of the data using Hebbian-like learning mechanisms. Although these self-organizing systems can be conveniently formalized within the framework of statistical mechanics, their internal functioning remains opaque, because their emergent dynamics cannot be solved analytically. In this article we propose to study deep belief networks using techniques com...
IEEE transactions on neural networks and learning systems, 2019
This paper addresses the duality between deterministic feed-forward neural networks (FF-NNs) and linear Bayesian networks (LBNs), which are generative stochastic models representing probability distributions over the visible data based on a linear function of a set of latent (hidden) variables. The maximum entropy principle is used to define a unique generative model corresponding to each FF-NN, called the projected belief network (PBN). The FF-NN exactly recovers the hidden variables of the dual PBN. The large-N asymptotic approximation to the PBN has the familiar structure of an LBN, with the addition of an invertible non-linear transformation operating on the latent variables. It is shown that the exact nature of the PBN depends on the range of the input (visible) data; details for three cases of input data range are provided. The likelihood function of the PBN is straightforward to calculate, allowing it to be used as a generative classifier. An example is provided in which a generative classifier based on the PBN has performance comparable to a deep belief network in classifying handwritten characters. In addition, several examples are provided that demonstrate the duality relationship, for example by training networks from either side of the duality.
Artificial neural networks are powerful pattern classifiers; however, they have been surpassed in accuracy by methods such as support vector machines and random forests that are also easier to use and faster to train. Backpropagation, which is used to train artificial neural networks, suffers from the herd-effect problem, which leads to long training times and limits classification accuracy. We use the disjunctive normal form and approximate the Boolean conjunction operations with products to construct a novel network architecture. The proposed model can be trained by minimizing an error function, and it allows an effective and intuitive initialization which solves the herd-effect problem associated with backpropagation. This leads to state-of-the-art classification accuracy and fast training times. In addition, our model can be jointly optimized with convolutional features in a unified structure, leading to state-of-the-art results on computer vision problems with fast convergence rates. A GPU implementation of LDNN with optional convolutional features is also available.
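The disjunctive-normal-form construction can be sketched directly: each group of half-space responses is combined by a soft AND (product of sigmoids), and the groups are combined by a soft OR via De Morgan's law. The sketch below is a hedged illustration; the tensor shapes and names (`W`, `b`, `ldnn_forward`) are assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ldnn_forward(x, W, b):
    """Illustrative disjunctive-normal-form unit.

    W, b hold n_groups x n_terms half-space parameters. Each group is a soft
    conjunction (product of sigmoids); groups are combined by a soft disjunction:
    OR(g_1..g_k) = 1 - prod_k (1 - g_k).
    """
    h = sigmoid(np.einsum('gtd,d->gt', W, x) + b)   # per-term half-space responses
    conj = h.prod(axis=1)                           # soft AND within each group
    return 1.0 - np.prod(1.0 - conj)                # soft OR across groups

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4, 10))   # 3 groups, 4 conjunctive terms each, 10-d input
b = rng.standard_normal((3, 4))
print(ldnn_forward(rng.standard_normal(10), W, b))
```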
Computer Vision – ECCV 2018, 2018
Efficient CNN designs like ResNets and DenseNets were proposed to improve accuracy vs. efficiency trade-offs. They essentially increased connectivity, allowing efficient information flow across layers. Inspired by these techniques, we propose to model connections between filters of a CNN using graphs which are simultaneously sparse and well connected. Sparsity results in efficiency, while good connectivity preserves the expressive power of the CNN. We use a well-studied class of graphs from theoretical computer science, known as expander graphs, that satisfies these properties. Expander graphs are used to model connections between filters in CNNs to design networks called X-Nets. We present two guarantees on the connectivity of X-Nets: each node influences every node in a layer within logarithmically many steps, and the number of paths between two sets of nodes is proportional to the product of their sizes. We also propose efficient training and inference algorithms, making it possible to train deeper and wider X-Nets effectively. Expander-based models give a 4% improvement in accuracy on MobileNet over grouped convolutions, a popular technique with the same sparsity but worse connectivity. X-Nets give better performance trade-offs than the original ResNet and DenseNet-BC architectures. We achieve model sizes comparable to state-of-the-art pruning techniques using our simple architecture design, without any pruning. We hope that this work motivates other approaches that utilize results from graph theory to develop efficient network architectures.
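A minimal sketch of the core idea, under stated assumptions: connect each output filter to only d input channels chosen at random (random d-regular bipartite graphs are expanders with high probability) and apply the resulting mask to a pointwise convolution. This is an illustration of the connectivity pattern, not the X-Net implementation.

```python
import numpy as np

def expander_mask(n_out, n_in, d, rng=None):
    """Sparse channel-connectivity mask: each output channel listens to d input
    channels chosen at random. Such random bipartite graphs are expanders with
    high probability, which is the property the abstract exploits."""
    rng = rng or np.random.default_rng(0)
    mask = np.zeros((n_out, n_in))
    for o in range(n_out):
        mask[o, rng.choice(n_in, size=d, replace=False)] = 1.0
    return mask

def masked_pointwise_conv(x, weight, mask):
    """1x1 convolution (n_in, H, W) -> (n_out, H, W) with masked channel connections."""
    return np.einsum('oi,ihw->ohw', weight * mask, x)

rng = np.random.default_rng(1)
x = rng.standard_normal((32, 8, 8))            # 32 input channels
w = rng.standard_normal((64, 32)) * 0.1        # dense 1x1 kernel
m = expander_mask(64, 32, d=4, rng=rng)        # keep only 4 of 32 connections per filter
y = masked_pointwise_conv(x, w, m)
print(y.shape, int(m.sum()), "active connections")
```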
IEEE Transactions on Neural Networks and Learning Systems, 2016
Developments in deep learning have seen the use of layer-wise unsupervised learning combined with supervised learning for fine-tuning. With this layer-wise approach, a deep network can be seen as a more modular system, which lends itself well to learning representations. In this paper we investigate whether such modularity can be useful for the insertion of background knowledge into deep networks and can improve learning performance when such knowledge is available, and for the extraction of knowledge from trained deep networks, offering a better understanding of the representations learned by such networks. To this end we use a simple symbolic language, a set of logical rules which we call confidence rules, and show that it is suitable for the representation of quantitative reasoning in deep networks. We show by knowledge extraction that confidence rules can offer a low-cost representation for layer-wise networks (or restricted Boltzmann machines). We also show that layer-wise extraction can produce an improvement in the accuracy of Deep Belief Networks. Furthermore, the proposed symbolic characterisation of deep networks provides a novel method for the insertion of prior knowledge and training of deep networks. With the use of this method, a deep neural-symbolic system is proposed and evaluated, with experimental results indicating that modularity through the use of confidence rules and knowledge insertion can be beneficial to network performance.
ArXiv, 2020
We introduce factorize-sum-split-product networks (FSPNs), a new class of probabilistic graphical models (PGMs). FSPNs are designed to overcome the drawbacks of existing PGMs in terms of estimation accuracy and inference efficiency. Specifically, Bayesian networks (BNs) have low inference speed, and the performance of tree-structured sum-product networks (SPNs) degrades significantly in the presence of highly correlated variables. FSPNs absorb their advantages by adaptively modeling the joint distribution of variables according to their degree of dependence, so that one can simultaneously attain the two desirable goals: high estimation accuracy and fast inference speed. We present efficient probability inference and structure learning algorithms for FSPNs, along with a theoretical analysis and extensive evaluation evidence. Our experimental results on synthetic and benchmark datasets indicate the superiority of FSPNs over other PGMs.
arXiv: Learning, 2019
We present a novel adversarial framework for training deep belief networks (DBNs), which includes replacing the generator network in the methodology of generative adversarial networks (GANs) with a DBN and developing a highly parallelizable numerical algorithm for training the resulting architecture in a stochastic manner. Unlike the existing techniques, this framework can be applied to the most general form of DBNs with no requirement for backpropagation. As such, it lays a new foundation for developing DBNs on a par with GANs with various regularization units, such as pooling and normalization. Foregoing backpropagation, our framework also exhibits superior scalability as compared to other DBN and GAN learning techniques. We present a number of numerical experiments in computer vision as well as neuroscience to illustrate the main advantages of our approach.
Neural Networks, 2005
Machine learning methods that can handle variable-size structured data such as sequences and graphs include Bayesian networks (BNs) and Recursive Neural Networks (RNNs). In both classes of models, the data is modeled using a set of observed and hidden variables associated with the nodes of a directed acyclic graph. In BNs, the conditional relationships between parent and child variables are probabilistic, whereas in RNNs they are deterministic and parameterized by neural networks. Here we study the formal relationship between both classes of models and show that when the source node variables are observed, RNNs can be viewed as limits, both in distribution and probability, of BNs with local conditional distributions that have vanishing covariance matrices and converge to delta functions. Conditions for uniform convergence are also given together with an analysis of the behavior and exactness of Belief Propagation (BP) in "deterministic" BNs. Implications for the design of mixed architectures and the corresponding inference algorithms are briefly discussed.
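The limiting relationship the abstract describes can be written down in hedged form as follows; the notation is assumed for illustration and is not the paper's exact formulation.

```latex
% A BN node with a neural-network mean and vanishing noise collapses to the
% deterministic recursive-network update (notation illustrative).
\[
  p_\sigma\!\left(x_v \mid x_{\mathrm{pa}(v)}\right)
  = \mathcal{N}\!\left(x_v ;\, f_\theta\!\left(x_{\mathrm{pa}(v)}\right),\, \sigma^2 I\right)
  \;\xrightarrow[\;\sigma \to 0\;]{}\;
  \delta\!\left(x_v - f_\theta\!\left(x_{\mathrm{pa}(v)}\right)\right),
\]
% so that, once the source nodes are observed, ancestral sampling of the BN
% reduces to the deterministic RNN recursion  x_v = f_theta(x_pa(v)).
```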
2016
Object detection and recognition are important problems in the computer vision and pattern recognition domains. Human beings are able to detect and classify objects effortlessly, but replicating this ability in computer-based systems has proved to be a non-trivial task. In particular, despite significant research efforts focused on meta-heuristic object detection and recognition, robust and reliable real-time object recognition systems remain elusive. Here we present a survey of one particular approach that has proved very promising for invariant feature recognition and which is a key initial stage of multi-stage network architecture methods for the high-level task of object recognition.
2022
The projected belief network (PBN) is a layered generative network (LGN) with a tractable likelihood function, and is based on a feed-forward neural network (FFNN). There are two versions of the PBN: stochastic and deterministic (D-PBN), and each has theoretical advantages over other LGNs. However, implementation of the PBN requires an iterative algorithm that includes the inversion of a symmetric matrix of size M × M in each layer, where M is the layer output dimension. This, and the fact that the network must always be dimension-reducing in each layer, can limit the types of problems to which the PBN can be applied. In this paper, we describe techniques to avoid or mitigate these restrictions and use the PBN effectively at high dimension. We apply the discriminatively aligned PBN (PBN-DA) to classifying and auto-encoding high-dimensional spectrograms of acoustic events. We also present the discriminatively aligned D-PBN for the first time.
2021
Bayesian networks in their Factor Graph Reduced Normal Form are a powerful paradigm for implementing inference graphs. Unfortunately, the computational and memory costs of these networks may be considerable even for relatively small networks, and this is one of the main reasons why these structures have often been underused in practice. In this work, through a detailed algorithmic and structural analysis, various solutions for cost reduction are proposed. Moreover, an online version of the classic batch learning algorithm is also analysed, showing very similar results in an unsupervised context but with much better performance, which may be essential if multi-level structures are to be built. The solutions proposed, together with the possible online learning algorithm, are included in a C++ library that is quite efficient, especially if compared to the direct use of the well-known sum-product and Maximum Likelihood algorithms. The results obtained are discussed with particular refer...
Advances in neural information processing …, 2007
Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or "deep," structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity in mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of deep belief networks. We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up both colinear ("contour") features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex "corner" features matches well with the results of Ito & Komatsu's study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling further higher-order features.
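The sparsity ingredient can be illustrated with a small, hedged sketch: a regularization term that nudges each hidden unit's mean activation toward a small target rate during contrastive-divergence training. The target value, penalty weight, and function name are assumptions, not the paper's exact regularizer.

```python
import numpy as np

def sparsity_gradient(hidden_probs, target=0.02):
    """Illustrative sparsity term for a sparse RBM/DBN layer.

    hidden_probs : (n_samples, n_hidden) activation probabilities on a minibatch
    Returns a per-unit term to add (scaled by a penalty weight) to the
    hidden-bias update, pushing each unit's mean activation toward `target`.
    """
    mean_act = hidden_probs.mean(axis=0)      # empirical firing rate per hidden unit
    return target - mean_act                  # gradient of -0.5 * (target - mean_act)^2

# hypothetical use inside a CD-1 update (lambda_s is a free penalty weight):
# rbm.b_h += lr * ((h0 - h1).mean(axis=0) + lambda_s * sparsity_gradient(h0))
```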
The 2011 International Joint Conference on Neural Networks, 2011
Deep belief networks (DBNs) are popular for learning compact representations of high-dimensional data. However, most approaches so far rely on having a single, complete training set. If the distribution of relevant features changes during subsequent training stages, the features learned in earlier stages are gradually forgotten. Often it is desirable for learning algorithms to retain what they have previously learned, even if the input distribution temporarily changes. This paper introduces the M-DBN, an unsupervised modular DBN that addresses the forgetting problem. M-DBNs are composed of a number of modules that are trained only on samples they best reconstruct. While modularization by itself does not prevent forgetting, the M-DBN additionally uses a learning method that adjusts each module's learning rate proportionally to the fraction of best-reconstructed samples. On the MNIST handwritten digit dataset, module specialization largely corresponds to the digits discerned by humans. Furthermore, in several learning tasks with changing MNIST digits, M-DBNs retain learned features even after those features are removed from the training data, while monolithic DBNs of comparable size forget feature mappings learned before.
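The routing and learning-rate rule described above can be sketched as a small bookkeeping step, shown below under stated assumptions: reconstruction errors are already available per module and per sample, and the names (`assign_and_scale`, `base_lr`) are illustrative rather than from the paper.

```python
import numpy as np

def assign_and_scale(recon_errors, base_lr):
    """Illustrative M-DBN-style bookkeeping step.

    recon_errors : (n_modules, n_samples) reconstruction error of every module
                   on every sample in the current batch.
    Each sample is routed to the module that reconstructs it best, and each
    module's learning rate is scaled by the fraction of samples it won, so
    rarely-winning modules change slowly and retain what they learned.
    """
    winners = recon_errors.argmin(axis=0)                          # best module per sample
    n_modules, n_samples = recon_errors.shape
    fractions = np.bincount(winners, minlength=n_modules) / n_samples
    lrs = base_lr * fractions                                      # per-module learning rates
    batches = [np.flatnonzero(winners == m) for m in range(n_modules)]
    return batches, lrs

errors = np.random.default_rng(0).random((4, 100))   # 4 modules, 100 samples
batches, lrs = assign_and_scale(errors, base_lr=0.1)
print([len(b) for b in batches], lrs)
```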
ArXiv, 2021
In recent years, GoogleNet has garnered substantial attention as one of the base convolutional neural networks (CNNs) for extracting visual features for object detection. However, it experiences challenges from contaminated deep features when concatenating elements with different properties. Also, since GoogleNet is not an entirely lightweight CNN, it still has many execution overheads when applied to a resource-starved application domain. Therefore, a new CNN, FactorNet, has been proposed to overcome these functional challenges. The FactorNet CNN is composed of multiple independent sub-CNNs that encode different aspects of the deep visual features, and it has far fewer execution overheads in terms of weight parameters and floating-point operations. Incorporating FactorNet into the Faster-RCNN framework showed that FactorNet gives better accuracy at a minimum and produces an additional speedup over GoogleNet throughout the KITTI object detection benchmark data set in a real-time object detection system.
Neural Information Processing Systems, 1999
Local "belief propagation" rules of the sort proposed by Pearl [15] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstrated good performance of "loopy belief propagation"using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannon-limit performance of "Turbo codes", whose decoding algorithm is equivalent to loopy belief propagation. Except for the case of graphs with a single loop, there has been little theoretical understanding of the performance of loopy propagation. Here we analyze belief propagation in networks with arbitrary topologies when the nodes in the graph describe jointly Gaussian random variables. We give an analytical formula relating the true posterior probabilities with those calculated using loopy propagation. We give sufficient conditions for convergence and show that when belief propagation converges it gives the correct posterior means for all graph topologies, not just networks with a single loop. The related "max-product" belief propagation algorithm finds the maximum posterior probability estimate for singly connected networks. We show that, even for non-Gaussian probability distributions, the convergence points of the max-product algorithm in loopy networks are maxima over a particular large local neighborhood of the posterior probability. These results help clarify the empirical performance results and motivate using the powerful belief propagation algorithm in a broader class of networks. Problems involving probabilistic belief propagation arise in a wide variety of applications, including error correcting codes, speech recognition and medical diagnosis. If the graph is singly connected, there exist local message-passing schemes to calculate the posterior probability of an unobserved variable given the observed variables. Pearl [15] derived such a scheme for singly connected Bayesian networks and showed that this "belief propagation" algorithm is guaranteed to converge to the correct posterior probabilities (or "beliefs"). Several groups have recently reported excellent experimental results by running algorithms
A generative model based on training deep architectures is proposed. The model consists of K networks that are trained together to learn the underlying distribution of a given data set. The process starts with dividing the input data into K clusters and feeding each of them into a separate network. After a few iterations of training the networks separately, we use an EM-like algorithm to train the networks together and update the clusters of the data. We call this model Mixture of Networks. The provided model is a platform that can be used for any deep structure and can be trained by any conventional objective function for distribution modeling. As the components of the model are neural networks, it has high capability in characterizing complicated data distributions as well as clustering data. We apply the algorithm to the MNIST handwritten digits and Yale face datasets. We also demonstrate the clustering ability of the model using some real-world and toy examples.
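A hedged sketch of the alternating loop described above is given below. The network interface (`fit`, `score`), the hard cluster assignment, and the loop structure are assumptions for illustration; the paper's actual objective and update schedule may differ.

```python
import numpy as np

def mixture_of_networks(data, nets, n_rounds=10):
    """Illustrative EM-like loop for a mixture of K generative networks.

    `nets` is a list of K objects exposing score(X) -> per-sample log-likelihood
    (or negative reconstruction error) and fit(X) -> one training pass.
    """
    # initial hard clustering: split the data indices into K chunks
    clusters = np.array_split(np.random.default_rng(0).permutation(len(data)), len(nets))
    for _ in range(n_rounds):
        # M-like step: train each network on its current cluster
        for net, idx in zip(nets, clusters):
            if len(idx):
                net.fit(data[idx])
        # E-like step: reassign every sample to the network that scores it best
        scores = np.stack([net.score(data) for net in nets])      # (K, n_samples)
        owner = scores.argmax(axis=0)
        clusters = [np.flatnonzero(owner == k) for k in range(len(nets))]
    return clusters
```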