2020, arXiv (Cornell University)
The learned weights of a neural network are often considered devoid of scrutable internal structure. To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images. Our notion of modularity comes from the graph clustering literature: a "module" is a set of neurons with strong internal connectivity but weak external connectivity. We find that training and weight pruning produce MLPs that are more modular than randomly initialized ones, and often significantly more modular than random MLPs with the same (sparse) distribution of weights. Interestingly, they are much more modular when trained with dropout. We also present exploratory analyses of the importance of different modules for performance and of how modules depend on each other. Understanding the modular structure of neural networks, when such structure exists, will hopefully render their inner workings more interpretable to engineers. Note that this paper has been superseded by "Clusterability in Neural Networks", arXiv:2103.03386, and "Quantifying Local Specialization in Deep Neural Networks", arXiv:2110.08058. * Equal contributions, order determined randomly.
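The graph-clustering notion of a "module" can be probed with standard tools. The sketch below is a minimal illustration under assumptions of my own (layer sizes, four clusters, absolute weight magnitude as edge strength), not the authors' exact pipeline: it treats an MLP's neurons as graph nodes, connects adjacent layers with edges weighted by |weight|, applies spectral clustering, and reports the normalized cut of the resulting partition (lower means stronger internal and weaker external connectivity).

# Minimal sketch: spectral clustering of an MLP's weight graph.
# Assumptions (illustrative only): random weights, layer sizes, k=4 clusters, |weight| as edge strength.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
layer_sizes = [64, 32, 32, 10]                      # hypothetical small MLP
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

# Build one symmetric adjacency matrix over all neurons; edges only between adjacent layers.
n_neurons = sum(layer_sizes)
offsets = np.cumsum([0] + layer_sizes)
A = np.zeros((n_neurons, n_neurons))
for l, W in enumerate(weights):
    r0, r1 = offsets[l], offsets[l + 1]             # rows: neurons of layer l
    c0, c1 = offsets[l + 1], offsets[l + 2]         # cols: neurons of layer l+1
    A[r0:r1, c0:c1] = np.abs(W)
    A[c0:c1, r0:r1] = np.abs(W).T

# Spectral clustering on the precomputed affinity; labels assign each neuron to a "module".
labels = SpectralClustering(n_clusters=4, affinity="precomputed", random_state=0).fit_predict(A)

# Normalized cut of the partition: low values mean strong internal / weak external connectivity.
def ncut(A, labels):
    total = 0.0
    for k in np.unique(labels):
        mask = labels == k
        cut = A[mask][:, ~mask].sum()               # weight leaving the module
        vol = A[mask].sum()                          # total weight touching the module
        total += cut / max(vol, 1e-12)
    return total

print("n-cut of 4-way partition:", ncut(A, labels))

Comparing this score for a trained network against randomly initialized networks (or shuffled-weight controls) is the kind of comparison the abstract describes.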
ArXiv, 2021
A neural network is modular to the extent that parts of its computational graph (i.e. structure) can be represented as performing some comprehensible subtask relevant to the overall task (i.e. functionality). Are modern deep neural networks modular? How can this be quantified? In this paper, we consider the problem of assessing the modularity exhibited by a partitioning of a network’s neurons. We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance; and coherence, which reflects how consistently their neurons associate with features of the inputs. To measure these proxies, we develop a set of statistical methods based on techniques conventionally used to interpret individual neurons. We apply the proxies to partitionings generated by spectrally clustering a graph representation of the network’s neurons with edges determined either by network weights or correlations of activations. We show that these partitionings, even ones ...
arXiv (Cornell University), 2021
The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized networks, and often clusterable relative to random networks with the same distribution of weights. We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy. Understanding and controlling the clusterability of neural networks will hopefully render their inner workings more interpretable to engineers by facilitating partitioning into meaningful clusters.
arXiv (Cornell University), 2021
A neural network is locally specialized to the extent that parts of its computational graph (i.e. structure) can be abstractly represented as performing some comprehensible sub-task relevant to the overall task (i.e. functionality). Are modern deep neural networks locally specialized? How can this be quantified? In this paper, we consider the problem of taking a neural network whose neurons are partitioned into clusters, and quantifying how functionally specialized the clusters are. We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance; and coherence, which reflects how consistently their neurons associate with features of the inputs. To measure these proxies, we develop a set of statistical methods based on techniques conventionally used to interpret individual neurons. We apply the proxies to partitionings generated by spectrally clustering a graph representation of the network's neurons with edges determined either by network weights or correlations of activations. We show that these partitionings, even ones based only on weights (i.e. strictly from non-runtime analysis), reveal groups of neurons that are important and coherent. These results suggest that graph-based partitioning can reveal local specialization and that statistical methods can be used to automatically screen for sets of neurons that can be understood abstractly. Code is available at https://github.com/thestephencasper/local_specialization.
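One simple way to operationalize the "importance" proxy is a lesion test: silence a cluster of neurons and see how much the network's behavior changes. The sketch below is a minimal illustration under my own assumptions (a random untrained 2-layer MLP, random inputs, an arbitrary cluster, and prediction-flip rate as the importance score); a real analysis would use a trained network, held-out data, and the paper's statistical methodology.

# Minimal lesion-style sketch of the "importance" proxy: zero a cluster of hidden
# neurons and measure how much the network's predictions change.
# Assumptions (illustrative only): random weights, random inputs, arbitrary cluster.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(20, 64)), np.zeros(64)      # input 20 -> hidden 64
W2, b2 = rng.normal(size=(64, 10)), np.zeros(10)      # hidden 64 -> 10 classes

def predict(X, lesioned=()):
    h = np.maximum(X @ W1 + b1, 0.0)                   # ReLU hidden layer
    h[:, list(lesioned)] = 0.0                          # ablate the cluster's neurons
    return (h @ W2 + b2).argmax(axis=1)

X = rng.normal(size=(1000, 20))
baseline = predict(X)
cluster = range(0, 16)                                  # hypothetical cluster of 16 hidden units
lesioned = predict(X, lesioned=cluster)

# "Importance" here: fraction of inputs whose predicted class flips when the cluster is removed.
print("prediction flip rate:", np.mean(baseline != lesioned))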
ArXiv, 2019
Training a Neural Network (NN) with many parameters or an intricate architecture creates undesired phenomena that complicate the optimization process. To address this issue, we propose a first modular approach to NN design, wherein the NN is decomposed into a control module and several functional modules implementing primitive operations. We illustrate the modular concept by comparing the performance of a monolithic and a modular NN on a list sorting problem, and show the benefits in terms of training speed, training stability, and maintainability. We also discuss some questions that arise in modular NNs.
Lecture Notes in Computer Science, 2004
Modular neural networks have the possibility of overcoming common scalability and interference problems experienced by fully connected neural networks when applied to large databases. In this paper we trial an approach to constructing modular ANNs for a very large problem from CEDAR: the classification of handwritten characters. In our approach, we apply progressive task decomposition methods based upon clustering and regression techniques to find modules. We then test methods for combining the modules into ensembles and compare their structural characteristics and classification performance with those of an ANN having a fully connected topology. The results reveal improvements to classification rates as well as network topologies for this problem.
ArXiv, 2021
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced from how fast its weights change during training. Our analysis provides evidence for this supposition by relating the network's distribution of Lipschitz constants (i.e., the norm of the gradient at different regions of the input space) during different training intervals with the behavior of the stochastic training procedure. We first observe that the average Lipschitz constant close to the training data affects various aspects of the parameter trajectory, with more complex networks having a longer trajectory, bigger variance, and often veering further from their initialization. We then show that NNs whose biases are trained more steadily have bounded complexity even in regions of the input space that are far from any training point. Finally, we find that steady training with Dropout implies a training- and data-dependent generalization bound that grows poly-logar...
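The quantity the abstract parenthetically defines, the norm of the gradient with respect to the input, is straightforward to estimate. Below is a minimal sketch under my own assumptions (a small untrained PyTorch MLP with scalar output and standard-normal probe points standing in for "regions close to the training data"); the paper's actual architectures, data, and training intervals differ.

# Minimal sketch of the gradient-norm proxy for the local Lipschitz constant:
# evaluate ||d f(x)/d x|| at sample inputs.
# Assumptions (illustrative only): untrained torch MLP, scalar output, Gaussian probe points.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(10, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

x = torch.randn(256, 10, requires_grad=True)           # probe points
y = net(x).sum()                                        # summing lets one backward pass give per-point grads
y.backward()
grad_norms = x.grad.norm(dim=1)                         # local Lipschitz estimate at each probe point

print("mean local Lipschitz estimate:", grad_norms.mean().item())
print("max  local Lipschitz estimate:", grad_norms.max().item())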
2021
Tools to analyze the latent space of deep neural networks provide a step towards better understanding them. In this work, we motivate sparse subspace clustering (SSC) with an aim to learn affinity graphs from the latent structure of a given neural network layer trained over a set of inputs. We then use tools from Community Detection to quantify structures present in the input. These experiments reveal that as we go deeper in a network, inputs tend to have an increasing affinity to other inputs of the same class. Subsequently, we utilise matrix similarity measures to perform layer-wise comparisons between affinity graphs. In doing so we first demonstrate that, when comparing a given layer currently under training to its final state, the shallower a layer of the network is, the more quickly it converges. When performing a pairwise analysis of the entire network architecture, we observe that, as the network increases in size, it reorganises from a state where each ...
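The affinity-graph construction via sparse subspace clustering can be sketched directly: each activation vector is expressed as a sparse linear combination of the others, and the absolute coefficients define the affinity graph. The sketch below uses my own illustrative choices (random "activations" standing in for a layer's responses, and an arbitrary Lasso penalty as the relaxed l1 self-expression step); the paper applies this to a trained layer over real inputs before running community detection on the resulting graph.

# Minimal sketch of an SSC-style affinity graph over a layer's activations.
# Assumptions (illustrative only): random activations, alpha=0.1 for the Lasso relaxation.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 32))                          # 100 inputs x 32 hidden units

C = np.zeros((len(Z), len(Z)))
for i in range(len(Z)):
    others = np.delete(Z, i, axis=0)                    # all activation vectors except z_i
    coef = Lasso(alpha=0.1, max_iter=5000).fit(others.T, Z[i]).coef_
    C[i, np.arange(len(Z)) != i] = coef                 # sparse self-expression coefficients

affinity = np.abs(C) + np.abs(C).T                      # symmetric affinity graph over inputs
print("nonzero affinity edges:", int((affinity > 1e-6).sum() // 2))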
2002
There exist many ideas and assumptions about the development and meaning of modularity in biological and technical neural systems. We empirically study the evolution of connectionist models in the context of modular problems. For this purpose, we define quantitative measures for the degree of modularity and monitor them during evolutionary processes under different constraints. It turns out that the modularity of the problem is reflected by the architecture of adapted systems, although learning can counterbalance some imperfection of the architecture. The demand for fast learning systems increases the selective pressure towards modularity.
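For context, one standard graph-theoretic way to quantify the degree of modularity of a partitioned network, which is not necessarily the measure defined in this work, is Newman's modularity, where A_{ij} is the connection weight between units i and j, k_i is the (weighted) degree of unit i, 2m is the total edge weight, and c_i is the module assignment of unit i:

Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)

High Q means connections fall within modules far more often than expected under a degree-preserving random rewiring.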
Cornell University - arXiv, 2022
Wide neural networks with a linear output layer have been shown to be near-linear, and to have a near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent. These findings seem counter-intuitive since in general neural networks are highly complex models. Why does a linear structure emerge when the networks become wide? In this work, we provide a new perspective on this "transition to linearity" by considering a neural network as an assembly model recursively built from a set of sub-models corresponding to individual neurons. In this view, we show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominate the assembly.
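For reference, the neural tangent kernel and the first-order model it induces around the initialization \theta_0 can be written as follows (standard definitions, not notation specific to this paper):

K_\theta(x, x') = \langle \nabla_\theta f(x;\theta), \nabla_\theta f(x';\theta) \rangle, \qquad f(x;\theta) \approx f(x;\theta_0) + \langle \nabla_\theta f(x;\theta_0), \theta - \theta_0 \rangle

"Transition to linearity" means that K_\theta stays close to K_{\theta_0} along the optimization path, so the linear approximation on the right remains accurate throughout training.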
The concept of modularity is a main concern for the generation of artificially intelligent systems. Modularity is a ubiquitous organization principle found everywhere in natural and artificial complex systems (Callebaut, 2005). Evidence from biological and philosophical points of view (Caelli and Wen, 1999; Fodor, 1983) indicates that modularity is a requisite for complex intelligent behaviour. Moreover, from an engineering point of view, modularity seems to be the only way to construct complex structures.
Lecture Notes in Computer Science, 2020
The current understanding of deep neural networks can only partially explain how input structure, network parameters and optimization algorithms jointly contribute to achieve the strong generalization power that is typically observed in many real-world applications. In order to improve the comprehension and interpretability of deep neural networks, we here introduce a novel theoretical framework based on the compositional structure of piecewise linear activation functions. By defining a directed acyclic graph representing the composition of activation patterns through the network layers, it is possible to characterize the instances of the input data with respect to both the predicted label and the specific (linear) transformation used to perform predictions. Preliminary tests on the MNIST dataset show that our method can group input instances with regard to their similarity in the internal representation of the neural network, providing an intuitive measure of input complexity.
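The basic object here, the activation pattern of an input in a ReLU network, is the binary record of which units fire at each layer; inputs that share a pattern are processed by the same affine map. The sketch below extracts such patterns under my own illustrative assumptions (a random 2-hidden-layer MLP and Gaussian inputs, rather than a trained network on MNIST, and without the DAG construction over layers described in the paper).

# Minimal sketch of activation-pattern extraction for a ReLU MLP.
# Assumptions (illustrative only): random weights, random inputs, two hidden layers.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 16))

def activation_pattern(x):
    h1 = x @ W1
    p1 = h1 > 0                                         # layer-1 on/off pattern
    h2 = np.maximum(h1, 0) @ W2
    p2 = h2 > 0                                         # layer-2 on/off pattern
    return tuple(np.concatenate([p1, p2]).astype(int))

X = rng.normal(size=(500, 8))
patterns = [activation_pattern(x) for x in X]

# Inputs sharing a pattern lie in the same linear region of the network.
print("distinct activation patterns among 500 inputs:", len(set(patterns)))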
2018
Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number of parameters with a relatively small increase in resources. We propose a training algorithm that flexibly chooses neural modules based on the data to be processed. Both the decomposition and modules are learned end-to-end. In contrast to existing approaches, training does not rely on regularization to enforce diversity in module use. We apply modular networks both to image recognition and language modeling tasks, where we achieve superior performance compared to several baselines. Introspection reveals that modules specialize in interpretable contexts.
Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468)
A fast method for sizing the multilayer perceptron is proposed. The principal assumption is that a modular network with the same theoretical pattern storage as the multilayer perceptron has the same training error. This assumption is analyzed for the case of random patterns. Using several benchmark datasets, the validity of the approach is demonstrated.
Proceedings of the IEEE, 1999
This paper considers neural computing models for information processing in terms of collections of subnetwork modules. Two approaches to generating such networks are studied. The first approach includes networks with functionally independent subnetworks, where each subnetwork is designed to have specific functions, communication, and adaptation characteristics. The second approach is based on algorithms that can actually generate network and subnetwork topologies, connections, and weights to satisfy specific constraints. Associated algorithms to attain these goals include evolutionary computation and self-organizing maps. We argue that this modular approach to neural computing is more in line with the neurophysiology of the vertebrate cerebral cortex, particularly with respect to sensation and perception. We also argue that this approach has the potential to aid in solutions to large-scale network computational problems, an identified weakness of simply defined artificial neural networks.
2020
We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others, meaning that rewinding their parameter values back to initialization, while keeping the other modules fixed at their trained parameters, results in a large drop in the network's performance. Our analysis reveals interesting properties of the loss landscape, which lead us to propose a complexity measure, called module criticality, based on the shape of the valleys that connect the initial and final values of the module parameters. We formulate how generalization relates to module criticality, and show that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so.
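The rewinding probe that motivates module criticality is easy to sketch: snapshot the initialization, train, then restore one module's parameters to their initial values while leaving the rest trained, and measure the performance drop. The code below is a minimal illustration with my own choices (a tiny synthetic regression task, a 3-layer PyTorch MLP, a short Adam run); the full module-criticality measure in the paper also characterizes the valley between the initial and final parameters, which this sketch omits.

# Minimal sketch of a criticality-style rewinding probe.
# Assumptions (illustrative only): synthetic data, small MLP, 200 Adam steps.
import copy
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)
init_state = copy.deepcopy(net.state_dict())            # snapshot of the initialization

X, y = torch.randn(512, 10), torch.randn(512, 1)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(200):                                     # brief training run
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(net(X), y)
    loss.backward()
    opt.step()
trained_state = copy.deepcopy(net.state_dict())
print("trained loss:", loss.item())

# Rewind each Linear "module" in turn and see how much the loss degrades.
for name in ["0", "2", "4"]:                             # Sequential indices of the Linear layers
    probe = copy.deepcopy(trained_state)
    probe[f"{name}.weight"] = init_state[f"{name}.weight"]
    probe[f"{name}.bias"] = init_state[f"{name}.bias"]
    net.load_state_dict(probe)
    rewound = torch.nn.functional.mse_loss(net(X), y).item()
    print(f"loss with layer {name} rewound to init: {rewound:.4f}")

Modules whose rewinding causes a large loss increase would count as more critical in this informal sense.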
Complex Networks IX, 2018
Graph convolution is a recent scalable method for performing deep feature learning on attributed graphs by aggregating local node information over multiple layers. Such layers only consider attribute information of node neighbors in the forward model and do not incorporate knowledge of global network structure in the learning task. In particular, the modularity function provides a convenient source of information about the community structure of networks. In this work we investigate the effect on the quality of learned representations by the incorporation of community structure preservation objectives of networks in the graph convolutional model. We incorporate the objectives in two ways: through an explicit regularization term in the cost function in the output layer, and as an additional loss term computed via an auxiliary layer. We report the effect of community structure preserving terms in the graph convolutional architectures. Experimental evaluation on two attributed bibliographic networks showed that the incorporation of the community-preserving objective improves semi-supervised node classification accuracy in the sparse label regime.
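A modularity-based loss term of the general kind described here can be sketched directly: given an adjacency matrix A and soft community assignments H, penalize the negative of Newman's modularity, -tr(H^T B H)/(2m), where B = A - d d^T/(2m) is the modularity matrix. The code below uses my own illustrative choices (a random graph and random logits standing in for the auxiliary layer's output); in the paper this kind of term is added to a graph convolutional network's training objective.

# Minimal sketch of a differentiable modularity regularizer.
# Assumptions (illustrative only): random undirected graph, random soft assignments.
import torch

torch.manual_seed(0)
n_nodes, n_comms = 50, 4
A = (torch.rand(n_nodes, n_nodes) < 0.1).float()
A = torch.triu(A, diagonal=1); A = A + A.T               # symmetric adjacency, no self-loops

def modularity_loss(A, H):
    d = A.sum(dim=1, keepdim=True)                        # node degrees
    two_m = A.sum()                                       # total degree = 2m
    B = A - d @ d.T / two_m                               # modularity matrix
    return -torch.trace(H.T @ B @ H) / two_m              # high modularity -> low loss

logits = torch.randn(n_nodes, n_comms, requires_grad=True)
H = torch.softmax(logits, dim=1)                          # soft community assignment per node
loss = modularity_loss(A, H)
loss.backward()                                           # differentiable, so usable as an added loss term
print("modularity regularization term:", loss.item())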
ArXiv, 2021
Understanding the behavior of Artificial Neural Networks has recently become one of the main topics in the field, as black-box approaches have become common with the widespread adoption of deep learning. Such high-dimensional models may manifest instabilities and unusual properties that resemble complex systems. Therefore, we propose Complex Network (CN) techniques to analyze the structure and performance of fully connected neural networks. For that, we build a dataset with 4 thousand models and their respective CN properties. They are employed in a supervised classification setup considering four vision benchmarks. Each neural network is approached as a weighted and undirected graph of neurons and synapses, and centrality measures are computed after training. Results show that these measures are highly related to the network classification performance. We also propose the concept of Bag-Of-Neurons (BoN), a CN-based approach for finding topological signatures linking similar neurons. Results suggest t...
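The complex-network view described here, treating a trained network as a weighted undirected graph and computing centrality measures on it, can be sketched with networkx. The example below uses my own illustrative assumptions (a random 3-layer MLP, |weight| as edge strength, and weighted degree plus eigenvector centrality as the measures); the paper computes such properties on thousands of trained models and relates them to classification performance.

# Minimal sketch: an MLP's weights as a weighted undirected graph of neurons and synapses.
# Assumptions (illustrative only): random weights, |weight| as edge weight.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
layer_sizes = [16, 32, 10]
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

G = nx.Graph()
offsets = np.cumsum([0] + layer_sizes)
for l, W in enumerate(weights):
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            G.add_edge(offsets[l] + i, offsets[l + 1] + j, weight=abs(W[i, j]))

strength = dict(G.degree(weight="weight"))                # weighted degree ("strength") per neuron
eigcent = nx.eigenvector_centrality_numpy(G, weight="weight")
print("mean strength:", np.mean(list(strength.values())))
print("mean eigenvector centrality:", np.mean(list(eigcent.values())))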
2009
There are two main approaches to machine learning: symbolic and sub-symbolic. The decision tree is a typical model for symbolic learning, and the neural network is a model for sub-symbolic learning. For pattern recognition, decision trees are more efficient than neural networks for two reasons. First, the computations required to make decisions are simpler. Second, important features can be selected automatically during the design process. This paper introduces two modular neural network models: a neural network tree, in which each node is an expert neural network, and a modular neural architecture, in which interconnections between modules are reduced. We study the adaptation processes of neural network trees, modular neural networks, and conventional neural networks, and compare them experimentally on Fisher's Iris data set, a benchmark dataset from the machine learning literature. Experimental results on a recognition problem show that both models (i.e., the neural network tree and the modular neural network) adapt better than a conventional multilayer neural network architecture, but the time complexity of trained neural network trees increases exponentially with the number of inputs.