The Annals of Applied Probability
This article studies the Gram random matrix model G = (1/T) Σ^T Σ, Σ = σ(WX), classically found in the analysis of random feature maps and random neural networks, where X = [x_1, ..., x_T] ∈ R^{p×T} is a (data) matrix of bounded norm, W ∈ R^{n×p} is a matrix of independent zero-mean unit-variance entries, and σ : R → R is a Lipschitz continuous (activation) function, σ(WX) being understood entrywise. By means of a key concentration of measure lemma arising from non-asymptotic random matrix arguments, we prove that, as n, p, T grow large at the same rate, the resolvent Q = (G + γ I_T)^{-1}, for γ > 0, behaves similarly to its counterpart in sample covariance matrix models, involving notably the moment Φ = (T/n) E[G], which in passing provides a deterministic equivalent for the empirical spectral measure of G. Application-wise, this result enables the estimation of the asymptotic performance of single-layer random neural networks. This in turn provides practical insights into the underlying mechanisms at play in random neural networks, entailing several unexpected consequences, as well as a fast practical means to tune the network hyperparameters. (Couillet's work is supported by the ANR Project RMT4GRAPH, ANR-14-CE28-0006.)
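As a point of reference for the notation above, the following minimal NumPy sketch builds the Gram matrix G = (1/T) σ(WX)^T σ(WX), its resolvent Q = (G + γ I_T)^{-1}, and the empirical spectral measure of G. The dimensions, the ReLU activation, and the Gaussian data are illustrative choices, not prescribed by the abstract.

```python
# Minimal sketch (not the authors' code) of the Gram model and its resolvent.
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 512, 256, 1024           # neurons, data dimension, number of samples
gamma = 1.0                        # resolvent regularization gamma > 0

X = rng.standard_normal((p, T)) / np.sqrt(p)   # data matrix (illustrative, bounded norm)
W = rng.standard_normal((n, p))                # i.i.d. zero-mean unit-variance weights
Sigma = np.maximum(W @ X, 0.0)                 # sigma(WX), entrywise Lipschitz activation (ReLU)

G = Sigma.T @ Sigma / T                        # Gram matrix, size T x T
Q = np.linalg.inv(G + gamma * np.eye(T))       # resolvent Q = (G + gamma I_T)^{-1}

eigvals = np.linalg.eigvalsh(G)                # empirical spectral measure of G
print("normalized trace of Q (Stieltjes transform at -gamma):", np.trace(Q) / T)
print("spectrum range of G:", eigvals.min(), eigvals.max())
```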
2018
This article provides a theoretical analysis of the asymptotic performance of a regression or classification task performed by a simple random neural network. This result is obtained by leveraging a new framework at the crossroads between random matrix theory and concentration of measure theory. This approach is of utmost interest for neural network analysis at large in that it naturally dismisses the difficulty induced by the non-linear activation functions, so long as these are Lipschitz functions. As an application, we provide formulas for the limiting law of the random neural network output and compare them conclusively to those obtained in practice on handwritten digit databases.
Random Matrices: Theory and Applications
We study the eigenvalue distribution of random matrices pertinent to the analysis of deep neural networks. The matrices resemble products of sample covariance matrices; an important difference, however, is that the analog of the population covariance matrix is now a function of random data matrices (synaptic weight matrices in the deep neural network terminology). The problem has been treated in recent work [J. Pennington, S. Schoenholz and S. Ganguli, The emergence of spectral universality in deep networks, Proc. Mach. Learn. Res. 84 (2018) 1924–1932, arXiv:1802.09979] by using the techniques of free probability theory. Since, however, free probability theory deals with population covariance matrices which are independent of the data matrices, its applicability in this case has to be justified. The justification has been given in [L. Pastur, On random matrices arising in deep neural networks: Gaussian case, Pure Appl. Funct. Anal. (2020), in press, arXiv:2001.06188] for Gauss...
arXiv: Mathematical Physics, 2020
The paper deals with the distribution of singular values of products of random matrices arising in the analysis of deep neural networks. The matrices resemble product analogs of sample covariance matrices; an important difference, however, is that the population covariance matrices, which are assumed to be non-random in the standard setting of statistics and random matrix theory, are now random and, moreover, are certain functions of random data matrices. The problem has been considered in recent work [21] by using the techniques of free probability theory. Since, however, free probability theory deals with population matrices which are independent of the data matrices, its applicability in this case requires an additional justification. We present this justification by using a version of the standard techniques of random matrix theory under the assumption that the entries of data matrices are independent Gaussian random variables. In the subsequent paper [18] we extend our results to...
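For concreteness, the sketch below propagates a Gaussian data matrix through a few random layers and collects the singular values of the resulting product-type matrix. The widths, depth, and tanh activation are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: singular values of data propagated through random layers.
import numpy as np

rng = np.random.default_rng(2)
p, T, L = 300, 600, 3              # layer width, number of samples, depth (illustrative)

X = rng.standard_normal((p, T)) / np.sqrt(p)       # input data matrix
for _ in range(L):
    W = rng.standard_normal((p, p)) / np.sqrt(p)   # i.i.d. Gaussian weight matrix
    X = np.tanh(W @ X)                             # entrywise (Lipschitz) activation

sv = np.linalg.svd(X / np.sqrt(T), compute_uv=False)
print("largest / smallest singular value:", sv[0], sv[-1])
```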
Physica A: Statistical Mechanics and its Applications, 2022
We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks, where we discover agreement with Gaussian Orthogonal Ensemble statistics across several network architectures and datasets. These results shed new light on the applicability of Random Matrix Theory to modelling neural networks and suggest a role for it in the study of loss surfaces in deep learning.
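For readers unfamiliar with the baseline invoked here, the sketch below computes the adjacent-gap ratios of a sampled Gaussian Orthogonal Ensemble matrix, whose mean is known to be roughly 0.53; this is only the reference statistic, not the authors' Hessian pipeline.

```python
# Illustrative GOE reference for local spectral (gap-ratio) statistics.
import numpy as np

rng = np.random.default_rng(4)
N = 1000
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)                 # symmetric GOE-normalized matrix

eig = np.sort(np.linalg.eigvalsh(H))
gaps = np.diff(eig)
r = np.minimum(gaps[1:], gaps[:-1]) / np.maximum(gaps[1:], gaps[:-1])
print("mean adjacent-gap ratio (GOE reference ~0.53):", r.mean())
```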
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
This article proposes an original approach to the performance understanding of large dimensional neural networks. In this preliminary study, we study a single hidden layer feed-forward network with random input connections (also called extreme learning machine) which performs a simple regression task. By means of a new random matrix result, we prove that, as the size and cardinality of the input data and the number of neurons grow large, the network performance is asymptotically deterministic. This entails a better comprehension of the effects of the hyper-parameters (activation function, number of neurons, etc.) under this simple setting, thereby paving the path to the harnessing of more involved structures.
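Below is a minimal sketch of the network class studied in this line of work: input weights are random and fixed, and only the output layer is learned by ridge regression. The ReLU features, regularization value, and synthetic data are assumptions for illustration, not choices made in the article.

```python
# Sketch of a single-hidden-layer network with random input weights ("extreme learning machine").
import numpy as np

rng = np.random.default_rng(1)
n, p, T = 256, 64, 2000            # neurons, input dimension, training samples (illustrative)
gamma = 1e-1                       # ridge regularization

X = rng.standard_normal((p, T))                    # training inputs
y = np.sign(X[0])                                  # toy regression/classification target
W = rng.standard_normal((n, p)) / np.sqrt(p)       # fixed random input weights
Sigma = np.maximum(W @ X, 0.0)                     # hidden-layer features sigma(WX)

# Only the output weights beta are learned (ridge regression on the random features).
beta = np.linalg.solve(Sigma @ Sigma.T / T + gamma * np.eye(n), Sigma @ y / T)

y_hat = beta @ Sigma                               # in-sample predictions
print("training MSE:", np.mean((y_hat - y) ** 2))
```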
arXiv: Probability, 2019
The present work provides an original framework for random matrix analysis based on revisiting the concentration of measure theory for random vectors. By providing various notions of vector concentration (q-exponential, linear, Lipschitz, convex), a set of elementary tools is laid out that allows for the immediate extension of classical results from random matrix theory involving random concentrated vectors in place of vectors with independent entries. These findings are exemplified here in the context of sample covariance matrices but find a large range of applications in statistical learning and beyond, starting with the capacity to easily analyze the performance of artificial neural networks and random feature maps.
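A toy illustration of the Lipschitz concentration property this framework revolves around (the example and parameters are ours, not the paper's): a 1-Lipschitz functional of a high-dimensional Gaussian vector has dimension-free fluctuations.

```python
# Toy demonstration of Lipschitz concentration for Gaussian vectors.
import numpy as np

rng = np.random.default_rng(5)
for p in (100, 1000, 10000):
    z = rng.standard_normal((2000, p))      # 2000 samples of a standard Gaussian in R^p
    f = np.linalg.norm(z, axis=1)           # the Euclidean norm is 1-Lipschitz
    print(p, "std of f(z):", f.std())       # stays near 0.7 while E[f] grows like sqrt(p)
```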
2016
We study the asymptotic law of a network of interacting neurons when the number of neurons becomes infinite. Given a completely connected network of neurons in which the synaptic weights are Gaussian correlated random variables, we describe the asymptotic law of the network when the number of neurons goes to infinity. We introduce the process-level empirical measure of the trajectories of the solutions to the equations of the finite network of neurons and the averaged law (with respect to the synaptic weights) of the trajectories of the solutions to the equations of the network of neurons. The main result of this article is that the image law through the empirical measure satisfies a large deviation principle with a good rate function which is shown to have a unique global minimum. Our analysis of the rate function allows us also to characterize the limit measure as the image of a stationary Gaussian measure defined on a transformed set of trajectories.
Entropy, 2015
We study the asymptotic law of a network of interacting neurons when the number of neurons becomes infinite. Given a completely connected network of neurons in which the synaptic weights are Gaussian correlated random variables, we describe the asymptotic law of the network when the number of neurons goes to infinity. All previous works assumed that the weights were i.i.d. random variables, thereby making the analysis much simpler. This hypothesis is not realistic from the biological viewpoint. In order to cope with this extra complexity we introduce the process-level empirical measure of the trajectories of the solutions to the equations of the finite network of neurons and the averaged law (with respect to the synaptic weights) of the trajectories of the solutions to the equations of the network of neurons. The main result of this article is that the image law through the empirical measure satisfies a large deviation principle with a good rate function which is shown to have a unique global minimum. Finally, our analysis of the rate function allows us also to describe this minimum as a stationary Gaussian measure which completely characterizes the activity of the infinite size network.
Comptes Rendus Mathematique, 2014
arXiv (Cornell University), 2023
In this paper we provide explicit upper bounds on some distances between the (law of the) output of a random Gaussian neural network and (the law of) a random Gaussian vector. Our main results concern deep random Gaussian neural networks, with a rather general activation function. The upper bounds show how the widths of the layers, the activation function and other architecture parameters affect the Gaussian approximation of the output. Our techniques, relying on Stein's method and integration by parts formulas for the Gaussian law, yield estimates on distances which are indeed integral probability metrics, and include the convex distance. This latter metric is defined by testing against indicator functions of measurable convex sets, and so allows for accurate estimates of the probability that the output is localized in some region of the space. Such estimates have a significant interest both from a practitioner's and a theorist's perspective.
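The sketch below samples the output of a deep random Gaussian network at a fixed input, as a starting point for the kind of empirical comparison with a Gaussian vector discussed above. The widths, depth, tanh activation, and bias scale are illustrative assumptions, not the paper's setting.

```python
# Hedged sketch: repeated draws of a random Gaussian network's output at a fixed input.
import numpy as np

rng = np.random.default_rng(3)
widths = [50, 200, 200, 200, 10]       # input dim, three hidden widths, output dim
x = rng.standard_normal(widths[0])     # fixed input

def random_gaussian_network(x, widths, rng):
    h = x
    for i, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
        W = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)  # Gaussian weights
        b = 0.1 * rng.standard_normal(n_out)                    # Gaussian biases
        h = W @ h + b
        if i < len(widths) - 2:        # activation on hidden layers only
            h = np.tanh(h)
    return h

samples = np.array([random_gaussian_network(x, widths, rng) for _ in range(2000)])
print("empirical output mean:", np.round(samples.mean(axis=0), 3))
print("empirical output variances:", np.round(np.cov(samples.T).diagonal(), 3))
```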