Papers by Simone Scardapane
Linear-in-the-parameters nonlinear adaptive filters often show some sparse behavior, since not all the coefficients are equally useful for modeling a given nonlinearity. Recently, proportionate algorithms have been proposed to leverage sparsity in nonlinear filtering. In this paper, we deal with this problem by introducing a proportionate adaptive algorithm based on an ℓ1-norm penalty on the cost function, which regularizes the solution, to be used with a class of nonlinear filters based on functional links. The proposed algorithm stresses the difference between useful and useless functional links for the purpose of nonlinear modeling. Experimental results clearly show faster convergence with respect to the standard (i.e., non-regularized) version of the algorithm.
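A minimal sketch of the idea, assuming a trigonometric functional-link expansion and a plain zero-attracting (sign-based) ℓ1 term rather than the paper's exact proportionate update; the function name, step sizes, and expansion order are all illustrative:

```python
import numpy as np

def flaf_l1_lms(x, d, order=3, mu=0.01, rho=1e-4):
    """Zero-attracting LMS on a trigonometric functional-link expansion.
    Simplified sketch, not the proportionate rule described in the paper."""
    def expand(u):
        # Trigonometric functional-link expansion of a single input sample.
        feats = [u]
        for p in range(1, order + 1):
            feats += [np.sin(np.pi * p * u), np.cos(np.pi * p * u)]
        return np.array(feats)

    w = np.zeros(2 * order + 1)
    y = np.zeros(len(x))
    for n in range(len(x)):
        g = expand(x[n])
        y[n] = w @ g
        e = d[n] - y[n]
        # LMS step plus an l1 (sign) term that shrinks useless links toward zero.
        w += mu * e * g - rho * np.sign(w)
    return w, y
```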
Automatic transcription of historical handwritten documents is a challenging research problem, requiring in general expensive transcriptions from expert paleographers. In Codice Ratio is designed to be an end-to-end architecture requiring instead a limited labeling effort, whose aim is the automatic transcription of a portion of the Vatican Secret Archives (one of the largest historical libraries in the world). In this paper, we describe in particular the design of our OCR component for Latin characters. To this end, we first annotated a large corpus of Latin characters with a custom crowdsourcing platform. Leveraging recent progress in deep learning, we designed and trained a deep convolutional network achieving an overall accuracy of 96% over the entire dataset, which is one of the highest results reported in the literature so far. Our training data are publicly available.
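For illustration only, a minimal convolutional character classifier in PyTorch; the layer sizes, the 56x56 input resolution, and the number of classes are assumptions, not the architecture reported in the paper:

```python
import torch
import torch.nn as nn

class LatinCharCNN(nn.Module):
    """Small CNN over grayscale character crops (illustrative sizes)."""
    def __init__(self, n_classes=22):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 14 * 14, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):          # x: (batch, 1, 56, 56)
        return self.classifier(self.features(x))

model = LatinCharCNN()
logits = model(torch.randn(8, 1, 56, 56))   # dummy batch of character crops
```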

Biometric security systems based on predefined speech sentences are extremely common nowadays, particularly in low-cost applications where the simplicity of the hardware involved is a great advantage. Audio spoofing verification is the problem of detecting whether a speech segment acquired from such a system is genuine, or whether it was synthesized or modified by a computer in order to make it sound like an authorized person. Developing countermeasures for spoofing attacks is clearly essential for having effective biometric and security systems based on audio features, all the more so due to recent advances in generative machine learning. Nonetheless, the problem is complicated by the possible lack of knowledge of the technique(s) used to mount the attack, so that anti-spoofing systems should also be able to withstand spoofing attacks that were not considered explicitly in the training stage. In this paper, we analyze the use of deep recurrent networks applied to this task, i.e., networks made by the successive combination of multiple feedforward and recurrent layers. These networks are routinely used in speech recognition and language identification but, to the best of our knowledge, they were never considered for this specific problem. We evaluate several architectures on the dataset released for the ASVspoof 2015 challenge last year. We show that, with very standard feature extraction routines and a minimum amount of fine-tuning, the networks can already reach very promising error rates, comparable to state-of-the-art approaches, paving the way to further investigations on the problem using deep RNN models.
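As a sketch of such a feedforward-plus-recurrent stack (not the exact architectures evaluated in the paper; the feature dimension, hidden size, and two-layer LSTM are assumptions):

```python
import torch
import torch.nn as nn

class SpoofRNN(nn.Module):
    """Feedforward front-end followed by stacked recurrent layers (illustrative)."""
    def __init__(self, n_feats=60, hidden=128):
        super().__init__()
        self.frontend = nn.Sequential(nn.Linear(n_feats, hidden), nn.ReLU())
        self.rnn = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)     # genuine vs. spoofed

    def forward(self, x):                    # x: (batch, time, n_feats)
        h = self.frontend(x)
        out, _ = self.rnn(h)
        return self.head(out[:, -1])         # decision from the last time step

scores = SpoofRNN()(torch.randn(4, 200, 60))   # dummy batch of feature sequences
```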

The interplay between randomness and optimization has always been a major theme in the design of neural networks [3]. In the last 15 years, the success of reservoir computing (RC) has shown that, in many scenarios, the algebraic structure of the recurrent component is far more important than the precise fine-tuning of its weights. As long as the recurrent part of the network possesses a form of fading memory of the input, the dynamics of the neurons are enough to efficiently process many spatio-temporal signals, provided that their activations are sufficiently heterogeneous. Even if today it is feasible to fully optimize deep recurrent networks, their implementation still requires a vast degree of experience and practice, not to mention vast computational resources, limiting their applicability in simpler architectures (e.g., embedded systems) or in areas where time is of key importance (e.g., online systems). Not surprisingly, then, RC remains a powerful tool for quickly solving dynamical problems, and it has become an invaluable tool for modeling and analysis in neuroscience. Ten years after the last special issue entirely dedicated to the topic [2], this issue aims at providing an up-to-date overview of (some of) the latest developments in the field. Recently, Goudarzi and Teuscher listed a series of 11 questions that will drive research in RC from here forward [1]. Although we cannot cover all of them in a single issue, many of these questions are addressed in the articles that compose the issue, which we believe provides a good overview of the diversity and the vitality of the field. Overall, we hope the issue will be of interest to the readers of Cognitive Computation. In particular, we selected ten papers to appear in this special issue. All of them have gone through at least two rounds of revision by two to four expert reviewers. One paper, coauthored by one of the guest editors, underwent an independent review process to guarantee fairness. The articles are logically organized in three separate parts. The first third of the issue is dedicated to the study of delay-line architectures, which have recently been inspired by the possibility of implementation on non-conventional computing architectures, most notably photonic computers. The second part of the issue investigates some theoretical aspects of RC models, and the third part is devoted to innovative formulations for designing architectures for learning and recognition tasks. The first four papers of the special issue are dedicated to photonic RC and time-delay architectures: – In 'Online training for high-performance analogue readout layers in photonic reservoir computers', Antonik et al. propose the use of online training algorithms when exploiting analogue readouts in photonic RC. Their simulated experiments show that online algorithms can

The aim of this paper is to develop a general framework for training neural networks (NNs) in a distributed environment, where training data is partitioned over a set of agents that communicate with each other through a sparse, possibly time-varying, connectivity pattern. In such a distributed scenario, the training problem can be formulated as the (regularized) optimization of a non-convex social cost function, given by the sum of local (non-convex) costs, where each agent contributes with a single error term defined with respect to its local dataset. To devise a flexible and efficient solution, we customize a recently proposed framework for non-convex optimization over networks, which hinges on a (primal) convexification-decomposition technique to handle non-convexity, and on a dynamic consensus procedure to diffuse information among the agents. Several typical choices for the training criterion (e.g., squared loss, cross entropy, etc.) and regularization (e.g., ℓ2 norm, sparsity-inducing penalties, etc.) are included in the framework and explored along the paper. Convergence to a stationary solution of the social non-convex problem is guaranteed under mild assumptions. Additionally, we show a principled way allowing each agent to exploit a possible multi-core architecture (e.g., a local cloud) in order to parallelize its local optimization step, resulting in strategies that are both distributed (across the agents) and parallel (inside each agent) in nature. A comprehensive set of experimental results validates the proposed approach.
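A toy sketch of the general alternation between local updates and information diffusion; this uses plain decentralized gradient descent on scalar quadratics, not the paper's convexification-decomposition scheme, and all names (the function, the mixing matrix `W`) are illustrative:

```python
import numpy as np

def decentralized_gradient(grads, W, x0, steps=100, lr=0.05):
    """Each agent alternates a local gradient step on its own cost with an
    averaging (consensus) step over its neighbors, weighted by a doubly
    stochastic mixing matrix W."""
    n_agents = len(grads)
    X = np.tile(x0, (n_agents, 1))          # one parameter copy per agent
    for _ in range(steps):
        X = X - lr * np.array([g(x) for g, x in zip(grads, X)])   # local step
        X = W @ X                                                  # diffusion step
    return X

# Example: agents jointly minimize the sum of quadratics (x - t_i)^2.
targets = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, t=t: 2 * (x - t) for t in targets]
W = np.full((4, 4), 0.25)                   # fully connected, uniform weights
print(decentralized_gradient(grads, W, np.zeros(1)))
```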

Neural networks, as powerful tools for data mining and knowledge engineering, can learn from data to build feature-based classifiers and nonlinear predictive models. Training neural networks involves the optimization of non-convex objective functions, and usually the learning process is costly and infeasible for applications associated with data streams. A possible, albeit counter-intuitive, alternative is to randomly assign a subset of the networks' weights, so that the resulting optimization task can be formulated as a linear least-squares problem. This methodology can be applied to both feedforward and recurrent networks, and similar techniques can be used to approximate kernel functions. Many experimental results indicate that such randomized models can reach sound performance compared to fully adaptable ones, with a number of favourable benefits, including (i) simplicity of implementation, (ii) faster learning with less intervention from human beings, and (iii) the possibility of leveraging all linear regression and classification algorithms (e.g., ℓ1-norm minimization for obtaining sparse formulations). All these points make them attractive and valuable to the data mining community, particularly for handling large-scale data mining in real time. However, the literature in the field is extremely vast and fragmented, with many results being reintroduced multiple times under different names. This overview aims at providing a self-contained, uniform introduction to the different ways in which randomization can be applied to the design of neural networks and kernel functions. A clear exposition of the basic framework underlying all these approaches helps to clarify innovative lines of research, open problems and, most importantly, foster the exchange of well-known results across different communities.
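The core recipe in a few lines of NumPy, assuming a single hidden layer with random tanh features and a ridge-regression readout (the hidden size and regularization constant are illustrative):

```python
import numpy as np

def random_feature_ridge(X, y, n_hidden=200, reg=1e-2, seed=0):
    """Randomized network: hidden weights are drawn at random and only the
    output layer is fit, by solving a regularized linear least-squares problem."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                          # fixed random hidden layer
    # Ridge readout: solve (H^T H + reg I) beta = H^T y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return lambda Xnew: np.tanh(Xnew @ W + b) @ beta

# Usage: fit a noisy sine and predict at a new point.
X = np.linspace(-3, 3, 200)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(200)
model = random_feature_ridge(X, y)
print(model(np.array([[0.5]])))
```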

Echo state networks (ESNs), belonging to the wider family of reservoir computing methods, are a powerful tool for the analysis of dynamic data. In an ESN, the input signal is fed to a fixed (possibly large) pool of interconnected neurons, whose state is then read by an adaptable layer to provide the output. This last layer is generally trained via a regularized linear least-squares procedure. In this paper, we consider the more complex problem of training an ESN for classification problems in a semi-supervised setting, wherein only a part of the input sequences are effectively labeled with the desired response. To solve the problem, we combine the standard ESN with a semi-supervised support vector machine (S3VM) for training its adaptable connections. Additionally, we propose a novel algorithm for solving the resulting non-convex optimization problem, hinging on a series of successive approximations of the original problem. The resulting procedure is highly customizable and also admits a principled way of parallelizing training over multiple processors/computers. An extensive set of experimental evaluations on audio classification tasks supports the presented semi-supervised ESN as a practical tool for dynamic problems requiring the analysis of partially labeled data.
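For reference, a compact sketch of the fixed reservoir that produces the states on which the readout (here the S3VM, omitted) is trained; the reservoir size, input scaling, and spectral radius are assumptions:

```python
import numpy as np

def esn_states(u, n_res=300, spectral_radius=0.9, seed=0):
    """Run a fixed random reservoir over a 1-D input sequence and return
    the state matrix used to train the adaptable readout."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))   # rescale recurrence
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W_in * u_t + W @ x)
        states.append(x.copy())
    return np.array(states)                                  # shape (time, n_res)

states = esn_states(np.sin(np.linspace(0, 20, 500)))
```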
The aim of this paper is to develop a theoretical framework for training neural network (NN) models when data is distributed over a set of agents that are connected to each other through a sparse network topology. The framework builds on a distributed convexification technique, while leveraging dynamic consensus to propagate the information over the network. It can be customized to work with different loss and regularization functions, typically used when training NN models, while guaranteeing provable convergence to a stationary solution under mild assumptions. Interestingly, it naturally leads to distributed architectures where agents solve local optimization problems exploiting parallel multi-core processors. Numerical results corroborate our theoretical findings, and assess the performance of parallel and distributed training of neural networks.

In this paper, we consider the problem of distributed spectral clustering, wherein the data to be clustered is (horizontally) partitioned over a set of interconnected agents with limited connectivity. In order to solve it, we consider the equivalent problem of reconstructing the Euclidean distance matrix of pairwise distances among the joint set of datapoints. This is obtained in a fully decentralized fashion, making use of an innovative distributed gradient-based procedure, where at every agent we interleave gradient steps on a low-rank factorization of the distance matrix with local averaging steps considering all its neighbors' current estimates. The procedure can be applied to any spectral clustering algorithm, including normalized and unnormalized variations, for multiple choices of the underlying Laplacian matrix. Experimental evaluations demonstrate that the solution is competitive with a fully centralized solver, where data is collected beforehand on a (virtual) coordinating agent.
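A rough sketch of the interleaved step on each agent, assuming a plain squared loss on the known squared distances and a shared low-rank point configuration; the variable names, the objective, and the step size are illustrative rather than the paper's exact procedure:

```python
import numpy as np

def distributed_edm_step(Z, W, D_local, masks, lr=0.01):
    """One round: each agent takes a gradient step on its low-rank factor Z[k]
    (points x dim), fitting only the squared pairwise distances it knows
    (masks[k]), then averages its estimate with its neighbors via W."""
    n_agents = len(Z)
    new_Z = []
    for k in range(n_agents):
        Zk, G = Z[k], np.zeros_like(Z[k])
        rows, cols = np.nonzero(masks[k])
        for i, j in zip(rows, cols):
            diff = Zk[i] - Zk[j]
            err = diff @ diff - D_local[k][i, j]      # squared-distance residual
            G[i] += 4 * err * diff
            G[j] -= 4 * err * diff
        new_Z.append(Zk - lr * G)
    # Local averaging (consensus) step over the network mixing matrix W.
    return [sum(W[k, l] * new_Z[l] for l in range(n_agents)) for k in range(n_agents)]

# Tiny usage: two agents, each knowing a random half of the pairwise distances.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 2))                      # ground-truth points
D = ((P[:, None] - P[None, :]) ** 2).sum(-1)      # squared-distance matrix
masks = [rng.random((10, 10)) < 0.5 for _ in range(2)]
Z = [rng.normal(size=(10, 2)) for _ in range(2)]
W = np.full((2, 2), 0.5)
for _ in range(200):
    Z = distributed_edm_step(Z, W, [D, D], masks)
```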

Diffusion adaptation (DA) algorithms allow a network of agents to collectively estimate a parameter vector, by jointly minimizing the sum of their local cost functions. This is achieved by interleaving local update steps with 'diffusion' steps, where each agent combines information with its own neighbors. In this paper, we propose a novel class of nonlinear diffusion filters, based on the recently proposed spline adaptive filter (SAF). A SAF learns nonlinear models by local interpolating polynomials, with a small overhead with respect to linear filters. This arises from the fact that only a small subset of parameters of the nonlinear component is adapted at every time instant. By applying ideas from the DA framework, in this paper we derive a diffused version of the SAF, denoted as D-SAF. Experimental evaluations show that the D-SAF is able to robustly learn the underlying nonlinear model, with a significant gain compared to a non-cooperative solution.
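The adapt-then-combine pattern underlying DA, sketched on a plain linear LMS filter (the spline nonlinearity of the D-SAF is omitted; names, step size, and combination weights are illustrative):

```python
import numpy as np

def diffusion_lms(U, d, W, mu=0.01):
    """Adapt-then-combine diffusion LMS: each agent updates a linear filter on
    its own data, then combines the estimates of its neighbors via W."""
    n_agents, n_samples, n_taps = U.shape
    w = np.zeros((n_agents, n_taps))
    for t in range(n_samples):
        psi = np.empty_like(w)
        for k in range(n_agents):
            e = d[k, t] - w[k] @ U[k, t]         # local error
            psi[k] = w[k] + mu * e * U[k, t]     # adapt step
        w = W @ psi                               # combine with neighbors
    return w

# Usage: 5 agents estimate a common 4-tap filter from noisy local data.
rng = np.random.default_rng(1)
w_true = rng.normal(size=4)
U = rng.normal(size=(5, 400, 4))
d = U @ w_true + 0.01 * rng.normal(size=(5, 400))
W = np.full((5, 5), 0.2)                          # uniform combination weights
print(diffusion_lms(U, d, W, mu=0.05))
```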

Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed semi-supervised algorithm is efficient and scalable, and it can preserve privacy through the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparisons on a wide range of standard semi-supervised benchmarks validate our proposal.

In a network of agents, a widespread problem is the need to estimate a common underlying function starting from locally distributed measurements. Real-world scenarios may not allow the presence of centralized fusion centers, requiring the development of distributed, message-passing implementations of the standard machine learning training algorithms. In this paper, we are concerned with the distributed training of a particular class of recurrent neural networks, namely echo state networks (ESNs). In the centralized case, ESNs have received considerable attention, due to the fact that they can be trained with standard linear regression routines. Based on this observation, in our previous work we have introduced a decentralized algorithm, framed in the distributed optimization field, in order to train an ESN. In this paper, we focus on an additional sparsity property of the output layer of ESNs, allowing for very efficient implementations of the resulting networks. In order to evaluate the proposed algorithm, we test it on two well-known prediction benchmarks, namely the Mackey-Glass chaotic time series and the 10th-order nonlinear autoregressive moving average (NARMA) system.
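A centralized sketch of the sparsity idea on the readout (the distributed part is omitted); the reservoir construction and the ℓ1 penalty strength are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sparse ESN readout: the reservoir is fixed and random, and the output weights
# are fit with an l1 penalty so that only a few reservoir neurons are used.
rng = np.random.default_rng(0)
n_res = 200
u = np.sin(np.linspace(0, 50, 2000))
W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))          # echo-state rescaling
x, states = np.zeros(n_res), []
for u_t in u[:-1]:
    x = np.tanh(W_in * u_t + W @ x)
    states.append(x.copy())
# One-step-ahead prediction target, l1-regularized linear readout.
readout = Lasso(alpha=1e-3).fit(np.array(states), u[1:])
print("active reservoir neurons:", np.count_nonzero(readout.coef_))
```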

The semi-supervised support vector machine (S3VM) is a well-known algorithm for performing semi-supervised inference under the large-margin principle. In this paper, we are interested in the problem of training a S3VM when the labeled and unlabeled samples are distributed over a network of interconnected agents. In particular, the aim is to design a distributed training protocol over networks, where communication is restricted only to neighboring agents and no coordinating authority is present. Using a standard relaxation of the original S3VM, we formulate the training problem as the distributed minimization of a non-convex social cost function. To find a (stationary) solution in a distributed manner, we employ two different strategies: (i) a distributed gradient descent algorithm; (ii) a recently developed framework for In-Network Nonconvex Optimization (NEXT), which is based on successive convexifications of the original problem, interleaved with state diffusion steps. Our experimental results show that the proposed distributed algorithms have comparable performance with respect to a centralized implementation, while highlighting the pros and cons of the proposed solutions. To date, this is the first work that paves the way toward the broad field of distributed semi-supervised learning over networks.
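For one agent, a sketch of the kind of relaxed objective being minimized; the exponential surrogate for the unlabeled (hat) loss and the trade-off constants are common choices in the S3VM literature, not necessarily the exact relaxation used in the paper:

```python
import numpy as np

def s3vm_cost(w, b, X_lab, y_lab, X_unlab, C=1.0, C_star=0.5):
    """Smooth S3VM-style relaxation for a single agent: squared hinge loss on
    labeled points plus a smooth penalty pushing unlabeled points away from
    the decision boundary."""
    margin_lab = y_lab * (X_lab @ w + b)
    hinge = np.maximum(0.0, 1.0 - margin_lab) ** 2      # labeled loss
    f_unlab = X_unlab @ w + b
    unlab = np.exp(-3.0 * f_unlab ** 2)                 # smooth hat loss
    return 0.5 * w @ w + C * hinge.sum() + C_star * unlab.sum()
```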

We propose a system able to automatically synthesize a classification model and a set of interpretable decision rules defined over a set of symbols, corresponding to frequent substructures of the input dataset. Given a preprocessing procedure which maps every input element into a fully labeled graph, the system solves the classification problem in the graph domain. The extracted rules are then able to semantically characterize the classes of the problem at hand. The structured data that we consider in this paper are images coming from classification datasets: they represent an effective proving ground for studying the ability of the system to extract interpretable classification rules. For this particular input domain, the preprocessing procedure is based on a flexible segmentation algorithm whose behavior is defined by a set of parameters. The core inference engine uses a parametric graph edit dissimilarity measure. A genetic algorithm is in charge of selecting suitable values for the parameters, in order to synthesize a classification model based on interpretable rules which maximizes the generalization capability of the model. Decision rules are defined over a set of information granules in the graph domain, identified by a frequent-substructure miner. We compare the system with two other state-of-the-art graph classifiers, evidencing both its main strengths and limits.

In this paper, we investigate the problem of music classification when training data is distributed throughout a network of interconnected agents (e.g., computers or mobile devices), and it is available as a sequential stream. Under the considered setting, the task is for all the nodes, after receiving any new chunk of training data, to agree on a single classifier in a decentralized fashion, without reliance on a master node. In particular, in this paper we propose a fully decentralized, sequential learning algorithm for a class of neural networks known as Random Vector Functional-Link nets. The proposed algorithm does not require the presence of a single coordinating agent, and it is formulated exclusively in terms of local exchanges between neighboring nodes, thus making it useful in a wide range of realistic situations. Experimental simulations on four music classification benchmarks show that the algorithm has comparable performance with respect to a centralized solution, where a single agent collects all the local data from every node and subsequently updates the model.
Nonlinear distortions pose a serious problem for the quality preservation of audio and speech signals. To address this problem, such signals are processed by nonlinear models. The functional link adaptive filter (FLAF) is a linear-in-the-parameters nonlinear model, whose nonlinear transformation of the input is characterized by a basis function expansion satisfying the universal approximation properties. Since the expansion type affects the nonlinear modeling according to the nature of the input signal, in this paper we investigate the FLAF modeling performance with the most popular functional expansions when audio and speech signals are processed. A comprehensive analysis is conducted to provide the most suitable solution for the processing of nonlinear signals. Experimental results are also assessed in terms of signal quality and intelligibility.
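Two of the most common functional-link expansions in sketch form (the expansion orders and the assumption of inputs normalized to [-1, 1] are illustrative):

```python
import numpy as np

def trig_expansion(u, order=3):
    """Trigonometric functional-link expansion of an input buffer u."""
    feats = [u]
    for p in range(1, order + 1):
        feats += [np.sin(np.pi * p * u), np.cos(np.pi * p * u)]
    return np.concatenate(feats)

def chebyshev_expansion(u, order=3):
    """Chebyshev functional-link expansion via T_{p+1}(u) = 2u T_p(u) - T_{p-1}(u)."""
    T = [np.ones_like(u), u]
    for _ in range(2, order + 1):
        T.append(2 * u * T[-1] - T[-2])
    return np.concatenate(T)

x = np.array([0.2, -0.5, 0.8])        # input buffer, assumed in [-1, 1]
print(trig_expansion(x).shape, chebyshev_expansion(x).shape)
```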

We approach the problem of forecasting the load of incoming calls in a cell of a mobile network using Echo State Networks. With respect to previous approaches to the problem, we consider the inclusion of additional telephone records regarding the activity registered in the cell as exogenous variables, and investigate their usefulness in the forecasting task. Additionally, we analyze different methodologies for training the readout of the network, including two novel variants, namely -SVR and an elastic net penalty. Finally, we employ a genetic algorithm both for tuning the parameters of the system and for selecting the optimal subset of most informative additional time series to be considered as external inputs in the forecasting problem. We compare the performance with standard prediction models and evaluate the results according to the specific properties of the considered time series.

The current big data deluge requires innovative solutions for performing efficient inference on large, heterogeneous amounts of information. Apart from the known challenges deriving from high volume and velocity, real-world big data applications may impose additional technological constraints, including the need for a fully decentralized training architecture. While several alternatives exist for training feedforward neural networks in such a distributed setting, less attention has been devoted to the case of decentralized training of recurrent neural networks (RNNs). In this paper, we propose such an algorithm for a class of RNNs known as Echo State Networks. The algorithm is based on the well-known Alternating Direction Method of Multipliers optimization procedure. It is formulated only in terms of local exchanges between neighboring agents, without reliance on a coordinating node. Additionally, it does not require the communication of training patterns, which is a crucial requirement in realistic big data implementations. Experimental results on large-scale artificial datasets show that it compares favorably with a fully centralized implementation, in terms of speed, efficiency and generalization accuracy.
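A sketch of the consensus-ADMM pattern on a generic regularized least-squares readout, assuming each agent k holds its own state matrix `A_list[k]` and targets `b_list[k]`; only local estimates (never data) are exchanged. Names and constants are illustrative, and the global averaging shown here would itself be carried out through neighbor-level exchanges in the fully decentralized setting:

```python
import numpy as np

def admm_consensus_ridge(A_list, b_list, reg=1e-2, rho=1.0, n_iter=50):
    """Consensus ADMM for a shared least-squares readout: local ridge solves,
    averaging of the estimates, and a dual-variable update per agent."""
    n_agents, dim = len(A_list), A_list[0].shape[1]
    w = [np.zeros(dim) for _ in range(n_agents)]
    lam = [np.zeros(dim) for _ in range(n_agents)]
    z = np.zeros(dim)                                  # consensus variable
    for _ in range(n_iter):
        for k in range(n_agents):                      # local (parallelizable) solves
            lhs = A_list[k].T @ A_list[k] + (reg + rho) * np.eye(dim)
            rhs = A_list[k].T @ b_list[k] + rho * z - lam[k]
            w[k] = np.linalg.solve(lhs, rhs)
        z = np.mean([w[k] + lam[k] / rho for k in range(n_agents)], axis=0)
        for k in range(n_agents):                      # dual ascent step
            lam[k] += rho * (w[k] - z)
    return z

# Usage: three agents with private least-squares data agree on one readout.
rng = np.random.default_rng(2)
w_true = rng.normal(size=6)
A_list = [rng.normal(size=(50, 6)) for _ in range(3)]
b_list = [A @ w_true + 0.01 * rng.normal(size=50) for A in A_list]
print(admm_consensus_ridge(A_list, b_list))
```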

Semi-supervised learning (SSL) is the problem of learning a function with only a partially labeled training set. It has considerable practical interest in applications where labeled data is costly to obtain, while unlabeled data is abundant. One approach to SSL in the case of binary classification is inspired by the work on transductive learning (TL) by V. Vapnik. It has been applied prevalently using support vector machines (SVMs) as the base learning algorithm, giving rise to the so-called transductive SVM (TR-SVM). The resulting optimization problem, however, is highly non-convex and complex to solve. In this paper, we propose an alternative semi-supervised training algorithm based on the TL theory, namely the semi-supervised random vector functional-link (RVFL) network, which is able to obtain state-of-the-art performance, while resulting in a standard convex optimization problem. In particular, we show that, thanks to the characteristics of RVFL networks, the resulting optimization problem can be safely approximated with a standard quadratic programming problem solvable in polynomial time. A wide range of experiments validates our proposal. As a comparison, we also propose a semi-supervised algorithm for RVFLs based on the theory of manifold regularization.

The aim of this paper is to describe a novel security system able to localize and classify audio sources in an outdoor environment. Its primary intended use is for security monitoring in severe scenarios, and it has been designed to cope with a large set of heterogeneous objects, including weapons, human speakers and vehicles. The system is the result of a research project sponsored by the Italian Ministry of Defense. It is composed of a large squared array of 864 microphones arranged in a rectangular lattice, whose input is processed using a classical delay-and-sum beamformer. The result of this localization process is elaborated by a complex multi-level classification system designed in a modular fashion. In this paper, after presenting the details of the system's design, with a particular emphasis on the innovative aspects that are introduced with respect to the state of the art, we provide an extensive set of simulations showing the effectiveness of the proposed architecture. We conclude by describing the current limits of the system, and the projected further developments.
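The delay-and-sum front-end mentioned above is a standard technique; a minimal sketch under a plane-wave model with integer-sample delays (the function name and sound-speed constant are the only assumptions):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Align each microphone signal with the delay implied by a plane wave
    arriving from `direction` (unit vector from the array toward the source),
    then average. signals: (n_mics, n_samples); mic_positions: (n_mics, 3)."""
    lead = mic_positions @ direction / c        # how much each mic leads the origin
    lead -= lead.min()                          # make all shifts non-negative
    out = np.zeros(signals.shape[1])
    for sig, tau in zip(signals, lead):
        shift = int(round(tau * fs))
        out += np.roll(sig, shift)              # delay the leading mics to align
    return out / len(signals)
```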