Papers by Simone Scardapane
Linear-in-the-parameters nonlinear adaptive filters often show some sparse behavior, since not all the coefficients are equally useful for modeling a given nonlinearity. Recently, proportionate algorithms have been proposed to leverage sparsity in nonlinear filtering. In this paper, we deal with this problem by introducing a proportionate adaptive algorithm based on an ℓ1-norm penalty on the cost function, which regularizes the solution, to be used with a class of nonlinear filters based on functional links. The proposed algorithm stresses the difference between useful and useless functional links for the purpose of nonlinear modeling. Experimental results clearly show faster convergence with respect to the standard (i.e., non-regularized) version of the algorithm.
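A minimal sketch of the idea, assuming a trigonometric functional-link expansion and a plain zero-attracting (sign-based) ℓ1 term rather than the paper's exact proportionate update; the function name, step sizes, and expansion order are all illustrative:

```python
import numpy as np

def flaf_l1_lms(x, d, order=3, mu=0.01, rho=1e-4):
    """Zero-attracting LMS on a trigonometric functional-link expansion.
    Simplified sketch, not the proportionate rule described in the paper."""
    def expand(u):
        # Trigonometric functional-link expansion of a single input sample.
        feats = [u]
        for p in range(1, order + 1):
            feats += [np.sin(np.pi * p * u), np.cos(np.pi * p * u)]
        return np.array(feats)

    w = np.zeros(2 * order + 1)
    y = np.zeros(len(x))
    for n in range(len(x)):
        g = expand(x[n])
        y[n] = w @ g
        e = d[n] - y[n]
        # LMS step plus an l1 (sign) term that shrinks useless links toward zero.
        w += mu * e * g - rho * np.sign(w)
    return w, y
```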
Automatic transcription of historical handwritten documents is a challenging research problem, requiring in general expensive transcriptions from expert paleographers. In Codice Ratio is designed to be an end-to-end architecture requiring instead a limited labeling effort, whose aim is the automatic transcription of a portion of the Vatican Secret Archives (one of the largest historical libraries in the world). In this paper, we describe in particular the design of our OCR component for Latin characters. To this end, we first annotated a large corpus of Latin characters with a custom crowdsourcing platform. Leveraging recent progress in deep learning, we designed and trained a deep convolutional network achieving an overall accuracy of 96% over the entire dataset, which is one of the highest results reported in the literature so far. Our training data are publicly available.
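For illustration only, a minimal convolutional character classifier in PyTorch; the layer sizes, the 56x56 input resolution, and the number of classes are assumptions, not the architecture reported in the paper:

```python
import torch
import torch.nn as nn

class LatinCharCNN(nn.Module):
    """Small CNN over grayscale character crops (illustrative sizes)."""
    def __init__(self, n_classes=22):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 14 * 14, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):          # x: (batch, 1, 56, 56)
        return self.classifier(self.features(x))

model = LatinCharCNN()
logits = model(torch.randn(8, 1, 56, 56))   # dummy batch of character crops
```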

Biometric security systems based on predefined speech sentences are extremely common nowadays, particularly in low-cost applications where the simplicity of the hardware involved is a great advantage. Audio spoofing verification is the problem of detecting whether a speech segment acquired from such a system is genuine, or whether it was synthesized or modified by a computer in order to make it sound like an authorized person. Developing countermeasures for spoofing attacks is clearly essential for having effective biometric and security systems based on audio features, all the more so due to recent advances in generative machine learning. Nonetheless, the problem is complicated by the possible lack of knowledge of the technique(s) used to mount the attack, so that anti-spoofing systems should also be able to withstand spoofing attacks that were not considered explicitly in the training stage. In this paper, we analyze the use of deep recurrent networks applied to this task, i.e., networks made by the successive combination of multiple feedforward and recurrent layers. These networks are routinely used in speech recognition and language identification but, to the best of our knowledge, they were never considered for this specific problem. We evaluate several architectures on the dataset released for the ASVspoof 2015 challenge last year. We show that, with very standard feature extraction routines and a minimum amount of fine-tuning, the networks can already reach very promising error rates, comparable to state-of-the-art approaches, paving the way to further investigations on the problem using deep RNN models.
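As a sketch of such a feedforward-plus-recurrent stack (not the exact architectures evaluated in the paper; the feature dimension, hidden size, and two-layer LSTM are assumptions):

```python
import torch
import torch.nn as nn

class SpoofRNN(nn.Module):
    """Feedforward front-end followed by stacked recurrent layers (illustrative)."""
    def __init__(self, n_feats=60, hidden=128):
        super().__init__()
        self.frontend = nn.Sequential(nn.Linear(n_feats, hidden), nn.ReLU())
        self.rnn = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)     # genuine vs. spoofed

    def forward(self, x):                    # x: (batch, time, n_feats)
        h = self.frontend(x)
        out, _ = self.rnn(h)
        return self.head(out[:, -1])         # decision from the last time step

scores = SpoofRNN()(torch.randn(4, 200, 60))   # dummy batch of feature sequences
```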

The interplay between randomness and optimization has always been a major theme in the design of neural networks [3]. In the last 15 years, the success of reservoir computing (RC) has shown that, in many scenarios, the algebraic structure of the recurrent component is far more important than the precise fine-tuning of its weights. As long as the recurrent part of the network possesses a form of fading memory of the input, the dynamics of the neurons are enough to efficiently process many spatio-temporal signals, provided that their activations are sufficiently heterogeneous. Even if today it is feasible to fully optimize deep recurrent networks, their implementation still requires a vast degree of experience and practice, not to mention vast computational resources, limiting their applicability in simpler architectures (e.g., embedded systems) or in areas where time is of key importance (e.g., online systems). Not surprisingly, then, RC remains a powerful tool for quickly solving dynamical problems, and it has become an invaluable tool for modeling and analysis in neuroscience. Ten years after the last special issue entirely dedicated to the topic [2], this issue aims at providing an up-to-date overview of (some of) the latest developments in the field. Recently, Goudarzi and Teuscher listed a series of 11 questions that will drive research in RC from here forward [1]. Although we cannot cover all of them in a single issue, many of these questions are addressed in the articles that compose the issue, which we believe provides a good overview of the diversity and the vitality of the field. Overall, we hope the issue will be of interest to the readers of Cognitive Computation. In particular, we selected ten papers to appear in this special issue. All of them have gone through at least two rounds of revision by two to four expert reviewers. One paper, coauthored by one of the guest editors, underwent an independent review process to guarantee fairness. The articles are logically organized in three separate parts. The first third of the issue is dedicated to the study of delay-line architectures, which have recently been inspired by the possibility of implementation on non-conventional computing architectures, most notably photonic computers. The second part of the issue investigates some theoretical aspects of RC models, and the third part is devoted to innovative formulations for designing architectures for learning and recognition tasks. The first four papers of the special issue are dedicated to photonic RC and time-delay architectures: – In 'Online training for high-performance analogue readout layers in photonic reservoir computers', Antonik et al. propose the use of online training algorithms when exploiting analogue readouts in photonic RC. Their simulated experiments show that online algorithms can

The aim of this paper is to develop a general framework for training neural networks (NNs) in a distributed environment, where training data is partitioned over a set of agents that communicate with each other through a sparse, possibly time-varying, connectivity pattern. In such a distributed scenario, the training problem can be formulated as the (regularized) optimization of a non-convex social cost function, given by the sum of local (non-convex) costs, where each agent contributes with a single error term defined with respect to its local dataset. To devise a flexible and efficient solution, we customize a recently proposed framework for non-convex optimization over networks, which hinges on a (primal) convexification-decomposition technique to handle non-convexity, and on a dynamic consensus procedure to diffuse information among the agents. Several typical choices for the training criterion (e.g., squared loss, cross entropy, etc.) and regularization (e.g., ℓ2 norm, sparsity-inducing penalties, etc.) are included in the framework and explored along the paper. Convergence to a stationary solution of the social non-convex problem is guaranteed under mild assumptions. Additionally, we show a principled way allowing each agent to exploit a possible multi-core architecture (e.g., a local cloud) in order to parallelize its local optimization step, resulting in strategies that are both distributed (across the agents) and parallel (inside each agent) in nature. A comprehensive set of experimental results validates the proposed approach.
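A toy sketch of the general alternation between local updates and information diffusion; this uses plain decentralized gradient descent on scalar quadratics, not the paper's convexification-decomposition scheme, and all names (the function, the mixing matrix `W`) are illustrative:

```python
import numpy as np

def decentralized_gradient(grads, W, x0, steps=100, lr=0.05):
    """Each agent alternates a local gradient step on its own cost with an
    averaging (consensus) step over its neighbors, weighted by a doubly
    stochastic mixing matrix W."""
    n_agents = len(grads)
    X = np.tile(x0, (n_agents, 1))          # one parameter copy per agent
    for _ in range(steps):
        X = X - lr * np.array([g(x) for g, x in zip(grads, X)])   # local step
        X = W @ X                                                  # diffusion step
    return X

# Example: agents jointly minimize the sum of quadratics (x - t_i)^2.
targets = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, t=t: 2 * (x - t) for t in targets]
W = np.full((4, 4), 0.25)                   # fully connected, uniform weights
print(decentralized_gradient(grads, W, np.zeros(1)))
```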

Neural networks, as powerful tools for data mining and knowledge engineering, can learn from data to build feature-based classifiers and nonlinear predictive models. Training neural networks involves the optimization of non-convex objective functions, and usually the learning process is costly and infeasible for applications associated with data streams. A possible, albeit counter-intuitive, alternative is to randomly assign a subset of the networks' weights, so that the resulting optimization task can be formulated as a linear least-squares problem. This methodology can be applied to both feedforward and recurrent networks, and similar techniques can be used to approximate kernel functions. Many experimental results indicate that such randomized models can reach sound performance compared to fully adaptable ones, with a number of favourable benefits, including (i) simplicity of implementation, (ii) faster learning with less intervention from human beings, and (iii) the possibility of leveraging all linear regression and classification algorithms (e.g., ℓ1-norm minimization for obtaining sparse formulations). All these points make them attractive and valuable to the data mining community, particularly for handling large-scale data mining in real time. However, the literature in the field is extremely vast and fragmented, with many results being reintroduced multiple times under different names. This overview aims at providing a self-contained, uniform introduction to the different ways in which randomization can be applied to the design of neural networks and kernel functions. A clear exposition of the basic framework underlying all these approaches helps to clarify innovative lines of research, open problems and, most importantly, foster the exchange of well-known results across different communities.
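The core recipe in a few lines of NumPy, assuming a single hidden layer with random tanh features and a ridge-regression readout (the hidden size and regularization constant are illustrative):

```python
import numpy as np

def random_feature_ridge(X, y, n_hidden=200, reg=1e-2, seed=0):
    """Randomized network: hidden weights are drawn at random and only the
    output layer is fit, by solving a regularized linear least-squares problem."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                          # fixed random hidden layer
    # Ridge readout: solve (H^T H + reg I) beta = H^T y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return lambda Xnew: np.tanh(Xnew @ W + b) @ beta

# Usage: fit a noisy sine and predict at a new point.
X = np.linspace(-3, 3, 200)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(200)
model = random_feature_ridge(X, y)
print(model(np.array([[0.5]])))
```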

Echo state networks (ESNs), belonging to the wider family of reservoir computing methods, are a powerful tool for the analysis of dynamic data. In an ESN, the input signal is fed to a fixed (possibly large) pool of interconnected neurons, whose state is then read by an adaptable layer to provide the output. This last layer is generally trained via a regularized linear least-squares procedure. In this paper, we consider the more complex problem of training an ESN for classification problems in a semi-supervised setting, wherein only a part of the input sequences are effectively labeled with the desired response. To solve the problem, we combine the standard ESN with a semi-supervised support vector machine (S3VM) for training its adaptable connections. Additionally, we propose a novel algorithm for solving the resulting non-convex optimization problem, hinging on a series of successive approximations of the original problem. The resulting procedure is highly customizable and also admits a principled way of parallelizing training over multiple processors/computers. An extensive set of experimental evaluations on audio classification tasks supports the presented semi-supervised ESN as a practical tool for dynamic problems requiring the analysis of partially labeled data.
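For reference, a compact sketch of the fixed reservoir that produces the states on which the readout (here the S3VM, omitted) is trained; the reservoir size, input scaling, and spectral radius are assumptions:

```python
import numpy as np

def esn_states(u, n_res=300, spectral_radius=0.9, seed=0):
    """Run a fixed random reservoir over a 1-D input sequence and return
    the state matrix used to train the adaptable readout."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))   # rescale recurrence
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W_in * u_t + W @ x)
        states.append(x.copy())
    return np.array(states)                                  # shape (time, n_res)

states = esn_states(np.sin(np.linspace(0, 20, 500)))
```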
The aim of this paper is to develop a theoretical framework for training neural network (NN) models when data is distributed over a set of agents that are connected to each other through a sparse network topology. The framework builds on a distributed convexification technique, while leveraging dynamic consensus to propagate the information over the network. It can be customized to work with different loss and regularization functions, typically used when training NN models, while guaranteeing provable convergence to a stationary solution under mild assumptions. Interestingly, it naturally leads to distributed architectures where agents solve local optimization problems exploiting parallel multi-core processors. Numerical results corroborate our theoretical findings, and assess the performance of parallel and distributed training of neural networks.

In this paper, we consider the problem of distributed spectral clustering, wherein the data to be clustered is (horizontally) partitioned over a set of interconnected agents with limited connectivity. In order to solve it, we consider the equivalent problem of reconstructing the Euclidean distance matrix of pairwise distances among the joint set of datapoints. This is obtained in a fully decentralized fashion, making use of an innovative distributed gradient-based procedure, where at every agent we interleave gradient steps on a low-rank factorization of the distance matrix with local averaging steps considering all its neighbors' current estimates. The procedure can be applied to any spectral clustering algorithm, including normalized and unnormalized variations, for multiple choices of the underlying Laplacian matrix. Experimental evaluations demonstrate that the solution is competitive with a fully centralized solver, where data is collected beforehand on a (virtual) coordinating agent.
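A rough sketch of the interleaved step on each agent, assuming a plain squared loss on the known squared distances and a shared low-rank point configuration; the variable names, the objective, and the step size are illustrative rather than the paper's exact procedure:

```python
import numpy as np

def distributed_edm_step(Z, W, D_local, masks, lr=0.01):
    """One round: each agent takes a gradient step on its low-rank factor Z[k]
    (points x dim), fitting only the squared pairwise distances it knows
    (masks[k]), then averages its estimate with its neighbors via W."""
    n_agents = len(Z)
    new_Z = []
    for k in range(n_agents):
        Zk, G = Z[k], np.zeros_like(Z[k])
        rows, cols = np.nonzero(masks[k])
        for i, j in zip(rows, cols):
            diff = Zk[i] - Zk[j]
            err = diff @ diff - D_local[k][i, j]      # squared-distance residual
            G[i] += 4 * err * diff
            G[j] -= 4 * err * diff
        new_Z.append(Zk - lr * G)
    # Local averaging (consensus) step over the network mixing matrix W.
    return [sum(W[k, l] * new_Z[l] for l in range(n_agents)) for k in range(n_agents)]

# Tiny usage: two agents, each knowing a random half of the pairwise distances.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 2))                      # ground-truth points
D = ((P[:, None] - P[None, :]) ** 2).sum(-1)      # squared-distance matrix
masks = [rng.random((10, 10)) < 0.5 for _ in range(2)]
Z = [rng.normal(size=(10, 2)) for _ in range(2)]
W = np.full((2, 2), 0.5)
for _ in range(200):
    Z = distributed_edm_step(Z, W, [D, D], masks)
```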

Diffusion adaptation (DA) algorithms allow a network of agents to collectively estimate a parameter vector, by jointly minimizing the sum of their local cost functions. This is achieved by interleaving local update steps with 'diffusion' steps, where each agent combines information with its own neighbors. In this paper, we propose a novel class of nonlinear diffusion filters, based on the recently proposed spline adaptive filter (SAF). A SAF learns nonlinear models by local interpolating polynomials, with a small overhead with respect to linear filters. This arises from the fact that only a small subset of parameters of the nonlinear component is adapted at every time instant. By applying ideas from the DA framework, in this paper we derive a diffused version of the SAF, denoted as D-SAF. Experimental evaluations show that the D-SAF is able to robustly learn the underlying nonlinear model, with a significant gain compared to a non-cooperative solution.
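The adapt-then-combine pattern underlying DA, sketched on a plain linear LMS filter (the spline nonlinearity of the D-SAF is omitted; names, step size, and combination weights are illustrative):

```python
import numpy as np

def diffusion_lms(U, d, W, mu=0.01):
    """Adapt-then-combine diffusion LMS: each agent updates a linear filter on
    its own data, then combines the estimates of its neighbors via W."""
    n_agents, n_samples, n_taps = U.shape
    w = np.zeros((n_agents, n_taps))
    for t in range(n_samples):
        psi = np.empty_like(w)
        for k in range(n_agents):
            e = d[k, t] - w[k] @ U[k, t]         # local error
            psi[k] = w[k] + mu * e * U[k, t]     # adapt step
        w = W @ psi                               # combine with neighbors
    return w

# Usage: 5 agents estimate a common 4-tap filter from noisy local data.
rng = np.random.default_rng(1)
w_true = rng.normal(size=4)
U = rng.normal(size=(5, 400, 4))
d = U @ w_true + 0.01 * rng.normal(size=(5, 400))
W = np.full((5, 5), 0.2)                          # uniform combination weights
print(diffusion_lms(U, d, W, mu=0.05))
```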

Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed semi-supervised algorithm is efficient and scalable, and it can preserve privacy through the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparisons on a wide range of standard semi-supervised benchmarks validate our proposal.

In a network of agents, a widespread problem is the need to estimate a common underlying function starting from locally distributed measurements. Real-world scenarios may not allow the presence of centralized fusion centers, requiring the development of distributed, message-passing implementations of the standard machine learning training algorithms. In this paper, we are concerned with the distributed training of a particular class of recurrent neural networks, namely echo state networks (ESNs). In the centralized case, ESNs have received considerable attention, due to the fact that they can be trained with standard linear regression routines. Based on this observation, in our previous work we have introduced a decentralized algorithm, framed in the distributed optimization field, in order to train an ESN. In this paper, we focus on an additional sparsity property of the output layer of ESNs, allowing for very efficient implementations of the resulting networks. In order to evaluate the proposed algorithm, we test it on two well-known prediction benchmarks, namely the Mackey-Glass chaotic time series and the 10th-order nonlinear autoregressive moving average (NARMA) system.
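A centralized sketch of the sparsity idea on the readout (the distributed part is omitted); the reservoir construction and the ℓ1 penalty strength are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sparse ESN readout: the reservoir is fixed and random, and the output weights
# are fit with an l1 penalty so that only a few reservoir neurons are used.
rng = np.random.default_rng(0)
n_res = 200
u = np.sin(np.linspace(0, 50, 2000))
W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))          # echo-state rescaling
x, states = np.zeros(n_res), []
for u_t in u[:-1]:
    x = np.tanh(W_in * u_t + W @ x)
    states.append(x.copy())
# One-step-ahead prediction target, l1-regularized linear readout.
readout = Lasso(alpha=1e-3).fit(np.array(states), u[1:])
print("active reservoir neurons:", np.count_nonzero(readout.coef_))
```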

The semi-supervised support vector machine (S3VM) is a well-known algorithm for performing semi-supervised inference under the large-margin principle. In this paper, we are interested in the problem of training a S3VM when the labeled and unlabeled samples are distributed over a network of interconnected agents. In particular, the aim is to design a distributed training protocol over networks, where communication is restricted only to neighboring agents and no coordinating authority is present. Using a standard relaxation of the original S3VM, we formulate the training problem as the distributed minimization of a non-convex social cost function. To find a (stationary) solution in a distributed manner, we employ two different strategies: (i) a distributed gradient descent algorithm; (ii) a recently developed framework for In-Network Nonconvex Optimization (NEXT), which is based on successive convexifications of the original problem, interleaved with state diffusion steps. Our experimental results show that the proposed distributed algorithms have comparable performance with respect to a centralized implementation, while highlighting the pros and cons of the proposed solutions. To date, this is the first work that paves the way toward the broad field of distributed semi-supervised learning over networks.
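For one agent, a sketch of the kind of relaxed objective being minimized; the exponential surrogate for the unlabeled (hat) loss and the trade-off constants are common choices in the S3VM literature, not necessarily the exact relaxation used in the paper:

```python
import numpy as np

def s3vm_cost(w, b, X_lab, y_lab, X_unlab, C=1.0, C_star=0.5):
    """Smooth S3VM-style relaxation for a single agent: squared hinge loss on
    labeled points plus a smooth penalty pushing unlabeled points away from
    the decision boundary."""
    margin_lab = y_lab * (X_lab @ w + b)
    hinge = np.maximum(0.0, 1.0 - margin_lab) ** 2      # labeled loss
    f_unlab = X_unlab @ w + b
    unlab = np.exp(-3.0 * f_unlab ** 2)                 # smooth hat loss
    return 0.5 * w @ w + C * hinge.sum() + C_star * unlab.sum()
```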

We propose a system able to automatically synthesize a classification model and a set of interpretable decision rules defined over a set of symbols, corresponding to frequent substructures of the input dataset. Given a preprocessing procedure which maps every input element into a fully labeled graph, the system solves the classification problem in the graph domain. The extracted rules are then able to semantically characterize the classes of the problem at hand. The structured data that we consider in this paper are images coming from classification datasets: they represent an effective proving ground for studying the ability of the system to extract interpretable classification rules. For this particular input domain, the preprocessing procedure is based on a flexible segmentation algorithm whose behavior is defined by a set of parameters. The core inference engine uses a parametric graph edit dissimilarity measure. A genetic algorithm is in charge of selecting suitable values for the parameters, in order to synthesize a classification model based on interpretable rules which maximizes the generalization capability of the model. Decision rules are defined over a set of information granules in the graph domain, identified by a frequent-substructure miner. We compare the system with two other state-of-the-art graph classifiers, evidencing both its main strengths and limits.

In this paper, we investigate the problem of music classification when training data is distributed throughout a network of interconnected agents (e.g., computers or mobile devices), and it is available as a sequential stream. Under the considered setting, the task is for all the nodes, after receiving any new chunk of training data, to agree on a single classifier in a decentralized fashion, without reliance on a master node. In particular, in this paper we propose a fully decentralized, sequential learning algorithm for a class of neural networks known as Random Vector Functional-Link nets. The proposed algorithm does not require the presence of a single coordinating agent, and it is formulated exclusively in terms of local exchanges between neighboring nodes, thus making it useful in a wide range of realistic situations. Experimental simulations on four music classification benchmarks show that the algorithm has comparable performance with respect to a centralized solution, where a single agent collects all the local data from every node and subsequently updates the model.
Nonlinear distortions pose a serious problem for the quality preservation of audio and speech signals. To address this problem, such signals are processed by nonlinear models. The functional link adaptive filter (FLAF) is a linear-in-the-parameters nonlinear model, whose nonlinear transformation of the input is characterized by a basis function expansion satisfying the universal approximation properties. Since the expansion type affects the nonlinear modeling according to the nature of the input signal, in this paper we investigate the FLAF modeling performance with the most popular functional expansions when audio and speech signals are processed. A comprehensive analysis is conducted to provide the most suitable solution for the processing of nonlinear signals. Experimental results are also assessed in terms of signal quality and intelligibility.
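Two of the most common functional-link expansions in sketch form (the expansion orders and the assumption of inputs normalized to [-1, 1] are illustrative):

```python
import numpy as np

def trig_expansion(u, order=3):
    """Trigonometric functional-link expansion of an input buffer u."""
    feats = [u]
    for p in range(1, order + 1):
        feats += [np.sin(np.pi * p * u), np.cos(np.pi * p * u)]
    return np.concatenate(feats)

def chebyshev_expansion(u, order=3):
    """Chebyshev functional-link expansion via T_{p+1}(u) = 2u T_p(u) - T_{p-1}(u)."""
    T = [np.ones_like(u), u]
    for _ in range(2, order + 1):
        T.append(2 * u * T[-1] - T[-2])
    return np.concatenate(T)

x = np.array([0.2, -0.5, 0.8])        # input buffer, assumed in [-1, 1]
print(trig_expansion(x).shape, chebyshev_expansion(x).shape)
```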

We approach the problem of forecasting the load of incoming calls in a cell of a mobile network using Echo State Networks. With respect to previous approaches to the problem, we consider the inclusion of additional telephone records regarding the activity registered in the cell as exogenous variables, and investigate their usefulness in the forecasting task. Additionally, we analyze different methodologies for training the readout of the network, including two novel variants, namely -SVR and an elastic net penalty. Finally, we employ a genetic algorithm both for tuning the parameters of the system and for selecting the optimal subset of most informative additional time series to be considered as external inputs in the forecasting problem. We compare the performance with standard prediction models and evaluate the results according to the specific properties of the considered time series.

The current big data deluge requires innovative solutions for performing efficient inference on large, heterogeneous amounts of information. Apart from the known challenges deriving from high volume and velocity, real-world big data applications may impose additional technological constraints, including the need for a fully decentralized training architecture. While several alternatives exist for training feedforward neural networks in such a distributed setting, less attention has been devoted to the case of decentralized training of recurrent neural networks (RNNs). In this paper, we propose such an algorithm for a class of RNNs known as Echo State Networks. The algorithm is based on the well-known Alternating Direction Method of Multipliers optimization procedure. It is formulated only in terms of local exchanges between neighboring agents, without reliance on a coordinating node. Additionally, it does not require the communication of training patterns, which is a crucial requirement in realistic big data implementations. Experimental results on large-scale artificial datasets show that it compares favorably with a fully centralized implementation, in terms of speed, efficiency and generalization accuracy.
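A sketch of the consensus-ADMM pattern on a generic regularized least-squares readout, assuming each agent k holds its own state matrix `A_list[k]` and targets `b_list[k]`; only local estimates (never data) are exchanged. Names and constants are illustrative, and the global averaging shown here would itself be carried out through neighbor-level exchanges in the fully decentralized setting:

```python
import numpy as np

def admm_consensus_ridge(A_list, b_list, reg=1e-2, rho=1.0, n_iter=50):
    """Consensus ADMM for a shared least-squares readout: local ridge solves,
    averaging of the estimates, and a dual-variable update per agent."""
    n_agents, dim = len(A_list), A_list[0].shape[1]
    w = [np.zeros(dim) for _ in range(n_agents)]
    lam = [np.zeros(dim) for _ in range(n_agents)]
    z = np.zeros(dim)                                  # consensus variable
    for _ in range(n_iter):
        for k in range(n_agents):                      # local (parallelizable) solves
            lhs = A_list[k].T @ A_list[k] + (reg + rho) * np.eye(dim)
            rhs = A_list[k].T @ b_list[k] + rho * z - lam[k]
            w[k] = np.linalg.solve(lhs, rhs)
        z = np.mean([w[k] + lam[k] / rho for k in range(n_agents)], axis=0)
        for k in range(n_agents):                      # dual ascent step
            lam[k] += rho * (w[k] - z)
    return z

# Usage: three agents with private least-squares data agree on one readout.
rng = np.random.default_rng(2)
w_true = rng.normal(size=6)
A_list = [rng.normal(size=(50, 6)) for _ in range(3)]
b_list = [A @ w_true + 0.01 * rng.normal(size=50) for A in A_list]
print(admm_consensus_ridge(A_list, b_list))
```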

Semi-supervised learning (SSL) is the problem of learning a function with only a partially labeled training set. It has considerable practical interest in applications where labeled data is costly to obtain, while unlabeled data is abundant. One approach to SSL in the case of binary classification is inspired by the work on transductive learning (TL) by V. Vapnik. It has been applied prevalently using support vector machines (SVMs) as the base learning algorithm, giving rise to the so-called transductive SVM (TR-SVM). The resulting optimization problem, however, is highly non-convex and complex to solve. In this paper, we propose an alternative semi-supervised training algorithm based on the TL theory, namely the semi-supervised random vector functional-link (RVFL) network, which is able to obtain state-of-the-art performance, while resulting in a standard convex optimization problem. In particular, we show that, thanks to the characteristics of RVFL networks, the resulting optimization problem can be safely approximated with a standard quadratic programming problem solvable in polynomial time. A wide range of experiments validates our proposal. As a comparison, we also propose a semi-supervised algorithm for RVFLs based on the theory of manifold regularization.

The aim of this paper is to describe a novel security system able to localize and classify audio sources in an outdoor environment. Its primary intended use is for security monitoring in severe scenarios, and it has been designed to cope with a large set of heterogeneous objects, including weapons, human speakers and vehicles. The system is the result of a research project sponsored by the Italian Ministry of Defense. It is composed of a large squared array of 864 microphones arranged in a rectangular lattice, whose input is processed using a classical delay-and-sum beamformer. The result of this localization process is elaborated by a complex multi-level classification system designed in a modular fashion. In this paper, after presenting the details of the system's design, with a particular emphasis on the innovative aspects that are introduced with respect to the state of the art, we provide an extensive set of simulations showing the effectiveness of the proposed architecture. We conclude by describing the current limits of the system, and the projected further developments.
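The delay-and-sum front-end mentioned above is a standard technique; a minimal sketch under a plane-wave model with integer-sample delays (the function name and sound-speed constant are the only assumptions):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Align each microphone signal with the delay implied by a plane wave
    arriving from `direction` (unit vector from the array toward the source),
    then average. signals: (n_mics, n_samples); mic_positions: (n_mics, 3)."""
    lead = mic_positions @ direction / c        # how much each mic leads the origin
    lead -= lead.min()                          # make all shifts non-negative
    out = np.zeros(signals.shape[1])
    for sig, tau in zip(signals, lead):
        shift = int(round(tau * fs))
        out += np.roll(sig, shift)              # delay the leading mics to align
    return out / len(signals)
```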