Papers by Stephen Roberts

This paper presents the first large-scale multi-species dataset of acoustic recordings of mosquitoes tracked continuously in free flight. We present 20 hours of audio recordings that we have expertly labelled and tagged precisely in time. Significantly, 18 hours of recordings contain annotations from 36 different species. Mosquitoes are well-known carriers of diseases such as malaria, dengue and yellow fever. Collecting this dataset is motivated by the need to assist applications which utilise mosquito acoustics to conduct surveys, to help predict outbreaks and inform intervention policy. The task of detecting mosquitoes from the sound of their wingbeats is challenging due to the difficulty of collecting recordings from realistic scenarios. To address this, as part of the HumBug project, we conducted global experiments to record mosquitoes ranging from those bred in culture cages to mosquitoes captured in the wild. Consequently, the audio recordings vary in signal-to-noise ratio and contain a broad range of indoor and outdoor background environments from Tanzania, Thailand, Kenya, the USA and the UK. In this paper we describe in detail how we collected, labelled and curated the data. The data is provided from a PostgreSQL database, which contains important metadata such as the capture method, age, feeding status and gender of the mosquitoes. Additionally, we provide code to extract features and train Bayesian convolutional neural networks for two key tasks: the identification of mosquitoes from their corresponding background environments, and the classification of detected mosquitoes into species. Our extensive dataset is both challenging for machine learning researchers focusing on acoustic identification, and critical for entomologists, geo-spatial modellers and other domain experts seeking to understand mosquito behaviour, model their distribution, and manage the threat they pose to humans.
35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks.
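The paper releases its own feature-extraction and training code; the snippet below is only a minimal sketch of the general approach it describes, assuming spectrogram inputs and Monte Carlo dropout as the approximate Bayesian inference scheme. The class name, layer sizes and `n_mc` parameter are illustrative, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianMosquitoCNN(nn.Module):
    """Illustrative CNN with MC dropout for mosquito-vs-background detection."""
    def __init__(self, n_classes=2, p_drop=0.2):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.drop = nn.Dropout(p_drop)          # kept stochastic at test time
        self.head = nn.LazyLinear(n_classes)

    def forward(self, x):                        # x: (batch, 1, mels, frames)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = self.drop(torch.flatten(x, 1))
        return self.head(x)

@torch.no_grad()
def mc_predict(model, x, n_mc=30):
    """Average class probabilities over n_mc stochastic forward passes."""
    model.train()                                # keep dropout active
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_mc)])
    return probs.mean(0), probs.std(0)           # predictive mean and spread
```

The spread across MC passes gives the uncertainty estimate that motivates using a Bayesian rather than a point-estimate network for field deployments.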

ArXiv, 2020
Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges. We propose a new approach, Continuous and Categorical Bayesian Optimisation (CoCaBO), which combines the strengths of multi-armed bandits and Bayesian optimisation to select values for both categorical and continuous inputs. We model this mixed-type space using a Gaussian Process kernel, designed to allow sharing of information across multiple categorical variables, each with multiple possible values; this allows CoCaBO to leverage all available data efficiently. We extend our method to the batch setting and propose an efficient selection procedure that dynamically balances exploration and exploitation whilst encouraging batch diversity. We demonstrate empirically that our method outperforms existing approaches on both synthetic and real-world optimisation tasks with continuous and categorical inputs.
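A minimal sketch of the mixed-space kernel idea: combine a continuous kernel with a categorical overlap kernel through a weighted sum and product, so that information is shared across categorical values. The mixing weight `lam` and the specific component kernels are assumptions for illustration; the paper's bandit machinery and batch procedure are not reproduced.

```python
import numpy as np

def k_cont(x, x2, lengthscale=0.5):
    """RBF kernel on the continuous dimensions."""
    d2 = np.sum((x - x2) ** 2)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def k_cat(h, h2):
    """Overlap kernel: fraction of categorical variables that match."""
    return np.mean(np.asarray(h) == np.asarray(h2))

def k_mixed(x, h, x2, h2, lam=0.5):
    """Weighted sum + product of continuous and categorical kernels."""
    kc, kh = k_cont(x, x2), k_cat(h, h2)
    return (1 - lam) * (kc + kh) + lam * kc * kh

# Example: two points with 2 continuous dims and 2 categorical choices
print(k_mixed(np.array([0.1, 0.3]), ["adam", "relu"],
              np.array([0.2, 0.3]), ["adam", "tanh"])))
```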

Visual inspection of neurons suggests that dendritic orientation may be determined both by internal constraints (e.g. membrane tension) and by external vector fields (e.g. neurotrophic gradients). For example, the basal dendrites of pyramidal cells appear to fan out in a regular pattern. This regular orientation is hard to justify completely by a general tendency to grow straight, given the zigzags observed experimentally. Instead, dendrites could (A) favor a fixed ("external") direction, or (B) be repelled by their own soma. To investigate these possibilities quantitatively, reconstructed hippocampal cells were subjected to Bayesian analysis. The statistical model combined factors A and B linearly, as well as the tendency to grow straight. For all morphological classes, B was found to be significantly positive and consistently greater than A. In addition, when dendrites were artificially re-oriented according to this model, the resulting structures closely resembled real morphologies. These r...
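A sketch of the kind of linear model described, under an assumed parameterisation: the expected growth direction of a dendritic segment is a weighted combination of the previous segment direction (straightness), a fixed external direction (factor A), and the unit vector pointing away from the soma (factor B). The weights and vector forms here are illustrative, not the fitted values from the paper.

```python
import numpy as np

def expected_direction(prev_dir, pos, soma, external_dir,
                       w_straight=1.0, w_A=0.2, w_B=0.6):
    """Linear combination of straightness, external field (A) and
    soma repulsion (B); returns a unit expected growth direction."""
    away_from_soma = (pos - soma) / np.linalg.norm(pos - soma)
    v = (w_straight * prev_dir
         + w_A * external_dir
         + w_B * away_from_soma)
    return v / np.linalg.norm(v)

d = expected_direction(prev_dir=np.array([1.0, 0.0, 0.0]),
                       pos=np.array([5.0, 2.0, 0.0]),
                       soma=np.zeros(3),
                       external_dir=np.array([0.0, 0.0, 1.0]))
```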
ArXiv, 2021
Marginalising over families of Gaussian Process kernels produces flexible model classes with well-calibrated uncertainty estimates. Existing approaches require likelihood evaluations of many kernels, rendering them prohibitively expensive for larger datasets. We propose a Bayesian Quadrature scheme to make this marginalisation more efficient and thereby more practical. Through use of the maximum mean discrepancies between distributions, we define a kernel over kernels that captures invariances between Spectral Mixture (SM) Kernels. Kernel samples are selected by generalising an information-theoretic acquisition function for warped Bayesian Quadrature. We show that our framework achieves more accurate predictions with better-calibrated uncertainty than state-of-the-art baselines, especially when given limited (wall-clock) time budgets.
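A rough sketch of the "kernel over kernels" idea, assuming a sample-based MMD estimate between the spectral densities of two SM kernels (each spectrum is a Gaussian mixture over frequencies). The exponentiated-MMD form and the RBF base kernel are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sm_spectrum(weights, means, stds, n=500):
    """Draw frequency samples from an SM kernel's Gaussian-mixture spectrum."""
    w = np.asarray(weights, dtype=float)
    comps = rng.choice(len(w), size=n, p=w / w.sum())
    return rng.normal(np.asarray(means)[comps], np.asarray(stds)[comps])

def mmd2(a, b, ell=1.0):
    """Biased MMD^2 estimate with an RBF base kernel."""
    k = lambda u, v: np.exp(-0.5 * (u[:, None] - v[None, :]) ** 2 / ell ** 2)
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def kernel_over_kernels(sm1, sm2, scale=1.0):
    """Similarity between two SM kernels via the MMD of their spectra."""
    a, b = sample_sm_spectrum(*sm1), sample_sm_spectrum(*sm2)
    return np.exp(-mmd2(a, b) / (2 * scale ** 2))

k12 = kernel_over_kernels(([0.7, 0.3], [1.0, 3.0], [0.2, 0.5]),
                          ([1.0],      [1.2],      [0.3]))
```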

ArXiv, 2018
Time series forecasting is ubiquitous in the modern world. Applications range from health care to astronomy, and include climate modelling, financial trading and the monitoring of critical engineering equipment. To offer value over this range of activities we must have models that not only provide accurate forecasts, but that also quantify and adjust their uncertainty over time. Furthermore, such models must allow for multimodal, non-Gaussian behaviour, which arises regularly in applied settings. In this work, we propose a novel, end-to-end deep learning method for time series forecasting. Crucially, our model allows the principled assessment of predictive uncertainty as well as providing rich information regarding multiple modes of future data values. Our approach not only provides an excellent predictive forecast, shadowing true future values, but also allows us to infer valuable information, such as the predictive distribution of the occurrence of critical events of interest, accurately and...
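The paper's exact architecture is not reproduced here; as a hedged illustration of end-to-end forecasting with multimodal, non-Gaussian predictive distributions, the sketch below pairs a recurrent encoder with a mixture density output (a Gaussian mixture per forecast). All layer sizes and the `n_mix` parameter are assumptions.

```python
import torch
import torch.nn as nn

class MDNForecaster(nn.Module):
    """GRU encoder + mixture density head: p(y_{t+1} | y_{1:t}) is a
    Gaussian mixture, allowing multimodal predictive distributions."""
    def __init__(self, hidden=64, n_mix=5):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3 * n_mix)   # logits, means, log-scales
        self.n_mix = n_mix

    def forward(self, y):                           # y: (batch, time, 1)
        h, _ = self.rnn(y)
        logits, mu, log_sigma = self.head(h[:, -1]).chunk(3, dim=-1)
        return logits, mu, log_sigma.exp()

def nll(logits, mu, sigma, target):
    """Negative log-likelihood of the mixture at the observed next value."""
    mix = torch.distributions.Categorical(logits=logits)
    comp = torch.distributions.Normal(mu, sigma)
    gmm = torch.distributions.MixtureSameFamily(mix, comp)
    return -gmm.log_prob(target.squeeze(-1)).mean()
```

With the fitted mixture in hand, the probability of a critical event (e.g. the next value exceeding a threshold) follows directly from the predictive distribution rather than from a point forecast.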

Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur in reinforcement learning (RL), when an agent may be rewarded for achieving different goals from the same observation. In this paper we formalize this “interference” as distinct from the problem of forgetting. We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference. Instead, we propose a simple method, OWL, to address this challenge. OWL learns a factorized policy, using shared feature extraction layers, but separate heads,...
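A minimal sketch of the factorized-policy structure described: shared feature extraction layers with one head per task, so that incompatible tasks do not overwrite one another. Layer sizes and the forward interface are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class FactorizedPolicy(nn.Module):
    """Shared trunk + per-task heads, in the spirit of OWL."""
    def __init__(self, obs_dim, n_actions, n_tasks, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(                 # shared feature extractor
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(                 # one head per task
            [nn.Linear(hidden, n_actions) for _ in range(n_tasks)]
        )

    def forward(self, obs, task_id):
        return self.heads[task_id](self.trunk(obs)) # task-specific logits

policy = FactorizedPolicy(obs_dim=8, n_actions=4, n_tasks=3)
logits = policy(torch.randn(1, 8), task_id=1)
```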
We study the regularisation induced in neural networks by Gaussian noise injections (GNIs). Though such injections have been extensively studied when applied to data, there have been few studies on understanding the regularising effect they induce when applied to network activations. Here we derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise, and show that it is a form of Tikhonov regularisation which penalises functions with high-frequency components in the Fourier domain. We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins and that the explicit regulariser we derive is able to reproduce these effects.
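A minimal sketch of a Gaussian noise injection on activations, the object whose marginalised effect the paper analyses; the placement in the network and the noise scale are illustrative choices.

```python
import torch
import torch.nn as nn

class GaussianNoiseInjection(nn.Module):
    """Adds zero-mean Gaussian noise to activations during training only."""
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training:
            return x + self.sigma * torch.randn_like(x)
        return x                                    # no noise at eval time

net = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(), GaussianNoiseInjection(0.1),
    nn.Linear(64, 10),
)
```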
Entropy, 2019
Efficient approximation lies at the heart of large-scale machine learning problems. In this paper, we propose a novel, robust maximum entropy algorithm, which is capable of dealing with hundreds of moments and allows for computationally efficient approximations. We showcase the usefulness of the proposed method, its equivalence to constrained Bayesian variational inference and demonstrate its superiority over existing approaches in two applications, namely, fast log determinant estimation and information-theoretic Bayesian optimisation.
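A hedged sketch of moment-constrained maximum entropy on a discretised support: fit Lagrange multipliers so that the exponential-family density matches given power moments, by minimising the convex dual. The optimiser, grid and number of moments are assumptions; the paper's algorithm is engineered to remain robust with hundreds of moments.

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 400)                 # discretised support
moments = np.array([0.5, 0.35, 0.27])          # target E[x], E[x^2], E[x^3]
powers = np.arange(1, len(moments) + 1)
phi = x[:, None] ** powers[None, :]            # moment features phi_k(x) = x^k

def dual(lam):
    """Convex dual: log-partition plus lam . moments; its minimiser gives
    the maxent density p(x) proportional to exp(-sum_k lam_k x^k)."""
    logz = np.log(np.trapz(np.exp(-phi @ lam), x))
    return logz + lam @ moments

res = minimize(dual, np.zeros(len(moments)), method="BFGS")
p = np.exp(-phi @ res.x)
p /= np.trapz(p, x)                            # maxent density on the grid
```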
SSRN Electronic Journal, 2017
SSRN Electronic Journal, 2018
The changes that followed the sub-prime crisis put pressure on financial products to decrease in complexity, moving from exotics to vanilla options, while increasing pricing efficiency. In this paper we introduce a more efficient methodology for vanilla option pricing using a scenario-based particle filter in a hostile data environment. In doing so we capitalize on the risk-factor decomposition of the Implied Volatility surface Parametrization (IVP) recently introduced [70] in order to define our likelihood function, and therefore our sampling methodology, taking into consideration arbitrage constraints.
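The scenario-based filter itself is specific to the IVP risk-factor decomposition; as a hedged sketch of the underlying machinery, below is a generic bootstrap particle filter (predict, weight by likelihood, resample). The random-walk transition and Gaussian observation likelihood are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter(observations, n_particles=1000, sigma_x=0.1, sigma_y=0.2):
    """Generic bootstrap particle filter with a random-walk state model
    and Gaussian observation likelihood (both stand-ins)."""
    particles = rng.normal(0.0, 1.0, n_particles)   # initial state samples
    means = []
    for y in observations:
        particles = particles + rng.normal(0.0, sigma_x, n_particles)  # predict
        w = np.exp(-0.5 * (y - particles) ** 2 / sigma_y ** 2)         # weight
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)           # resample
        particles = particles[idx]
        means.append(particles.mean())
    return np.array(means)

est = particle_filter(observations=np.sin(np.linspace(0, 3, 30)))
```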

Journal of Artificial Intelligence Research, 2008
In this paper, we elucidate the equivalence between inference in game theory and machine learning. Our aim in so doing is to establish an equivalent vocabulary between the two domains so as to facilitate developments at the intersection of both fields, and as proof of the usefulness of this approach, we use recent developments in each field to make useful improvements to the other. More specifically, we consider the analogies between smooth best responses in fictitious play and Bayesian inference methods. Initially, we use these insights to develop and demonstrate an improved algorithm for learning in games based on probabilistic moderation. That is, by integrating over the distribution of opponent strategies (a Bayesian approach within machine learning) rather than taking a simple empirical average (the approach used in standard fictitious play) we derive a novel moderated fictitious play algorithm and show that it is more likely than standard fictitious play to converge to a payoff-domin...
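A minimal sketch of the smooth (logit) best response in standard fictitious play, the object the paper connects to Bayesian inference; the payoff matrix, temperature, and simple empirical averaging here illustrate the standard algorithm, not the moderated variant derived in the paper.

```python
import numpy as np

def smooth_best_response(payoff, opponent_freq, temperature=0.1):
    """Logit (softened) best response to the empirical opponent strategy."""
    expected = payoff @ opponent_freq
    z = np.exp((expected - expected.max()) / temperature)
    return z / z.sum()

rng = np.random.default_rng(0)
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])     # simple coordination game
counts = np.ones(2)                              # opponent action counts
for _ in range(100):
    strategy = smooth_best_response(payoff, counts / counts.sum())
    counts[rng.choice(2, p=strategy)] += 1       # self-play update
print(counts / counts.sum())
```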

SSRN Electronic Journal, 2019
In this paper we propose a new approach to studying the financial markets. Instead of the traditional top-down approach, where a Brownian motion is assumed as the driving force behind the market and dynamic strategies are built as a result, we take the opposite point of view (the bottom-up approach) by assuming that it is the interaction of systematic strategies that induces the dynamics of the market. We achieve this shift in perspective by reintroducing the High Frequency Trading Ecosystem (HFTE) model [86]. More specifically, we specify an approach in which agents interact through a neural network structure designed to address the complexity demands of most common financial strategies, but designed randomly at inception. This strategy ecosystem is then studied through a simplified genetic algorithm. Taking an approach in which simulation and hypothesis interact in order to improve the theory, we explore areas usually associated with fields orthogonal to quantitative finance, such as evolutionary dynamics and predator-prey models. We introduce in that context concepts such as the Path of Interaction in order to study our ecosystem of strategies through time. Finally, a particle filter methodology is proposed to track the market ecosystem through time.
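A hedged sketch of the simplified genetic algorithm described: evolve a population of randomly initialised strategy parameter vectors by fitness-proportional selection, crossover and mutation. The fitness function (a placeholder score), population size and mutation rate are all assumptions, not the HFTE model's specifics.

```python
import numpy as np

rng = np.random.default_rng(2)

def evolve(fitness, dim=32, pop_size=50, generations=100, mut_sigma=0.05):
    """Simplified genetic algorithm over strategy parameter vectors."""
    pop = rng.normal(0.0, 1.0, (pop_size, dim))    # random at inception
    for _ in range(generations):
        f = np.array([fitness(w) for w in pop])
        p = np.exp(f - f.max()); p /= p.sum()      # fitness-proportional
        parents = pop[rng.choice(pop_size, size=(pop_size, 2), p=p)]
        mask = rng.random((pop_size, dim)) < 0.5   # uniform crossover
        pop = np.where(mask, parents[:, 0], parents[:, 1])
        pop += rng.normal(0.0, mut_sigma, pop.shape)  # mutation
    return pop[np.argmax([fitness(w) for w in pop])]

best = evolve(fitness=lambda w: -np.sum((w - 0.5) ** 2))  # placeholder score
```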

Proceedings of the 24th International Conference on World Wide Web, 2015
Social media has led to the democratisation of opinion sharing. A wealth of information about public opinions, current events, and authors' insights into specific topics can be gained by understanding the text written by users. However, there is wide variation in the language used by different authors in different contexts on the web. This diversity in language makes interpretation an extremely challenging task. Crowdsourcing presents an opportunity to interpret the sentiment, or topic, of free text. However, the subjectivity and bias of human interpreters raise challenges in inferring the semantics expressed by the text. To overcome this problem, we present a novel Bayesian approach to language understanding that relies on aggregated crowdsourced judgements. Our model encodes the relationships between labels and text features in documents, such as tweets, web articles, and blog posts, accounting for the varying reliability of human labellers. It allows inference of annotations that scales to arbitrarily large pools of documents. Our evaluation using two challenging crowdsourcing datasets shows that by efficiently exploiting language models learnt from aggregated crowdsourced labels, we can provide up to 25% improved classification when only a small portion, less than 4%, of documents has been labelled. Compared to six state-of-the-art methods, we reduce by up to 67% the number of crowd responses required to achieve comparable accuracy. Our method was a joint winner of the CrowdFlower-CrowdScale 2013 Shared Task challenge at the Conference on Human Computation and Crowdsourcing (HCOMP 2013).
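The paper's model couples text features with labeller reliability; as a simpler, hedged illustration of aggregating crowdsourced judgements with per-worker reliability, below is a minimal Dawid-Skene-style EM over an (items x workers) label matrix. This is a classical baseline in this area, not the paper's Bayesian language model.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """EM aggregation of crowd labels with per-worker confusion matrices.
    labels: (n_items, n_workers) int array, -1 marks a missing label."""
    n_items, n_workers = labels.shape
    post = np.ones((n_items, n_classes))           # init from vote counts
    for c in range(n_classes):
        post[:, c] += (labels == c).sum(axis=1)
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: per-worker confusion matrices and class prior
        conf = np.full((n_workers, n_classes, n_classes), 1e-6)
        for w in range(n_workers):
            for c in range(n_classes):
                conf[w, :, c] += post[labels[:, w] == c].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)
        prior = post.mean(axis=0)
        # E-step: item class posteriors given worker reliabilities
        log_post = np.tile(np.log(prior), (n_items, 1))
        for w in range(n_workers):
            seen = labels[:, w] >= 0
            log_post[seen] += np.log(conf[w][:, labels[seen, w]]).T
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post
```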
Lecture Notes in Computer Science, 2008
This paper proposes a robust algorithm for adaptive modelling of EEG signal classification using a modified extended Kalman filter (EKF). This modified EKF combines radial basis functions (RBF) and autoregressive (AR) modelling, and obtains better classification performance by truncating the filtering distribution when new observations are very informative.
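A hedged sketch of a standard EKF predict/update cycle, the machinery the modified filter builds on; the RBF/AR state model and the truncation step specific to the paper are not reproduced, and the model functions below are placeholders to be supplied.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One extended Kalman filter predict/update step.
    f, h: nonlinear transition/observation functions;
    F, H: their Jacobians evaluated at the current estimate."""
    # predict
    x_pred = f(x)
    Fx = F(x)
    P_pred = Fx @ P @ Fx.T + Q
    # update
    Hx = H(x_pred)
    S = Hx @ P_pred @ Hx.T + R                  # innovation covariance
    K = P_pred @ Hx.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ Hx) @ P_pred
    return x_new, P_new
```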

Sensor Review, 1997
Artificial neural networks (ANNs), or connectionist systems, are composed of simple non-linear processing units ('neurons') connected into networks. They may be regarded, furthermore, as systems which adapt their functionality as a result of exposure to information ('training'). They have become enormously popular for data analysis over the past decade. Applications have included speech recognition, face recognition, DNA sequence labelling, time series prediction and medical data analysis, to name but a few [13, 4, 9]. There are good reasons for this popularity, and hence for utilising them, but there are also good reasons for not using them: ANNs offer no 'magic' solution to problems and their internal 'workings' are often obscure to the user. As such, then, they would appear to be, at best, an unreliable tool and, at worst, downright misleading. Why is it, then, that they have achieved such notoriety that even hardened statisticians (such as [11]) have become interested in their use? To answer this question we first consider the two basic forms of data inference, classification and regression (prediction is regarded as a subset within the latter).
Neural Networks, 2011

Latent force models (LFM) are principled approaches to incorporating solutions to differential equations within non-parametric inference methods. Unfortunately, the development and application of LFMs can be inhibited by their computational cost, especially when closed-form solutions for the LFM are unavailable, as is the case in many real-world problems where these latent forces exhibit periodic behaviour. Given this, we develop a new sparse representation of LFMs which considerably improves their computational efficiency, as well as broadening their applicability, in a principled way, to domains with periodic or near-periodic latent forces. Our approach uses a linear basis model to approximate one generative model for each periodic force. We assume that the latent forces are generated from Gaussian process priors and develop a linear basis model which fully expresses these priors. We apply our approach to model the thermal dynamics of domestic buildings and show that it is effective at predicting day-ahead temperatures within the homes. We also apply our approach within queueing theory, in which quasi-periodic arrival rates are modelled as latent forces. In both cases, we demonstrate that our approach can be implemented...
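A hedged sketch of the core approximation: represent a periodic latent force, a priori a Gaussian process, by a truncated Fourier (harmonic) basis with Gaussian-distributed weights, turning the non-parametric prior into a cheap linear basis model. The number of harmonics, the known period, and the weight prior here are assumptions.

```python
import numpy as np

def harmonic_features(t, period, n_harmonics=5):
    """Truncated Fourier basis for a periodic latent force of known period."""
    w = 2 * np.pi * np.arange(1, n_harmonics + 1) / period
    return np.concatenate([np.cos(np.outer(t, w)),
                           np.sin(np.outer(t, w))], axis=1)

rng = np.random.default_rng(3)
t = np.linspace(0, 48, 200)                      # e.g. hours over two days
Phi = harmonic_features(t, period=24.0)          # daily quasi-periodicity
weights = rng.normal(0.0, 1.0, Phi.shape[1])     # Gaussian weight prior
force_sample = Phi @ weights                     # a draw from the linear model
```

Because the force is now linear in a fixed feature map, inference scales with the number of basis functions rather than the number of observations.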

We propose a novel sampling framework for inference in probabilistic models: an active learning approach that converges more quickly (in wall-clock time) than Markov chain Monte Carlo (MCMC) benchmarks. The central challenge in probabilistic inference is numerical integration, to average over ensembles of models or unknown (hyper-)parameters (for example to compute the marginal likelihood or a partition function). MCMC has provided approaches to numerical integration that deliver state-of-the-art inference, but can suffer from sample inefficiency and poor convergence diagnostics. Bayesian quadrature techniques offer a model-based solution to such problems, but their uptake has been hindered by prohibitive computation costs. We introduce a warped model for probabilistic integrands (likelihoods) that are known to be non-negative, permitting a cheap active learning scheme to optimally select sample locations. Our algorithm is demonstrated to offer faster convergence (in seconds) relative to simple Monte Carlo and annealed importance sampling on both synthetic and real-world examples.
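A hedged sketch of the warped-integrand idea, in the spirit of the square-root warping used in this line of work: place a GP on g = sqrt(2 * likelihood) so the modelled integrand g^2/2 is non-negative by construction, then actively pick the next sample where the induced variance of the integrand is largest. The RBF kernel, grid-based acquisition and moment-matched variance approximation are simplifying assumptions.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(X, g, Xs, jitter=1e-6):
    """GP posterior mean/variance for the warped observations g."""
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    A = np.linalg.solve(K, Ks)
    mean = A.T @ g
    var = np.clip(np.diag(Kss) - np.einsum('ij,ij->j', Ks, A), 0, None)
    return mean, var

def likelihood(x):                      # toy non-negative integrand
    return np.exp(-0.5 * (x / 0.5) ** 2)

X = np.array([-1.0, 1.0])               # initial design
grid = np.linspace(-3, 3, 400)
for _ in range(10):                     # active learning loop
    m, v = gp_posterior(X, np.sqrt(2 * likelihood(X)), grid)
    X = np.append(X, grid[np.argmax(m ** 2 * v)])  # variance of g^2/2
m, v = gp_posterior(X, np.sqrt(2 * likelihood(X)), grid)
mean_l = 0.5 * (m ** 2 + v)             # moment-matched estimate of the integrand
Z_hat = np.trapz(mean_l, grid)          # quadrature estimate of the integral
```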
Lecture Notes in Computer Science, 2001