Papers by Claudio Gentile
This supplementary material contains all proofs and technical details omitted from the main text, along with ancillary comments, discussion about related work, and extra experimental results.

arXiv (Cornell University), Feb 11, 2015
Classical collaborative filtering and content-based filtering methods try to learn a static recommendation model given training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation and computational advertisement, where the set of items and users is very fluid. In this work, we investigate an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithm takes into account the collaborative effects that arise due to the interaction of the users with the items, by dynamically grouping users based on the items under consideration and, at the same time, grouping items based on the similarity of the clusterings induced over the users. The resulting algorithm thus takes advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an empirical analysis on medium-size real-world datasets, showing scalability and increased prediction performance (as measured by click-through rate) over state-of-the-art methods for clustering bandits. We also provide a regret analysis within a standard linear stochastic noise setting.

2023 IEEE International Conference on Robotics and Automation (ICRA)
Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g., fruits, glasses, phones, etc.) that frequently change their positions due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.
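As a rough illustration of "optimism in the face of uncertainty", the sketch below picks the next location to visit with a plain UCB1-style rule. It is a simplified, non-contextual stand-in and not the paper's agent: the function name `ucb_pick` and the bookkeeping arrays `counts` and `sums` are assumptions made for this example.

```python
import math

def ucb_pick(counts, sums, t):
    """Return the index with the largest optimistic estimate (UCB1-style).

    counts[i]: how many times location i has been visited (hypothetical bookkeeping).
    sums[i]:   total reward (e.g. successful sightings) collected at location i.
    t:         current round number, t >= 1.
    """
    # Visit every location once before trusting any estimate.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Empirical mean plus a bonus that shrinks as a location is visited more.
    scores = [
        sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)
```

A rarely visited location gets a large bonus, so it can be chosen even when its empirical mean is low; this is the exploration effect the abstract alludes to.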
arXiv (Cornell University), Aug 6, 2016
We investigate a novel cluster-of-bandit algorithm CAB for collaborative recommendation tasks that implements the underlying feedback sharing mechanism by estimating the neighborhood of users in a context-dependent manner. CAB makes sharp departures from the state of the art by incorporating collaborative effects into inference as well as learning processes in a manner that seamlessly interleaves explore-exploit tradeoffs and collaborative steps. We prove regret bounds under various assumptions on the data, which exhibit a crisp dependence on the expected number of clusters over the users, a natural measure of the statistical difficulty of the learning task. Experiments on production and real-world datasets show that CAB offers significantly increased prediction performance against a representative pool of state-of-the-art methods.

arXiv (Cornell University), Jun 29, 2022
We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our approach requires that each base learner comes with a candidate regret bound that may or may not hold, while our meta algorithm plays each base learner according to a schedule that keeps the base learners' candidate regret bounds balanced until they are detected to violate their guarantees. We develop careful mis-specification tests specifically designed to blend the above model selection criterion with the ability to leverage the (potentially benign) nature of the environment. We recover the model selection guarantees of the CORRAL [Agarwal et al., 2017] algorithm for adversarial environments, but with the additional benefit of achieving high-probability regret bounds, specifically in the case of nested adversarial linear bandits. More importantly, our model selection results also hold simultaneously in stochastic environments under gap assumptions. These are the first theoretical results that achieve best-of-both-worlds (stochastic and adversarial) guarantees while performing model selection in (linear) bandit scenarios.

arXiv (Cornell University), Dec 6, 2021
We investigate the problem of active learning in the streaming setting in nonparametric regimes, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever. We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop. Since the shape of the label requesting threshold is tightly related to the complexity of the function to be learned, which is a priori unknown, we also derive a version of the algorithm which is agnostic to any prior knowledge. This algorithm relies on a regret balancing scheme to solve the resulting online model selection problem, and is computationally efficient. We prove joint guarantees on the cumulative regret and number of requested labels which depend on the complexity of the labeling function at hand. In the linear case, these guarantees recover known minimax results of the generalization error as a function of the label complexity in a standard statistical learning setting.

ArXiv, 2021
The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. The practical benefits of batch sampling come with the downside of less adaptivity and the risk of sampling redundant examples within a batch, a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies and provides significant improvements in model training efficiency compared to recent baselines. Finally, we provide an initial theoretical analysis, proving label complexity guaran...

Electronic Workshops in Computing, 2008
It is well known that Function Point Analysis suffers from several problems. In particular, the measurement criteria and procedure are not defined precisely. Even the object of the measurement is not defined precisely: it is given by whatever set of documents and information represents the user requirements. As a consequence, measurement needs to be performed by an "expert", who can compensate for the lack of precision of the method with knowledge of common practices and interpretations. The paper aims at evaluating a methodology for function point measurement based on the representation of the system through UML models: this methodology aims at providing a precise definition of the object of the measurement, as well as the measurement procedure and rules. METHODS: An experimental application of the methodology is presented. A set of analysts (having different degrees of experience) were trained in the methodology and were then given the same requirements to model. The resulting models were measured by a few measurers, also trained in UML model-based counting. RESULTS: The results show that the variability of the FP measure is small compared to the one obtained after applying "plain" FPA, as described in the literature. More precisely, whereas the influence of the modeller on the result appears to be negligible (i.e., a counter gets the same results from different models of the same application), the variability due to the measurer is more significant (i.e., different counters get different results from the same model), but still small when compared to the results reported in the literature on FPA. CONCLUSIONS: The number of data points that we were able to collect was not big enough to allow reliable conclusions from a rigorous statistical viewpoint. Nevertheless, the results of the experiment tend to confirm that the considered technique noticeably decreases the variability of FP measures.
ArXiv, 2020
We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, both phases actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines.
A general framework for online learning with partial information is one where feedback graphs specify which losses can be observed by the learner. We study a challenging scenario where feedback graphs vary stochastically with time and, more importantly, where graphs and losses are dependent. This scenario appears in several real-world applications that we describe, where the outcomes of actions are correlated. We devise a new algorithm for this setting that exploits the stochastic properties of the graphs and that benefits from favorable regret guarantees. We present a detailed theoretical analysis of this algorithm, and also report the results of a series of experiments on real-world datasets, which show that our algorithm outperforms standard baselines for online learning with feedback graphs.

Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, Shazoo, which is nearly optimal on any weighted tree. Moreover, we show that Shazoo can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that Shazoo performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods.

Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, SHAZOO, which is nearly optimal on any weighted tree. Moreover, we show that SHAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that SHAZOO performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods.

Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global strategy which allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with the neighboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a marked increase in prediction performance obtained by exploiting the network structure.

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over at most d consecutive steps in an adversarial way. This implies that the instantaneous loss observed by the player at the end of each round is a sum of as many as d loss components of previously played actions. Hence, unlike the standard bandit setting with delayed feedback, here the player cannot observe the individual delayed losses, but only their sum. Our main contribution is a general reduction transforming a standard bandit algorithm into one that can operate in this harder setting. We also show how the regret of the transformed algorithm can be bounded in terms of the regret of the original algorithm. Our reduction cannot be improved in general: we prove a lower bound on the regret of any bandit algorithm in this setting that matches (up to log factors) the upper bound obtained via our reduction. Finally, we show how our reduction can b...
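The feedback model in this abstract can be made concrete with a small simulation. The sketch below only illustrates how per-round observations arise as sums of delayed loss pieces; it does not implement the paper's reduction. The function name `composite_feedback` and the layout of `components` are assumptions chosen for this example.

```python
def composite_feedback(actions, components):
    """Return the single number the player observes at each round.

    actions[t]:       arm played at round t.
    components[t][a]: list of at most d loss pieces for arm a played at round t;
                      piece s is charged at round t + s (hypothetical layout).
    """
    horizon = len(actions)
    observed = [0.0] * horizon
    for t, arm in enumerate(actions):
        for s, piece in enumerate(components[t][arm]):
            if t + s < horizon:
                # The player only ever sees the sum, never the individual piece.
                observed[t + s] += piece
    return observed


# Two arms, d = 3, four rounds; the same split is used at every round for brevity.
split = {0: [0.5, 0.25, 0.25], 1: [0.0, 0.0, 1.0]}
observations = composite_feedback([0, 1, 0, 1], [split] * 4)
print(observations)  # [0.5, 0.25, 0.75, 1.25]
```

At round 3 (0-indexed) the observation 1.25 mixes a delayed piece from arm 1 played at round 1 with a piece from arm 0 played at round 2, so the player cannot tell which action caused which part of the loss.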
New feature selection algorithms for linear threshold functions are described which combine backward elimination with an adaptive regularization method. This makes them particularly suitable to the classification of microarray expression data, where the goal is to obtain accurate rules depending on few genes only. Our algorithms are fast and easy to implement, since they center on an incremental (large margin) algorithm which allows us to avoid linear, quadratic or higher-order programming methods. We report on preliminary experiments with five known DNA microarray datasets. These experiments suggest that multiplicative large margin algorithms tend to outperform additive algorithms (such as SVM) on feature selection tasks.

Motivated by a problem of targeted advertising in social networks, we introduce and study a new model of online learning on labeled graphs where the graph is initially unknown, and the algorithm is free to choose the next vertex to predict. After observing that natural non-adaptive exploration/prediction strategies (like depth-first with majority vote) badly fail on simple binary labeled graphs, we introduce an adaptive strategy that performs well under the hypothesis that the vertices of the unknown graph (i.e., the members of the social network) can be partitioned into a few well-separated clusters within which labels are roughly constant (i.e., members in the same cluster tend to prefer the same products). Our algorithm is efficiently implementable and provably competitive against the best of these partitions. Key words: online learning, graph prediction, unknown graph, clustering.
A new algorithm for on-line learning linear-threshold functions is proposed which efficiently combines second-order statistics about the data with the "logarithmic behavior" of multiplicative/dual-norm algorithms. An initial theoretical analysis is provided suggesting that our algorithm might be viewed as a standard Perceptron algorithm operating on a transformed sequence of examples with improved margin properties. We also report on experiments carried out on datasets from diverse domains, with the goal of comparing to known Perceptron algorithms (first-order, second-order, additive, multiplicative). Our learning procedure seems to generalize quite well, and converges faster than the corresponding multiplicative baseline algorithms.
We describe a unifying method for proving relative loss bounds for online linear threshold classification algorithms, such as the Perceptron and the Winnow algorithms. For classification problems the discrete loss is used, i.e., the total number of prediction mistakes. We introduce a continuous loss function, called the "linear hinge loss", that can be employed to derive the updates of the algorithms. We first prove bounds with respect to the linear hinge loss and then convert them to the discrete loss. We introduce a notion of "average margin" of a set of examples. We show how relative loss bounds based on the linear hinge loss can be converted to relative loss bounds in terms of the discrete loss using the average margin.
We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Using a simple online-to-batch conversion technique, our selective sampling algorithm can be converted into a statistical (pool-based) active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label.