2015, arXiv (Cornell University)
Classical collaborative filtering and content-based filtering methods try to learn a static recommendation model from training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation and computational advertising, where the set of items and users is very fluid. In this work, we investigate an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithm takes into account the collaborative effects that arise due to the interaction of the users with the items, by dynamically grouping users based on the items under consideration and, at the same time, grouping items based on the similarity of the clusterings induced over the users. The resulting algorithm thus takes advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an empirical analysis on medium-size real-world datasets, showing scalability and increased prediction performance (as measured by click-through rate) over state-of-the-art methods for clustering bandits. We also provide a regret analysis within a standard linear stochastic noise setting.
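The user-side half of this double-clustering mechanism can be sketched concretely. In the minimal, hypothetical Python sketch below, each user keeps a ridge-regression (LinUCB-style) payoff estimate; for each candidate item, the neighborhood of the served user is computed from those estimates, and the neighborhood's statistics are pooled before playing an upper-confidence-bound choice. All names and constants (UserModel, D, ALPHA) are our illustrative assumptions, not the paper's code, and the item-side clustering is omitted for brevity.

# Minimal sketch of context-dependent user clustering for bandit
# recommendation. Names and constants are illustrative assumptions.
import numpy as np

D, ALPHA = 10, 1.0          # context dimension, exploration strength

class UserModel:
    """Per-user ridge-regression estimate of payoffs (LinUCB-style)."""
    def __init__(self):
        self.A = np.eye(D)   # Gram matrix
        self.b = np.zeros(D) # reward-weighted context sum

    def theta(self):
        return np.linalg.solve(self.A, self.b)

    def cb(self, x):         # confidence width for context x
        return ALPHA * np.sqrt(x @ np.linalg.solve(self.A, x))

def neighborhood(users, i, x):
    """Users whose estimated payoff on x is statistically close to user
    i's: the cluster is induced by the item under consideration."""
    ui = users[i]
    return [j for j, u in users.items()
            if abs(u.theta() @ x - ui.theta() @ x) <= u.cb(x) + ui.cb(x)]

def recommend(users, i, arms):
    """Pool the neighborhood's statistics per candidate arm, then play UCB."""
    best, best_ucb = None, -np.inf
    for k, x in enumerate(arms):
        nbrs = neighborhood(users, i, x)
        A = np.eye(D) + sum(users[j].A - np.eye(D) for j in nbrs)
        b = sum(users[j].b for j in nbrs)
        th = np.linalg.solve(A, b)
        ucb = th @ x + ALPHA * np.sqrt(x @ np.linalg.solve(A, x))
        if ucb > best_ucb:
            best, best_ucb = k, ucb
    return best

def update(users, i, x, reward):
    users[i].A += np.outer(x, x)
    users[i].b += reward * x

Because the neighborhood is recomputed for each candidate item, two users may be grouped together for one item and apart for another, which is exactly the context-dependent grouping the abstract describes.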
2016
We investigate an efficient context-dependent clustering technique for recommender systems based on exploration-exploitation strategies through multi-armed bandits over multiple users. Our algorithm dynamically groups users based on their observed behavioral similarity during a sequence of logged activities. In doing so, the algorithm reacts to the currently served user by shaping clusters around him/her but, at the same time, explores the generation of clusters over users who are not currently engaged. We motivate the effectiveness of this clustering policy and provide an extensive empirical analysis on real-world datasets, showing scalability and improved prediction performance over state-of-the-art methods for sequential clustering of users in multi-armed bandit scenarios.
2016
We investigate two context-dependent clustering techniques for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithms dynamically group users based on the items under consideration and, possibly, group items based on the similarity of the clusterings induced over the users. The resulting algorithms thus take advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an empirical analysis on extensive real-world datasets, showing scalability and increased prediction performance over state-of-the-art methods for clustering bandits. For one of the two algorithms we also give a regret analysis within a standard linear stochastic noise setting.
Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014
With the prevalence of the Web and social media, users increasingly express their preferences online. In learning these preferences, recommender systems need to balance the trade-off between exploitation, by providing users with more of the "same", and exploration, by providing users with something "new" so as to expand the systems' knowledge. The multi-armed bandit (MAB) is a framework for balancing this trade-off. Most previous work in MAB either models a single bandit for the whole population, or one bandit for each user. We propose an algorithm that divides the population of users into multiple clusters and customizes the bandits to each cluster. This clustering is dynamic, i.e., users can switch from one cluster to another as their preferences change. We evaluate the proposed algorithm on two real-life datasets.
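As a rough illustration of this dynamic clustering, the hypothetical Python sketch below keeps one LinUCB bandit per cluster and reassigns each user to the cluster whose weight vector currently sits closest to the user's own estimate, so users can migrate between clusters as their preferences drift. The number of clusters K, the dimension D, and all names are our assumptions, not the paper's implementation.

# Sketch of dynamic user-to-cluster assignment with one linear bandit
# per cluster; all parameters are illustrative.
import numpy as np

K, D, ALPHA = 5, 10, 1.0     # clusters, dimension, exploration strength

clusters = [dict(A=np.eye(D), b=np.zeros(D)) for _ in range(K)]
users = {}                   # user id -> own ridge statistics
assign = {}                  # user id -> current cluster index

def theta(s):
    return np.linalg.solve(s["A"], s["b"])

def reassign(u):
    """Move the user to the cluster whose weight vector is currently
    closest to the user's own estimate: membership stays dynamic."""
    s = users.setdefault(u, dict(A=np.eye(D), b=np.zeros(D)))
    assign[u] = int(np.argmin(
        [np.linalg.norm(theta(s) - theta(c)) for c in clusters]))

def choose(u, arms):
    """Play LinUCB with the statistics of the user's current cluster."""
    reassign(u)
    c = clusters[assign[u]]
    th, Ainv = theta(c), np.linalg.inv(c["A"])
    return int(np.argmax([th @ x + ALPHA * np.sqrt(x @ Ainv @ x)
                          for x in arms]))

def update(u, x, r):
    """Feed the payoff to both the user's own and the cluster's model."""
    for s in (users[u], clusters[assign[u]]):
        s["A"] += np.outer(x, x)
        s["b"] += r * x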
A big challenge in the design and implementation of large-scale online services is determining what items to recommend to their users. For instance, Netflix makes movie recommendations, Amazon makes product recommendations, and Yahoo! makes webpage recommendations. In these systems, items are recommended based on the characteristics and circumstances of the users, which are provided to the recommender as contexts (e.g., search history, time, and location). The task of building an efficient recommender system is challenging because both the item space and the context space are very large. Existing works either focus on a large item space without contexts, on a large context space with a small number of items, or they jointly consider the space of items and contexts together to solve the online recommendation problem. In contrast, we develop an algorithm that performs exploration and exploitation in the context space and the item space separately, and that combines clustering of the items with information aggregation in the context space. Basically, given a user's context, our algorithm aggregates its past history over a ball centered on the user's context, whose radius decreases at a rate that allows sufficiently accurate estimates of the payoffs, such that the recommended payoffs converge to the true (unknown) payoffs. Theoretical results show that our algorithm achieves a learning regret that is sublinear in time, where regret is the payoff difference between the oracle optimal benchmark, in which the preferences of users for items in each context are known, and our algorithm, which must operate with incomplete information. Numerical results show that our algorithm significantly outperforms the existing algorithms in terms of regret, by over 48%.
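The ball-based aggregation can be sketched as follows: the estimate for an arm is the average reward over past contexts falling inside a ball around the current context, whose radius shrinks over time. The radius schedule t^(-1/(2+d)) and the UCB bonus below are our illustrative choices, not the paper's exact construction.

# Sketch of payoff estimation by aggregating history over a shrinking
# ball in context space; radius schedule and bonus are assumptions.
import math
import numpy as np

history = {}        # arm -> list of (context, reward) pairs
D_CTX = 2           # context dimension (assumed)

def radius(t):
    """Ball radius shrinking with time; the exponent is chosen so that
    estimates concentrate while enough samples fall inside the ball."""
    return t ** (-1.0 / (2 + D_CTX))

def ucb_estimate(arm, ctx, t):
    """Mean reward over past contexts inside the ball, plus a bonus."""
    inside = [r for c, r in history.get(arm, [])
              if np.linalg.norm(c - ctx) <= radius(t)]
    n = len(inside)
    if n == 0:
        return float("inf")                  # force exploration
    return float(np.mean(inside)) + math.sqrt(2 * math.log(t) / n)

def choose(arms, ctx, t):                    # ctx is a numpy array
    return max(arms, key=lambda a: ucb_estimate(a, ctx, t))

def update(arm, ctx, r):
    history.setdefault(arm, []).append((np.asarray(ctx, float), r))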
2017
We investigate a novel cluster-of-bandits algorithm, CAB, for collaborative recommendation tasks that implements the underlying feedback-sharing mechanism by estimating the neighborhood of users in a context-dependent manner. CAB makes sharp departures from the state of the art by incorporating collaborative effects into the inference as well as the learning process, in a manner that seamlessly interleaves explore-exploit trade-offs and collaborative steps. We prove regret bounds under various assumptions on the data, which exhibit a crisp dependence on the expected number of clusters over the users, a natural measure of the statistical difficulty of the learning task. Experiments on production and real-world datasets show that CAB offers significantly increased prediction performance against a representative pool of state-of-the-art methods.
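A distinctive part of this design is that collaborative effects enter learning as well as inference: the observed payoff updates every user in the estimated context-dependent neighborhood, not just the served one. The hypothetical sketch below illustrates that feedback-sharing step; all names and the confidence-width form are our assumptions.

# Sketch of feedback sharing across a context-dependent neighborhood.
import numpy as np

D, ALPHA = 10, 1.0
users = {}   # user id -> dict(A=Gram matrix, b=reward-weighted contexts)

def stats(u):
    return users.setdefault(u, dict(A=np.eye(D), b=np.zeros(D)))

def est(u, x):
    """Estimated payoff on context x and its confidence width."""
    s = stats(u)
    th = np.linalg.solve(s["A"], s["b"])
    return th @ x, ALPHA * np.sqrt(x @ np.linalg.solve(s["A"], x))

def neighborhood(u, x, all_users):
    """Users whose payoff estimate on x overlaps user u's within the
    confidence widths: a neighborhood computed per item."""
    mu_u, cb_u = est(u, x)
    return [v for v in all_users
            if abs(est(v, x)[0] - mu_u) <= cb_u + est(v, x)[1]]

def share_feedback(u, x, reward, all_users):
    """Collaborative update: propagate the payoff to the whole
    estimated neighborhood, not only to the served user."""
    for v in neighborhood(u, x, all_users):
        s = stats(v)
        s["A"] += np.outer(x, x)
        s["b"] += reward * x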
IEEE Transactions on Knowledge and Data Engineering, 2019
Online interactive recommender systems strive to promptly suggest to consumers appropriate items (e.g., movies, news articles) according to the current context, including both the consumer and item content information. However, such context information is often unavailable in practice, in which case only the users' interaction data on items can be utilized. Moreover, the lack of interaction records, especially for new users and items, further worsens recommendation performance. To address these issues, collaborative filtering (CF), one of the recommendation techniques relying on the interaction data only, as well as online multi-armed bandit mechanisms, capable of achieving the balance between exploitation and exploration, are adopted in online interactive recommendation settings, by assuming independent items (i.e., arms). Nonetheless, the assumption rarely holds in reality, since real-world items tend to be correlated with each other (e.g., two articles with similar topics). In this paper, we study online interactive collaborative filtering problems by considering the dependencies among items. We explicitly formulate the item dependencies as clusters on arms, where the arms within a single cluster share similar latent topics. In light of topic modeling techniques, we come up with a generative model to generate the items from their underlying topics. Furthermore, an efficient online algorithm based on particle learning is developed for inferring both latent parameters and states of our model. Additionally, our inferred model can be naturally integrated with existing multi-armed selection strategies in the online interactive collaborative setting. Empirical studies on two real-world applications, online recommendations of movies and news, demonstrate both the effectiveness and efficiency of the proposed approach.
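The benefit of modeling item dependencies can be conveyed with a much-simplified sketch: if arms in one cluster share a payoff posterior, feedback on any arm immediately informs its cluster mates. The paper infers clusters and topics with particle learning; the toy version below instead assumes the cluster labels are given and uses conjugate Beta posteriors with Thompson sampling, purely for illustration.

# Toy sketch: arms in the same cluster share a Beta posterior, so one
# arm's feedback informs its cluster mates. Clusters are assumed known.
import numpy as np

arm_cluster = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}   # assumed known clusters
post = {c: [1.0, 1.0] for c in set(arm_cluster.values())}  # Beta(a, b)

def choose():
    """Thompson sampling at the cluster level, then any arm inside."""
    samples = {c: np.random.beta(a, b) for c, (a, b) in post.items()}
    best_c = max(samples, key=samples.get)
    in_c = [a for a, c in arm_cluster.items() if c == best_c]
    return int(np.random.choice(in_c))

def update(arm, reward):          # Bernoulli reward in {0, 1}
    a, b = post[arm_cluster[arm]]
    post[arm_cluster[arm]] = [a + reward, b + (1 - reward)]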
International Journal of Open Information Technologies, 2021
The main goal of this paper is to introduce the reader to multi-armed bandit algorithms of different types and to observe how the industry has leveraged them in advancing recommendation systems. We present the current state of the art in RecSys and then explain what multi-armed bandits can bring to the table. First, we present the formalization of the multi-armed bandit problem and show the most common bandit algorithms, such as upper confidence bound, Thompson Sampling, epsilon-greedy, and EXP3. Expanding on that knowledge, we review some important contextual bandits and present their benefits and usage in different applications. In this setting, context means side information about the users or the items of the problem. We also survey various techniques in multi-armed bandits that make bandits better suited to the task of recommendation; namely, we consider bandits with multiple plays, multi-objective optimization, and clustering and collaborative filtering approaches. We also assess bandit-backed recommendation systems implemented in industry at Spotify, Netflix, Yahoo and others. At the same time, we discuss methods of bandit evaluation and present an empirical evaluation of some well-known algorithms. We conduct short experiments on two datasets to show how different policies compare to each other and observe the importance of parameter tuning. This paper is a survey of multi-armed bandit algorithms and their applications to recommendation systems.
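In the spirit of the survey's short experiments, the following self-contained Python snippet compares three of the classical policies it covers (epsilon-greedy, UCB1, and Thompson sampling) on a synthetic Bernoulli bandit; the arm means, horizon, and epsilon are arbitrary choices for the illustration.

# Toy comparison of classical bandit policies on a Bernoulli bandit.
import numpy as np

rng = np.random.default_rng(0)
MEANS, T, EPS = np.array([0.3, 0.5, 0.7]), 5000, 0.1
K = len(MEANS)

def run(policy):
    n, s = np.zeros(K), np.zeros(K)        # pull counts, reward sums
    total = 0.0
    for t in range(1, T + 1):
        a = policy(n, s, t)
        r = float(rng.random() < MEANS[a])
        n[a] += 1; s[a] += r; total += r
    return MEANS.max() * T - total          # empirical regret

def eps_greedy(n, s, t):
    if rng.random() < EPS or n.min() == 0:
        return int(rng.integers(K))
    return int(np.argmax(s / n))

def ucb1(n, s, t):
    if n.min() == 0:
        return int(np.argmin(n))            # pull each arm once first
    return int(np.argmax(s / n + np.sqrt(2 * np.log(t) / n)))

def thompson(n, s, t):
    return int(np.argmax(rng.beta(s + 1, n - s + 1)))

for name, pol in [("eps-greedy", eps_greedy), ("UCB1", ucb1),
                  ("Thompson", thompson)]:
    print(name, run(pol))

On such a toy problem, UCB1 and Thompson sampling typically accumulate markedly less regret than a fixed-epsilon greedy policy, and varying EPS illustrates the parameter-tuning sensitivity the survey observes.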
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
How to quickly and reliably learn the preferences of new users remains a key challenge in the design of recommender systems. In this paper we introduce a new type of online learning algorithm, cluster-based bandits, to address this challenge. It exploits the fact that users can often be grouped into clusters based on the similarity of their preferences, which allows accelerated learning of new user preferences: the task becomes one of identifying which cluster a user belongs to, and typically there are far fewer clusters than items to be rated. Clustering by itself is not enough, however. Intra-cluster variability between users can be thought of as adding noise to user ratings. Deterministic methods such as decision trees perform poorly in the presence of such noise. We identify so-called distinguisher items that are particularly informative for deciding which cluster a new user belongs to despite the rating noise. Using these items, the cluster-based bandit algorithm is able to efficiently adapt to user responses and rapidly learn the correct cluster to assign to a new user.
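A hypothetical sketch of the distinguisher idea: given cluster centroids of mean ratings, ask the new user about the item on which the clusters disagree most under the current membership posterior, then update that posterior from the noisy rating. The centroids, noise scale, and all names below are assumptions for illustration, not the paper's construction.

# Sketch: pick the most cluster-discriminating item, then do a Bayes
# update of the new user's cluster membership. Inputs are assumed.
import numpy as np

centroids = np.array([[4.5, 1.0, 3.0],     # cluster 0 mean ratings
                      [1.5, 4.0, 3.1],     # cluster 1
                      [4.4, 4.2, 1.0]])    # cluster 2; items in columns
SIGMA = 1.0                                 # assumed rating noise scale
posterior = np.ones(len(centroids)) / len(centroids)

def distinguisher(asked):
    """Item whose predicted rating varies most across clusters, weighted
    by the current posterior: informative despite rating noise."""
    mean = posterior @ centroids
    var = posterior @ (centroids - mean) ** 2
    var[list(asked)] = -np.inf              # don't re-ask an item
    return int(np.argmax(var))

def update(item, rating):
    """Bayes update of cluster membership from one noisy rating,
    assuming Gaussian rating noise around the cluster centroid."""
    global posterior
    lik = np.exp(-0.5 * ((rating - centroids[:, item]) / SIGMA) ** 2)
    posterior = posterior * lik
    posterior /= posterior.sum()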
Proceedings of the 17th ACM Conference on Recommender Systems
The paradigm of framing recommendations as (sequential) decision-making processes has gained significant interest. To achieve long-term user satisfaction, these interactive systems need to strike a balance between exploitation (recommending high-reward items) and exploration (exploring uncertain regions for potentially better items). Classical bandit algorithms like Upper-Confidence-Bound and Thompson Sampling, and their contextual extensions with linear payoffs, have exhibited strong theoretical guarantees and empirical success in managing the exploration-exploitation trade-off. Building efficient exploration-based systems for deep-neural-network-powered, real-world, large-scale industrial recommender systems remains understudied. In addition, these systems are often multi-stage, multi-objective, and response-time sensitive. In this talk, we share our experience in addressing these challenges in building exploration-based industrial recommender systems. Specifically, we adopt the Neural Linear Bandit algorithm, which effectively combines the representation power of deep neural networks with the simplicity of linear bandits to incorporate exploration in DNN-based recommender systems. We introduce exploration capability to both the nomination and ranking stages of the industrial recommender system. In the context of the ranking stage, we delve into the extension of this algorithm to accommodate the multi-task setup, enabling exploration in systems with multiple objectives. Moving on to the nomination stage, we address the development of efficient bandit algorithms tailored to factorized bi-linear models. These algorithms play a crucial role in facilitating maximum inner product search, which is commonly employed in large-scale retrieval systems. We validate our algorithms and present findings from real-world live experiments.
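The Neural Linear recipe the talk builds on can be summarized in a few lines: a neural network supplies the representation, and a Bayesian linear layer on top supplies calibrated uncertainty for Thompson sampling. In the sketch below, the "network" is just a frozen random projection standing in for a trained DNN; everything else (names, priors) is likewise an illustrative assumption.

# Sketch of the Neural Linear idea: frozen features + Bayesian linear
# head with Thompson sampling. The random projection stands in for a DNN.
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT = 20, 16
W = rng.normal(size=(D_IN, D_FEAT)) / np.sqrt(D_IN)  # stand-in "DNN"

def features(x):
    return np.tanh(x @ W)            # frozen representation

A = np.eye(D_FEAT)                   # posterior precision of the head
b = np.zeros(D_FEAT)                 # reward-weighted feature sums

def choose(candidates):
    """Sample a weight vector from the Gaussian posterior, then rank
    the candidate items by sampled score (Thompson sampling)."""
    cov = np.linalg.inv(A)
    theta = rng.multivariate_normal(cov @ b, cov)
    return int(np.argmax([features(x) @ theta for x in candidates]))

def update(x, reward):
    """Conjugate update of the linear head; the representation is fixed."""
    global A, b
    z = features(x)
    A += np.outer(z, z)
    b += reward * z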
We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.
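One standard way to realize adaptive clustering of bandit strategies, sketched hypothetically below, is to maintain a graph over users that starts fully connected, delete an edge once two users' estimated weight vectors provably differ, and treat the surviving connected components as shared bandits. The confidence radius used here is a crude illustrative choice, not the paper's.

# Sketch of graph-based adaptive clustering of linear bandits: edges are
# pruned as estimates separate; components act as clusters.
import numpy as np

D = 10
users = {u: dict(A=np.eye(D), b=np.zeros(D)) for u in range(6)}
edges = {(i, j) for i in users for j in users if i < j}  # start connected

def theta(u):
    s = users[u]
    return np.linalg.solve(s["A"], s["b"])

def cb(u):
    """Crude confidence radius that shrinks as user u accrues data."""
    return 1.0 / np.sqrt(np.trace(users[u]["A"]) - D + 1)

def prune():
    """Drop an edge once the two users' estimates provably differ."""
    global edges
    edges = {(i, j) for (i, j) in edges
             if np.linalg.norm(theta(i) - theta(j)) <= cb(i) + cb(j)}

def component(u):
    """Connected component of u = u's current cluster; its members'
    statistics would be pooled to run one shared bandit."""
    seen, stack = {u}, [u]
    while stack:
        v = stack.pop()
        for (i, j) in edges:
            w = j if i == v else i if j == v else None
            if w is not None and w not in seen:
                seen.add(w); stack.append(w)
    return seen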