The SelfSparring algorithm reduces the multi-dueling bandits problem to a conventional bandit setting via Thompson Sampling, while modeling dependencies between arms with a Gaussian process prior.
This repository provides an implementation of the SelfSparring algorithm proposed in the following paper:
Multi-dueling bandits with dependent arms
Yanan Sui, Vincent Zhuang, Joel Burdick, Yisong Yue
Conference on Uncertainty in Artificial Intelligence (UAI), 2017
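To illustrate the reduction described above, here is a minimal sketch of the SelfSparring loop for the *independent-arm* case, using Beta posteriors instead of the Gaussian process prior (a simplification for exposition; it is not code from this repository): each round draws one Thompson sample per posterior, duels the sampled winners, and updates the posteriors from the outcomes.

```python
import random

def self_sparring(duel, num_arms, num_samples=2, horizon=1000, seed=0):
    """Independent-arm SelfSparring sketch with Beta(1, 1) posteriors.

    `duel(i, j)` returns True if arm i beats arm j. Each round draws a
    Thompson sample per arm for each of `num_samples` slots, duels the
    selected arms pairwise, and updates the Beta parameters.
    """
    rng = random.Random(seed)
    wins = [1.0] * num_arms    # Beta alpha per arm
    losses = [1.0] * num_arms  # Beta beta per arm
    for _ in range(horizon):
        # Thompson step: sample a score per arm, keep the argmax, repeat.
        picks = []
        for _ in range(num_samples):
            scores = [rng.betavariate(wins[a], losses[a]) for a in range(num_arms)]
            picks.append(max(range(num_arms), key=scores.__getitem__))
        # Duel every distinct pair among the chosen arms; update posteriors.
        for x in range(len(picks)):
            for y in range(x + 1, len(picks)):
                i, j = picks[x], picks[y]
                if i == j:
                    continue
                if duel(i, j):
                    wins[i] += 1; losses[j] += 1
                else:
                    wins[j] += 1; losses[i] += 1
    # Report the arm with the highest posterior mean win rate.
    return max(range(num_arms), key=lambda a: wins[a] / (wins[a] + losses[a]))
```

The `KernelSelfSparring` class below follows the same pattern, but samples utility functions from a Gaussian process posterior over the candidate set so that feedback on one arm informs nearby arms.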
Requires Python 3.13. To install:

pip install -r requirements.txt && pip install -e .

To get started, it is recommended to run the interactive notebooks in the examples folder. We provide 1-d and 2-d examples of the algorithm.
A 1-d example:
import torch

from selfsparring import KernelSelfSparring

# Initialization
candidates = torch.linspace(0, 10, 100).unsqueeze(-1)
bounds = torch.tensor([[0.], [10.]])
datapoints = candidates[[40, 60]]
# func_with_compare is a user-supplied oracle that evaluates the latent
# utility at the given points and returns pairwise comparisons between them.
value, comparisons = func_with_compare(datapoints)
fmin = torch.tensor([0.2])
lipschitz = torch.tensor([0.5])

algo = KernelSelfSparring(
    candidates=candidates,
    bounds=bounds,
    datapoints=datapoints,
    comparisons=comparisons,
    num_samples=2,
    covar_module=None,
    ci_beta=2.0,
    enable_fit=False,
)

# Optimization loop
for _ in range(20):
    next_point = algo.next()
    if next_point is None:
        break
    next_value, next_comparison = func_with_compare(next_point)
    algo.update(next_point, next_comparison)

For how to construct comparisons, please refer to Pairwise GP Models.
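The example above assumes a user-supplied `func_with_compare`. As one possible sketch (hypothetical; a real oracle would reflect your actual preference feedback), the version below scores each point with a synthetic latent utility and emits comparisons as rows `[i, j]`, meaning `datapoints[i]` was preferred to `datapoints[j]`:

```python
import torch

def func_with_compare(datapoints, noise=0.0):
    """Hypothetical oracle: scores each point with a synthetic latent
    utility (sin(0.4 * x), optionally noisy) and returns (values,
    comparisons), where each comparison row [i, j] records that
    datapoints[i] was preferred to datapoints[j]."""
    x = datapoints.squeeze(-1)
    values = torch.sin(0.4 * x) + noise * torch.randn_like(x)
    rows = []
    # Compare every pair within the batch; the higher-valued point wins.
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            rows.append([i, j] if values[i] > values[j] else [j, i])
    # reshape(-1, 2) keeps the shape (0, 2) when the batch has one point.
    comparisons = torch.tensor(rows, dtype=torch.long).reshape(-1, 2)
    return values, comparisons
```

Note this sketch only compares points within one batch; in the optimization loop a practical oracle would instead duel the new point against previously queried points and index the comparisons accordingly.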
@article{DBLP:journals/corr/SuiZBY17,
author = {Yanan Sui and
Vincent Zhuang and
Joel W. Burdick and
Yisong Yue},
title = {Multi-dueling Bandits with Dependent Arms},
journal = {CoRR},
year = {2017}
}