Bandit Algorithms in Hyperparameter Tuning
==========================================
What is the Multi-Armed Bandit Problem?
---------------------------------------
The multi-armed bandit problem is a decision-making framework in which a gambler repeatedly
chooses among multiple slot machines ("arms"), each with an unknown reward distribution. The goal
is to maximize total reward over time by balancing:
- Exploration: Trying different arms to learn how well they pay off.
- Exploitation: Choosing the best-known arm to maximize immediate gain.
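To make the trade-off concrete, here is a minimal epsilon-greedy sketch (an illustration, not part
of the original text): the arms are assumed to pay Bernoulli rewards, and `epsilon` sets the
fraction of pulls spent exploring.

```python
import random

def epsilon_greedy(arm_probs, pulls=1000, epsilon=0.1):
    """Toy epsilon-greedy bandit over arms with hidden Bernoulli win rates."""
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)  # running mean reward per arm
    total = 0.0
    for _ in range(pulls):
        if random.random() < epsilon:
            arm = random.randrange(len(arm_probs))            # explore: random arm
        else:
            arm = max(range(len(arm_probs)), key=values.__getitem__)  # exploit: best-known arm
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update
        total += reward
    return total, values

# Three arms with hidden win rates 0.2, 0.5, 0.7; the last should attract most pulls.
print(epsilon_greedy([0.2, 0.5, 0.7]))
```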
Bandit Algorithms in ML Tuning
-------------------------------
In machine learning, each "arm" is a hyperparameter configuration, and the reward is a measure of
model performance (e.g., validation accuracy, or negative loss, since lower loss is better).
Bandit-based methods find good configurations efficiently by allocating evaluation budget to the
most promising arms.
Examples:
- Successive Halving: Evaluates many configurations on a small resource budget, then repeatedly
promotes the top fraction to a larger budget, dropping poor performers early (see the sketch
after this list).
- Hyperband: Runs Successive Halving in several brackets that trade off the number of
configurations against the budget per configuration, combining bandit principles with early
stopping.
- Bayesian Optimization + Bandits: Merges probabilistic surrogate models with bandit-style
exploration-exploitation balance (e.g., BOHB).
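Below is a minimal Successive Halving sketch; the `evaluate` function is a hypothetical stand-in
for training a model at a given resource budget.

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the top 1/eta of configs each round, multiplying the budget by eta."""
    budget, survivors = min_budget, list(configs)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, len(survivors) // eta)]  # drop poor performers early
        budget *= eta                                        # more resources for the rest
    return survivors[0]

# Hypothetical objective: noisy score that peaks near lr = 0.1; noise shrinks as the
# budget grows, mimicking more reliable estimates from longer training runs.
def evaluate(lr, budget):
    return -abs(lr - 0.1) + random.gauss(0, 0.05 / budget)

configs = [10 ** random.uniform(-4, 0) for _ in range(16)]  # 16 random learning rates
print("selected lr:", successive_halving(configs, evaluate))
```

Hyperband repeats this procedure over several brackets so that no single choice of initial budget
dominates the search.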
Used in:
- Ray Tune
- Optuna
- Ax
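For instance, Optuna exposes Hyperband as a pruner. The toy objective and search space below are
hypothetical, but `HyperbandPruner`, `trial.report`, and `trial.should_prune` are actual Optuna
APIs:

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)  # hypothetical search space
    score = 0.0
    for step in range(100):
        score += lr * (1.0 - score)        # stand-in for one epoch of training
        trial.report(score, step)          # performance signal for the pruner
        if trial.should_prune():           # Hyperband stops this "arm" early
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(direction="maximize",
                            pruner=optuna.pruners.HyperbandPruner())
study.optimize(objective, n_trials=30)
print(study.best_params)
```

Ray Tune offers comparable schedulers, e.g., ASHAScheduler, an asynchronous variant of Successive
Halving.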