This repository implements and compares the performance of the following algorithms: 'UCLK-C' (our proposed algorithm), 'UCRL2-VTR (Bernstein-type)', 'TSDE', 'UCRL2', and 'RANDOM'.
- UCLK-C: introduced in "Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span"
- UCRL2-VTR: introduced in "Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation"
- TSDE: introduced in "Learning Unknown Markov Decision Processes: A Thompson Sampling Approach"
- UCRL2: introduced in "Near-optimal Regret Bounds for Reinforcement Learning"
- RANDOM: a baseline policy that takes an arbitrary action at each step
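As a point of reference, the RANDOM baseline can be sketched in a few lines. This is an illustrative sketch, not the repository's actual implementation; the function name `random_policy` and the list-of-actions interface are assumptions for the example.

```python
import random

def random_policy(actions):
    """RANDOM baseline: pick an action uniformly at random, ignoring the state."""
    return random.choice(actions)

# Example: five uniformly random actions in a 2-action MDP
random.seed(0)  # fixed seed for reproducibility of the example
trajectory = [random_policy([0, 1]) for _ in range(5)]
print(trajectory)
```

Because it ignores the state entirely, RANDOM incurs linear regret on the hard-to-learn MDPs, which is why it serves as the weakest baseline in the comparison.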
- algorithms/: Algorithm implementations
- env/: Hard-to-learn MDP environments
- test/: Scripts that produce experiment outputs
- data/: Logs and experiment outputs
- image/: Regret plots
- plot.ipynb: Regret visualization notebook
Install the required packages:

```
pip install -r requirements.txt
```

The experiments demonstrate the efficacy of our proposed algorithm UCLK-C at learning hard-to-learn MDPs.
This project is licensed under the MIT License. See LICENSE for details.
