Policy evaluation in the infinite-horizon discounted reward setting in RL. This is the code associated with the following publication (https://arxiv.org/abs/2002.06299):
Dai, Falcon Z and Walter, Matthew R. Loop Estimator for Discounted Values in Markov Reward Processes. Proceedings of AAAI. 2021.
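For reference, the quantity that the estimators below target is the standard discounted state value of a Markov reward process (notation here is generic and may differ slightly from the paper's):

$$
V(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_t \,\middle|\, S_0 = s\right], \qquad 0 \le \gamma < 1 .
$$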
- `demo.ipynb` is the Python notebook of experiments, including the plots in the main paper.
- `estimate.py` contains the estimators for state values, namely `co_loop` for the loop estimator, `co_td_k` for the TD(k) estimator, and `co_model_based` for the model-based estimator. See their definitions in the paper. Their implementations extensively exploit co-routines, i.e., `yield` statements, to enhance both readability and efficiency (see the sketch after this list).
- `mrp.py` contains the definition of Markov reward processes and, in particular, the definition of RiverSwim.
- `mc.py` contains some utility functions for Markov chains.
- `*.npy` are pre-computed state value estimates from the different estimators (used in generating the plots in the main paper).
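As a rough illustration of the co-routine style used in `estimate.py`: an estimator can be written as a generator that receives samples via `send()` and yields its current estimate after each update. The sketch below uses a made-up running-mean estimator, `co_sample_mean`, which is not part of the repo; the actual `co_*` estimators and their interfaces are defined in `estimate.py`.

```python
# Illustrative sketch of the co-routine (generator) pattern only;
# co_sample_mean is a hypothetical example, not one of the repo's estimators.
def co_sample_mean():
    """Receive samples via send() and yield the running mean estimate."""
    total, count = 0.0, 0
    estimate = None
    while True:
        # Yield the current estimate and wait for the next sample.
        sample = yield estimate
        total += sample
        count += 1
        estimate = total / count

# Driving the co-routine: prime it with next(), then feed samples with send().
est = co_sample_mean()
next(est)
for r in [1.0, 0.0, 1.0]:
    value = est.send(r)
print(value)  # 0.666...
```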
Dependencies: Python 3.x and Jupyter (install with `pip3 install jupyter`).
To replicate the experimental results in the paper:
- Start the Jupyter notebook server at the project root: `jupyter notebook`
- Select the notebook `demo.ipynb`
- Follow the comments within. Optionally, load the pre-computed estimates instead of re-computing them (see the sketch below).
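If you load the pre-computed estimates rather than re-computing them, they are plain NumPy arrays, so something along these lines should work (the file name below is a placeholder; substitute one of the `*.npy` files actually shipped in the repo):

```python
import numpy as np

# "some_estimates.npy" is a placeholder name, not a file guaranteed to exist in the repo.
estimates = np.load("some_estimates.npy")
print(estimates.shape, estimates.dtype)
```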
Please cite our work if you find this repo or the associated paper useful.
```bibtex
@inproceedings{dai-walter-2021-loop,
  title = "Loop Estimator for Discounted Values in Markov Reward Processes",
  author = "Dai, Falcon Z and Walter, Matthew R",
  booktitle = "Proceedings of the AAAI Conference on Artificial Intelligence",
  month = feb,
  year = "2021",
  publisher = "Association for the Advancement of Artificial Intelligence"
}
```

Falcon Dai ([email protected])