A reinforcement-learning project that trains and evaluates a tabular Q-learning agent on Gymnasium’s FrozenLake-v1 environment.
- Tabular Q-learning with epsilon-greedy exploration
- Reproducible environment setup (seeded random map or fixed map)
- Saves run artifacts:
  `qtable.npy`, `config.json`, `history.json`, `metrics.json`, `learning_curve.png`, `policy_heatmap.png`
- CLI commands:
  `train`, `eval`, `plot`
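The core of tabular Q-learning with epsilon-greedy exploration can be sketched as follows. This is a minimal illustration, not the repo's actual code; the `epsilon_greedy` and `q_update` names are hypothetical:

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    # With probability epsilon pick a random action, otherwise the greedy one.
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, s, a, r, s_next, lr=0.8, gamma=0.95):
    # Temporal-difference update toward the bootstrapped target
    # r + gamma * max_a' Q(s', a'), using the default --lr and --gamma.
    target = r + gamma * max(q[s_next])
    q[s][a] += lr * (target - q[s][a])

# Tiny worked example: 2 states, 2 actions, all Q-values start at zero.
q = [[0.0, 0.0], [0.0, 0.0]]
q_update(q, s=0, a=1, r=1.0, s_next=1)
# Q[0][1] moves from 0 to 0.8 * (1.0 + 0.95 * 0 - 0) = 0.8
```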
```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Optional dev tools (for local lint/tests):

```
pip install pytest ruff
```

Train an agent:

```
python scripts/train.py --episodes 500 --max-steps 100 --map-size 4 --map-seed 10
```

This creates a new run folder under:
```
results/runs/<timestamp>_seed<seed>_ms<map_size>_slip<0|1>/
```
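One way such a run-folder name can be assembled is sketched below. The exact timestamp format and helper name are assumptions for illustration, not taken from the repo's source:

```python
from datetime import datetime

def run_dir_name(seed, map_size, slippery):
    # Produces something like "20240101-120000_seed42_ms4_slip0",
    # matching the <timestamp>_seed<seed>_ms<map_size>_slip<0|1> pattern.
    ts = datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"{ts}_seed{seed}_ms{map_size}_slip{int(slippery)}"

name = run_dir_name(seed=42, map_size=4, slippery=False)
```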
Evaluate a saved run directory:

```
python scripts/eval.py --run-dir results/runs/<your_run_folder> --episodes 100
```

Or evaluate a standalone Q-table file (using a newly created env config):

```
python scripts/eval.py --qtable results/runs/<your_run_folder>/qtable.npy --episodes 100 --map-size 4 --map-seed 10
```

Generate plots for a run:

```
python scripts/plot.py --run-dir results/runs/<your_run_folder>
```

Environment options:
- `--map-size` (default: 4) and `--map-seed` (default: 10) for reproducible random map generation
- `--slippery` enables stochastic transitions (default is non-slippery)
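Gymnasium ships a seeded map generator for FrozenLake (`generate_random_map`); the simplified stand-in below (names and the missing solvability check are simplifications, not the library's code) shows why a fixed `--map-seed` makes the random map reproducible:

```python
import random

def toy_random_map(size=4, p_frozen=0.8, seed=10):
    # Simplified stand-in for a seeded map generator: 'S'tart, 'G'oal, and
    # each remaining tile is 'F'rozen with probability p_frozen, else a 'H'ole.
    # (The real generator also rejects maps with no path from S to G.)
    rng = random.Random(seed)
    tiles = [["F" if rng.random() < p_frozen else "H" for _ in range(size)]
             for _ in range(size)]
    tiles[0][0], tiles[-1][-1] = "S", "G"
    return ["".join(row) for row in tiles]

# Same seed -> byte-identical map, which is what makes runs reproducible.
assert toy_random_map(seed=10) == toy_random_map(seed=10)
```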
Training hyperparameters:
- `--lr` (learning rate, default: 0.8)
- `--gamma` (discount factor, default: 0.95)
- `--epsilon-start` (default: 1.0)
- `--epsilon-min` (default: 0.01)
- `--epsilon-decay` (default: 0.995)
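With these defaults, exploration decays multiplicatively and is floored at `--epsilon-min`. A sketch of the resulting schedule (assuming decay is applied once per episode, which is an assumption about this repo):

```python
def epsilon_schedule(episodes, start=1.0, minimum=0.01, decay=0.995):
    # Multiplicative decay per episode, clipped at the floor.
    eps, out = start, []
    for _ in range(episodes):
        out.append(eps)
        eps = max(minimum, eps * decay)
    return out

eps = epsilon_schedule(500)
# Over 500 episodes epsilon falls from 1.0 to 0.995**499 (roughly 0.08),
# so the 0.01 floor is not yet reached at the default episode count.
```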
Each training run saves:
- `qtable.npy`: learned Q-table (NumPy array shaped `[n_states, n_actions]`)
- `config.json`: environment + training configuration (includes the map/desc used)
- `history.json`: per-episode rewards/steps/epsilons
- `metrics.json`: evaluation summary (success rate, averages, per-episode stats)
- `learning_curve.png`: reward and steps per episode
- `policy_heatmap.png`: max Q-values heatmap with best-action arrows
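The saved `qtable.npy` can also be inspected outside the CLI. A sketch of deriving the greedy policy and state values (the table below is a stand-in; load your own artifact as shown in the comment):

```python
import numpy as np

# A 4x4 FrozenLake map has 16 states and 4 actions (0=left, 1=down, 2=right, 3=up).
q = np.zeros((16, 4))
q[0, 2] = 1.0  # pretend the learned table prefers "right" in state 0

# In practice you would load the saved artifact instead:
# q = np.load("results/runs/<your_run_folder>/qtable.npy")

greedy_policy = q.argmax(axis=1)  # best action per state (the heatmap's arrows)
state_values = q.max(axis=1)      # max Q-value per state (the heatmap's colors)
```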
MIT (see LICENSE).


