# A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

Official repository for "A Joint Imitation-Reinforcement Learning (JIRL) Framework for Reduced Baseline Regret".
## Technical Report

The technical report contains a detailed description of the experimental settings and hyperparameters used to obtain the results reported in the paper.
## Objectives

- Leverage the baseline's online demonstrations to minimize regret with respect to the baseline policy during training
- Eventually surpass the baseline's performance
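As a hedged formalization of the first objective (notation ours, not necessarily the paper's): if $\pi_b$ is the baseline policy and $\pi_t$ is the learner's policy at training step $t$, the baseline regret accumulated over $T$ training steps can be written as

$$\mathrm{Regret}_T = \sum_{t=1}^{T} \big( J(\pi_b) - J(\pi_t) \big),$$

where $J(\pi)$ denotes the expected return of policy $\pi$. Keeping this sum small means the learner performs close to the baseline throughout training, while the second objective asks that eventually $J(\pi_t) > J(\pi_b)$.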
## Assumptions

- A baseline policy can be queried at every time step
- The learning agent uses an off-policy RL algorithm
## Framework
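The full algorithm is described in the technical report; below is only a rough, hypothetical sketch of the idea, combining online baseline demonstrations with an off-policy learner. The toy chain environment, the decaying baseline-intervention schedule `beta`, and the tabular Q-learning update are all our own illustrative assumptions, not the paper's implementation.

```python
import random

# Toy 1-D chain environment: move left/right, reward at the right end.
class ChainEnv:
    def __init__(self, n=10):
        self.n = n
        self.reset()

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):  # a: 0 = left, 1 = right
        self.s = max(0, min(self.n - 1, self.s + (1 if a == 1 else -1)))
        done = self.s == self.n - 1
        return self.s, (1.0 if done else 0.0), done

def baseline_policy(s):
    # Assumption above: a baseline policy is queryable at every time step.
    return 1  # always moves right; near-optimal on this toy task

def train_jirl(episodes=200, beta=0.9, alpha=0.5, gamma=0.95, seed=0):
    """beta: probability of executing the baseline's action, decayed over
    training so the learner eventually acts on its own (and can surpass
    the baseline).  Hypothetical schedule, not the paper's."""
    random.seed(seed)
    env = ChainEnv()
    Q = [[0.0, 0.0] for _ in range(env.n)]  # tabular action values
    for ep in range(episodes):
        s, done = env.reset(), False
        b = beta * (1 - ep / episodes)  # decay baseline control
        while not done:
            a_base = baseline_policy(s)
            a_rl = max((0, 1), key=lambda a: Q[s][a])
            # Executing the baseline early keeps regret w.r.t. it low.
            a = a_base if random.random() < b else a_rl
            s2, r, done = env.step(a)
            # Off-policy TD update: learns from whichever action ran,
            # matching the off-policy-learner assumption above.
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

The key design point the sketch tries to capture is that the baseline's actions are both executed (bounding regret during training) and consumed as off-policy data, so the learner is never limited to the baseline's performance ceiling.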
## Experiment Domains

- Inverted pendulum
- Lunar lander
- Walker-2D
- Lane following (CARLA)
- Lane following (JetRacer)
## Results

### Performance

- Inverted pendulum
- Lunar lander
- Lane following (CARLA)
- Walker-2D
- Lane following (JetRacer)
### Baseline Regret

- Lunar lander (JIRL vs. TRPO)