
Learning in real robots from environment interaction

2012, Journal of Physical Agents (JoPha)

Abstract

This article describes a proposal to achieve fast robot learning from interaction with the environment. Our proposal is suitable for continuous learning procedures, as it tries to limit the instability that appears every time the robot encounters a situation it has not seen before. Moreover, the user does not have to set a degree of exploration (as is usual in reinforcement learning), which would otherwise prevent continual learning. Our proposal uses an ensemble of learners that combine dynamic programming and reinforcement learning to predict when the robot will make a mistake. This information is used to dynamically evolve a set of control policies that determine the robot's actions.
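
A rough sketch of how such a mistake-predicting ensemble could be wired together is shown below. This is only an illustrative assumption of the structure described in the abstract, not the published implementation: the class names, the risk-tracking update, and the majority-vote combination rule are all hypothetical.

```python
# Illustrative sketch only: names, the risk update, and the voting rule are
# assumptions, not the authors' published method.

class MistakePredictor:
    """One ensemble member: keeps a per-state estimate of the risk of failure."""
    def __init__(self, n_states):
        self.risk = [0.0] * n_states

    def update(self, state, reinforcement):
        # Move the risk estimate towards 1 on negative reinforcement, 0 otherwise.
        target = 1.0 if reinforcement < 0 else 0.0
        self.risk[state] += 0.2 * (target - self.risk[state])

    def predicts_mistake(self, state):
        return self.risk[state] > 0.5


def ensemble_predicts_mistake(ensemble, state):
    """Majority vote over the ensemble (an assumed combination rule)."""
    return sum(m.predicts_mistake(state) for m in ensemble) > len(ensemble) // 2
```

In such a loop, a positive ensemble vote for a state would trigger an adaptation of the control policy for that state, rather than relying on a user-defined exploration rate.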

Key takeaways

  • Because of this, instead of building a single learning system that must determine the suitable action for every state of the robot, we prefer to build an ensemble of parallel learners, each of which determines the interval of actions most suitable for each state of the robot [3], [4] (Figure 1); a sketch of this interval-based representation is given after this list.
  • In our case we need unsupervised techniques able to quantize the sensor space into a set of regions according to how similar the sensor values are; the best action for each of these states must then be discovered by the robot (a possible quantization is sketched after this list).
  • Therefore, in our case, a control policy π is a function that determines, for every possible state of the robot, the interval of actions that seems suitable for the task.
  • The robot moves slightly backwards every time it makes a mistake and receives negative reinforcement; this can be seen in the robot's trajectory.
  • In this paper we have described a system that brings us closer to continuous reinforcement learning procedures on a real robot operating in real environments.
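
The takeaways above mention quantizing the sensor space into regions of similar readings. The paper does not fix a particular algorithm in this summary, so the following sketch uses k-means purely as a stand-in unsupervised quantizer; the sensor dimensionality, the data, and the cluster count are illustrative assumptions.

```python
# Hypothetical example: k-means as a stand-in unsupervised quantizer of the
# sensor space; the data shapes and cluster count are assumptions.
import numpy as np
from sklearn.cluster import KMeans

# Suppose each row is one recorded sensor reading (e.g. 8 range values).
sensor_log = np.random.rand(500, 8)

# Partition the sensor space into regions of similar readings;
# each region index then acts as a discrete state for the learners.
quantizer = KMeans(n_clusters=32, n_init=10).fit(sensor_log)

def state_of(sensor_reading):
    """Map a raw sensor vector to its region (state) index."""
    return int(quantizer.predict(sensor_reading.reshape(1, -1))[0])
```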
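On top of such a quantization, a control policy that maps each discrete state to an interval of admissible actions, with the ensemble's proposals combined across parallel learners, could look like the sketch below. The class names and the interval-intersection rule are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch, not the authors' implementation: a control policy maps
# a state index to an interval of admissible actions, and the proposals of
# several parallel learners are combined by intersecting their intervals.
import random

class IntervalPolicy:
    def __init__(self, n_states, action_low, action_high):
        # Start with the full action range allowed in every state.
        self.intervals = {s: (action_low, action_high) for s in range(n_states)}

    def restrict(self, state, low, high):
        """Narrow the admissible interval for a state (e.g. after a mistake)."""
        cur_lo, cur_hi = self.intervals[state]
        self.intervals[state] = (max(cur_lo, low), min(cur_hi, high))

    def sample_action(self, state):
        lo, hi = self.intervals[state]
        return random.uniform(lo, hi)


def combined_interval(policies, state):
    """Intersection of the intervals proposed by several parallel learners."""
    lows, highs = zip(*(p.intervals[state] for p in policies))
    return max(lows), min(highs)
```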