2010, Robotics and Autonomous Systems
In real-world robotic applications, many factors, both at low level (e.g., vision, motion control and behaviors) and at high level (e.g., plans and strategies), determine the quality of the robot's performance. Consequently, fine tuning of the parameters, in the implementation of the basic functionalities as well as in the strategic decisions, is a key issue in robot software development. In recent years, machine learning techniques have been successfully used to find optimal parameters for typical robotic functionalities. However, one major drawback of learning techniques is time consumption: in practical applications, methods designed for physical robots must be effective with small amounts of data. In this paper, we present a method for concurrent learning of the best strategy and the optimal parameters using a policy gradient reinforcement learning algorithm. The results of our experimental work in a simulated environment and on a real robot show a very high convergence rate.
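The parameter tuning described above can be illustrated with a finite-difference policy gradient search over a parameter vector. This is a minimal sketch, not the authors' implementation: the toy `evaluate` objective (standing in for a robot trial), the perturbation size, the number of test policies, and the step length are all illustrative assumptions.

```python
# Finite-difference policy gradient sketch: estimate the gradient of a
# scalar fitness from randomly perturbed parameter vectors and step along it.
# `evaluate` is a stand-in for a (noisy) robot or simulator trial.
import random

def evaluate(params):
    # Placeholder objective with its maximum at params = [1.0, -2.0].
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)

def policy_gradient_step(params, epsilon=0.05, n_policies=24, step=0.02):
    """One finite-difference policy-gradient iteration over the parameters."""
    trials = []
    for _ in range(n_policies):
        # Perturb each parameter by -epsilon, 0, or +epsilon, then score.
        deltas = [random.choice((-epsilon, 0.0, epsilon)) for _ in params]
        score = evaluate([p + d for p, d in zip(params, deltas)])
        trials.append((deltas, score))
    adjustment = []
    for i in range(len(params)):
        plus = [s for d, s in trials if d[i] > 0]
        zero = [s for d, s in trials if d[i] == 0]
        minus = [s for d, s in trials if d[i] < 0]
        avg = lambda xs: sum(xs) / len(xs)
        if not plus or not minus:
            adjustment.append(0.0)   # not enough evidence this round
        elif zero and avg(zero) > avg(plus) and avg(zero) > avg(minus):
            adjustment.append(0.0)   # current value looks best: freeze it
        else:
            adjustment.append(avg(plus) - avg(minus))
    norm = sum(a * a for a in adjustment) ** 0.5
    if norm == 0.0:
        return params
    # Move a fixed distance along the estimated gradient direction.
    return [p + step * a / norm for p, a in zip(params, adjustment)]

random.seed(0)
params = [0.0, 0.0]
for _ in range(200):
    params = policy_gradient_step(params)
```

The normalised step keeps progress per iteration bounded, which matters when each evaluation is a physical trial and the fitness is noisy.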
In humanoid robotic soccer, many factors, both at low level (e.g., vision and motion control) and at high level (e.g., behaviors and game strategies), determine the quality of the robot's performance. In particular, the speed of individual robots, the precision of their trajectories, and the stability of their walking gaits have a high impact on the success of a team. Consequently, humanoid soccer robots require fine tuning, especially for the basic behaviors. In recent years, machine learning techniques have been used to find optimal parameter sets for various humanoid robot behaviors. However, a drawback of learning techniques is time consumption: a practical learning method for robotic applications must be effective with a small amount of data. In this article, we compare two learning methods for humanoid walking gaits based on the Policy Gradient algorithm. We demonstrate that an extension of the classic Policy Gradient algorithm that takes parameter relevance into account allows for better solutions when only a few experiments are available. The results of our experimental work show the effectiveness of the policy gradient learning method, as well as its higher convergence rate when the relevance of parameters is taken into account during learning.
A central problem in autonomous robotics is how to design programs that determine what the robot should do next. Behaviour-based control is a popular paradigm, but current approaches to behaviour design typically involve hand-coded behaviours. The aim of this work is to explore the use of reinforcement learning to develop autonomous robot behaviours automatically, and specifically to look at the performance of the resulting behaviours. This thesis examines the question of whether behaviours for a real behaviour-based, autonomous robot can be learnt under simulation using the Monte Carlo Exploring Starts, ε-soft On-Policy Monte Carlo, or linear, gradient-descent Sarsa algorithms. A further question is whether the increased performance of learnt behaviours carries through to increased performance on the real robot. In addition, this work looks at whether continuing to learn on the real robot causes further improvement in the performance of the behaviour. A novel method is developed, termed Policy Initialisation, that makes use of the domain knowledge in an existing, hand-coded behaviour by converting the behaviour into either a reinforcement learning policy or an action-value function, which is then used to bootstrap the learning process. The Markov Decision Process model is central to reinforcement learning algorithms. This work examines whether it is possible to construct an internal world model in the real robot that satisfies the requirements of the Markov Decision Process model. The methodology used to answer these questions is to take three realistic, non-trivial robotic tasks and attempt to learn behaviours for each. The learnt behaviours are then compared with hand-coded behaviours that have either been published or used in international competition. The tasks are based on real task requirements for robots used in a RoboCup Formula 2000 robot soccer team. The first is a generic movement behaviour that moves the robot to a target point.
The second requires the robot to dribble the ball in an arc so that the robot maintains possession and so that the final position is lined up with the goal. The third addresses the problem of kicking the ball away from the wall. The results show that for these three different types of behavioural problem, reinforcement learning on a simulator produced significantly better performance than hand-coded equivalents, not only under simulation but also on the real robot. In contrast to this, continuing the learning process on the real robot did not significantly improve performance. The Policy Initialisation technique is found to accelerate learning for tabular Monte Carlo methods, but makes minimal improvement and is, in fact, costly to use in conjunction with linear, gradient-descent Sarsa. This approach, unlike some other techniques for accelerating learning, does not appear to bias the solution. Finally, the evidence from this thesis is that internal world models that maintain the requirements of Markov Decision Processes can be constructed, and this appears to be a sound approach to avoiding problems connected with partial observability that have previously occurred in the use of reinforcement learning in robotic environments.
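The Policy Initialisation idea described above can be sketched as seeding a tabular action-value function from a hand-coded behaviour, so that greedy action selection initially reproduces it. The states, actions, and bonus value below are illustrative assumptions, not the thesis's actual task encoding.

```python
# Hedged sketch of Policy Initialisation: convert a hand-coded behaviour
# into initial action-values, so learning starts from the behaviour's
# domain knowledge rather than from scratch.

def hand_coded_behaviour(state):
    # Toy hand-coded movement behaviour: go forward when on target, else turn.
    return "forward" if state == "target_ahead" else "turn"

STATES = ["target_ahead", "target_left", "target_right"]
ACTIONS = ["forward", "turn"]

def initialise_q(bonus=1.0):
    """Give the hand-coded action in each state a head start in value."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for s in STATES:
        q[(s, hand_coded_behaviour(s))] = bonus
    return q

q = initialise_q()
# Before any learning, greedy selection matches the hand-coded behaviour.
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

Because the seed only shifts initial values, later value updates can still overturn it, which is consistent with the finding that the technique does not appear to bias the final solution.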
Mobile robots can benefit from machine learning approaches to improve their behaviors when performing complex activities. In recent years, these techniques have been used to find optimal parameter sets for many behaviors. In particular, layered learning has been proposed to improve the learning rate in robot learning tasks. In this paper we consider a layered learning approach for learning optimal parameters of basic control routines, behaviors, and strategy selection. We compare three different methods in the different layers: a genetic algorithm, Nelder-Mead, and policy gradient. Moreover, we study how to use a 3D simulator to speed up robot learning. The results of our experimental work on AIBO robots are useful not only to identify differences and similarities between the robot learning approaches used within the layered learning framework, but also to evaluate a more effective learning methodology that makes use of a simulator.
The RoboCup Soccer Domain, which was proposed to provide a new long-term challenge for Artificial Intelligence research, supports real experimentation and testing activities for the development of intelligent, autonomous robots. At the Centro Universitário da FEI we are developing a project to compete in the RoboCup Simulation League, aiming to test Reinforcement Learning algorithms in a multiagent domain. This text describes the team developed for the 2006 Robot Soccer Simulation competition, to be held in Campo Grande, MS, Brazil. We conclude that Reinforcement Learning algorithms perform well in this domain.
arXiv (Cornell University), 2020
This article introduces an open framework, called VSSS-RL, for studying Reinforcement Learning (RL) and sim-to-real in robot soccer, focusing on the IEEE Very Small Size Soccer (VSSS) league. We propose a simulated environment in which continuous or discrete control policies can be trained to control the complete behavior of soccer agents and a sim-to-real method based on domain adaptation to adapt the obtained policies to real robots. Our results show that the trained policies learned a broad repertoire of behaviors that are difficult to implement with handcrafted control policies. With VSSS-RL, we were able to beat human-designed policies in the 2019 Latin American Robotics Competition (LARC), achieving 4th place out of 21 teams, being the first to apply Reinforcement Learning (RL) successfully in this competition. Both environment and hardware specifications are available open-source to allow reproducibility of our results and further studies.
Neri, J.R.F.; Zatelli, M.R.; Fabro, J.A.; Santos, C.H.F. 2012, "A Proposal of Q-Learning to Control the Attack of a 2D Robot Soccer Simulation Team", Brazilian Robotics Symposium and Latin American Robotics Symposium
This document presents a novel approach to controlling the attack behavior of a team of simulated soccer-playing robots in the RoboCup 2D category. The presented approach modifies the behavior of each player only when in the state "controlling the ball". The approach is based on a modified Q-Learning algorithm that implements a continuous machine learning process. After an initial learning phase, each player uses its previous experience during the simulation of the game to decide whether it should dribble, pass, or kick the ball towards the adversary's goal. Simulation results show that the proposed approach is capable of learning how to score goals, even when facing the champion of the previous world championship.
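The decision step described above can be sketched as a tabular Q-learning update with epsilon-greedy selection over the dribble/pass/kick actions. The state names, reward, and learning constants below are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of tabular Q-learning for the "controlling the ball" state:
# epsilon-greedy action choice plus the standard one-step Q backup.
import random

ACTIONS = ["dribble", "pass", "kick"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def choose_action(q, state):
    # Epsilon-greedy selection keeps learning continuous during play:
    # mostly exploit past experience, occasionally explore.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state):
    # Standard tabular Q-learning backup toward reward + discounted best value.
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

q = {}
# One illustrative transition: a kick near the goal that scored (+1 reward).
q_update(q, "near_goal", "kick", 1.0, "terminal")
```

After this single update the value of kicking near the goal rises from 0.0 to 0.1 (old + ALPHA * reward), so the greedy choice in that state already favours kicking.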
2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE), 2019
At the current level of evolution of Soccer 3D, motion control is a key factor in a team's performance. Recent works take advantage of model-free approaches based on Machine Learning to exploit robot dynamics in order to obtain faster locomotion skills, achieving running policies and thereby opening a new research direction in the Soccer 3D environment. In this work, we present a methodology based on Deep Reinforcement Learning that learns running skills without any prior knowledge, using a neural network whose inputs are related to the robot's dynamics. Our results outperform the previous state-of-the-art sprint velocity reported in the Soccer 3D literature by a significant margin. The method also demonstrates improved sample efficiency, being able to learn how to run in just a few hours. We report our results by analyzing the training procedure and evaluating the policies in terms of speed, reliability, and human similarity. Finally, we present the key factors that led us to improve on previous results and share some ideas for future work.
2004
This paper presents a machine learning approach to optimizing a quadrupedal trot gait for forward speed. Given a parameterized walk designed for a specific robot, we propose using a form of policy gradient reinforcement learning to automatically search the set of possible parameters with the goal of finding the fastest possible walk. We implement and test our approach on a commercially available quadrupedal robot platform, namely the Sony Aibo robot.
Pagello, E. (Hrsg.); Menegatti, E. (…), 2007
This paper presents an approach for learning complex tasks on real robots, such as walking or kicking with a humanoid soccer robot, profiting as much as possible from the possibility of running simulations of a virtual model of the robot. This approach avoids damaging the real robot in the time-consuming trials needed to learn a correct behavior, and avoids overfitting to the virtual robot model. The basic idea is to run most of the learning steps in simulation and to use a few learning steps on the real robot to assess discrepancies between the simulation and reality. The calculated discrepancies are used to correct the fitness function used in simulation. Experiments on interleaving the learning between a real robot (Robovie-M by VStone) and its virtual model in USARSim are presented. They show that the proposed method is effective and significantly reduces learning time.
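The interleaved scheme above can be sketched as a simulated fitness plus a correction term estimated from a few real-robot trials. Everything below (the toy fitness functions, the constant-offset mismatch, the averaging of discrepancies) is an illustrative assumption about how such a correction might look, not the paper's actual formulation.

```python
# Hedged sketch of fitness correction for sim-to-real learning: learn mostly
# against a cheap simulated fitness, and use a handful of expensive
# real-robot evaluations to estimate and cancel the sim-to-real discrepancy.

def fitness_sim(params):
    # Simulated fitness: systematically optimistic (a toy model mismatch).
    return 10.0 - (params - 3.0) ** 2 + 2.0

def fitness_real(params):
    # Real-robot fitness: expensive to query, so it is sampled only rarely.
    return 10.0 - (params - 3.0) ** 2

class CorrectedFitness:
    """Simulated fitness plus a correction estimated from a few real trials."""
    def __init__(self):
        self.correction = 0.0

    def calibrate(self, probe_params):
        # A few real-robot evaluations estimate the average discrepancy.
        gaps = [fitness_real(p) - fitness_sim(p) for p in probe_params]
        self.correction = sum(gaps) / len(gaps)

    def __call__(self, params):
        # Learning in simulation then optimizes this corrected objective.
        return fitness_sim(params) + self.correction

f = CorrectedFitness()
f.calibrate([1.0, 2.0, 4.0])
```

With the toy constant offset, three probes recover the full discrepancy exactly; with a state-dependent mismatch the correction would only be an average, which is why occasional re-calibration on the real robot is interleaved with simulated learning.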
Robotic soccer belongs to the class of multi-agent systems and involves many challenging sub-problems. Teams of robotic players have to cooperate in order to put the ball into the opposing goal and, at the same time, defend their own goal. The paper is concerned with the problem of learning and implementing reactive behaviors for robotic agents playing soccer. It briefly presents the whole control system designed for the Cerberus'01 Sony legged robot team that participated in the RoboCup 2001 competitions in Seattle, USA, and then introduces the developed reactive behavior for intercepting a moving ball while avoiding collisions with other robotic players and the field walls. For the implementation of the above behavior, a fuzzy-neural trajectory generator (FNTG) has been developed and trained. A Genetic Algorithms (GAs) based approach has been employed to perform the learning process of the proposed FNTG.
2014 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), 2014
Robotic soccer provides a rich environment for the development of Reinforcement Learning controllers. The competitive environment imposes strong requirements on the performance of the developed controllers. RL offers a valuable alternative for the development of efficient controllers while avoiding the hassle of parameter-tuning a hand-coded policy. This paper presents the application of a recently proposed Batch RL update-rule to learn robotic soccer controllers in the context of the RoboCup Middle Size League. The Q-Batch update-rule exploits the episodic structure of the data collection phase of Batch RL to efficiently evaluate and improve the learned policy. Three learning tasks of increasing difficulty were developed and applied first in a simulated environment and later on the physical robot. The performance of the learned controllers is mostly compared to hand-tuned controllers, and some comparisons with other RL methods were also performed. Results show that the proposed approach is able to learn the tasks in a reduced amount of time, even outperforming existing hand-coded solutions.
Proc. of the 1st International …, 2001
Journal of Intelligent & Robotic Systems, 2015
Reinforcement Learning is increasingly becoming a valuable alternative for tackling many of the challenges of a semi-structured, nondeterministic and adversarial environment such as robotic soccer. Batch Reinforcement Learning is a class of Reinforcement Learning methods characterized by processing a batch of interactions. By storing all past interactions, Batch RL methods are extremely data-efficient, which makes this class of methods very appealing for robotics applications, especially when learning directly on physical robotic platforms. This paper presents the application of Batch Reinforcement Learning to obtain efficient robotic soccer controllers on physical platforms. To learn the controllers, we propose the application of Q-Batch, a novel update-rule that exploits the episodic nature of the interactions in Batch Reinforcement Learning. The approach was validated on three tasks of increasing difficulty. Results show the proposed approach is able to outperform hand-coded policies, for all the tasks, in a reduced amount of time.
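The batch idea above can be sketched as repeated value-update sweeps over stored episodes. Note this is a plain batched Q-learning sweep, not the actual Q-Batch update-rule from the paper; it only illustrates how a stored batch of episodic transitions can be replayed, backwards within each episode, to propagate values data-efficiently.

```python
# Hedged sketch of batch RL over stored episodic interactions. Each episode
# is a list of (state, action, reward, next_state) transitions; next_state
# is None at the end of an episode. All names and values are illustrative.

ACTIONS = ["accelerate", "brake"]
ALPHA, GAMMA = 0.5, 0.9

def batch_q_sweep(q, episodes, sweeps=50):
    """Replay the stored batch many times, sweeping each episode backwards."""
    for _ in range(sweeps):
        for episode in episodes:
            # Backward order lets terminal rewards propagate in one pass.
            for (s, a, r, s_next) in reversed(episode):
                best_next = 0.0 if s_next is None else max(
                    q.get((s_next, b), 0.0) for b in ACTIONS)
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)
    return q

# One stored episode: approach the ball, then a rewarded final action.
episode = [("far", "accelerate", 0.0, "near"),
           ("near", "brake", 1.0, None)]
q = batch_q_sweep({}, [episode])
```

Because the batch is replayed many times, a single stored episode is squeezed for far more value information than a single online pass would extract, which is the data-efficiency argument made in the abstract.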
International Journal of Social Robotics, 2020
Nowadays, humanoid soccer serves as a benchmark for artificial intelligence and robotics problems. Factors such as kicking speed and the number of kicks by robot soccer players are among the most significant aims that teams pursue in the RoboCup 3D Soccer Simulation League. The proposed method presents a kicking strategy during walking for humanoid soccer robots. Achieving an accurate and powerful kick while the robot is moving requires dynamic optimization of the robot's speed and motion parameters. In this paper, a curved motion path is designed based on the robot's position relative to the ball and the goal; the robot is then able to kick at the goal by walking along this curved path. The speed and angle of the walking robot are set towards the ball with regard to the robot's curved motion path. After the robot's final step, the accurate and effective adjustment of these two parameters ensures that the robot is located in the ideal position to perform the kick. Due to noise and the robot's walking conditions, it is essential that the speed and angle of motion be measured accurately. For this purpose, we use a reinforcement learning model to adjust the robot's step size and thereby achieve the optimal values of the two above-mentioned parameters. Using reinforcement learning, the robot learns to pursue an optimal policy to kick correctly towards designated points. The proposed method is therefore model-free and based on dynamic programming. The experiments reveal that the proposed method significantly improves the team's overall performance and the robots' ability to kick. Our proposed method has been 9.32% successful on average and outperformed the UTAustinVilla agent in terms of goal-scoring time in a non-opponent simulator.
In real-world robotic applications, many factors, both at low-level (e.g., vision and motion control parameters) and at high-level (e.g., the behaviors) determine the quality of the robot performance. Thus, for many tasks, robots require fine tuning of the parameters, in the implementation of behaviors and basic control actions, as well as in strategic decisional processes. In recent years, machine learning techniques have been used to find optimal parameter sets for different behaviors. However, a drawback of learning techniques is time consumption: in practical applications, methods designed for physical robots must be effective with small amounts of data. In this paper, we present a method for concurrent learning of best strategy and optimal parameters, by extending the policy gradient reinforcement learning algorithm. The results of our experimental work in a simulated environment and on a real robot show a very high convergence rate.
LatinX in AI at Neural Information Processing Systems Conference 2019
This work presents an application of Reinforcement Learning (RL) for the complete control of real soccer robots of the IEEE Very Small Size Soccer (VSSS), a traditional league in the Latin American Robotics Competition (LARC). In the VSSS league, two teams of three small robots play against each other. We propose a simulated environment in which continuous or discrete control policies can be trained, and a Sim-to-Real method to allow using the obtained policies to control a robot in the real world. The results show that the learned policies display a broad repertoire of behaviors that are difficult to specify by hand. This approach, called VSSS-RL, was able to beat the human-designed policy for the striker of the team ranked 3rd place in the 2018 LARC, in 1-vs-1 matches.
2nd International Conference on …, 2004
Robotic soccer is a complex multi-agent system in which agents play the role of soccer players. The characteristics of such systems are: real-time, noisy, collaborative, and adversarial. Because of the inherent complexity of this type of system, machine learning is used for training agents. Since the main purpose of a soccer game is to score goals, it is important for a robotic soccer agent to have a clear policy about whether it should attempt to score in a given situation. Many parameters affect the result of shooting at the goal; the UvA Trilearn simulation team considers two especially important parameters for this behavior. This paper describes the optimized policy used in the UvA team, obtained by choosing two additional important parameters as well as using a reinforcement learning method.
2006
In this paper, we apply Reinforcement Learning (RL) to a real-world task. While complex problems have been solved by RL in simulated worlds, the cost of obtaining enough training examples often prohibits the use of plain RL in real-world scenarios. We propose three approaches to reduce training expenses for real-world RL. Firstly, we replace the random exploration of the huge search space, which plain RL uses, by guided exploration that imitates a teacher. Secondly, we use experiences not only once but store and reuse them later, when their value is easier to assess. Finally, we utilize function approximators in order to represent the experience in a way that balances between generalization and discrimination. We evaluate the performance of the combined extensions of plain RL using a humanoid robot in the RoboCup soccer domain. As we show in simulation and real-world experiments, our approach enables the robot to quickly learn fundamental soccer skills.
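The first two cost-reduction ideas above can be sketched together: an action-selection rule that imitates a teacher with some probability instead of exploring at random, and a replay memory that stores experiences for later reuse. The class name, action set, teacher policy, and probability below are illustrative assumptions.

```python
# Hedged sketch of guided exploration plus experience storage for reuse.
import random
from collections import deque

class GuidedExplorer:
    def __init__(self, teacher, actions, p_teacher=0.7):
        self.teacher = teacher        # hand-crafted policy to imitate
        self.actions = actions
        self.p_teacher = p_teacher
        self.memory = deque(maxlen=10_000)  # stored experiences for reuse

    def act(self, state):
        # Guided exploration: with probability p_teacher imitate the teacher
        # rather than exploring the huge search space at random.
        if random.random() < self.p_teacher:
            return self.teacher(state)
        return random.choice(self.actions)

    def remember(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, k):
        # Experiences are reused later, when their value is easier to assess.
        return random.sample(self.memory, min(k, len(self.memory)))

teacher = lambda state: "kick" if state == "ball_close" else "walk"
agent = GuidedExplorer(teacher, ["walk", "kick", "turn"])
agent.remember("ball_close", "kick", 1.0, "scored")
```

The third ingredient, a function approximator over the stored experience, would then be trained on minibatches drawn via `sample`.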
Proceedings of the 13th International Conference on Agents and Artificial Intelligence
In this paper, we present an active vision method using a deep reinforcement learning approach for a humanoid soccer-playing robot. The proposed method adaptively optimises the viewpoint of the robot to acquire the most useful landmarks for self-localisation while keeping the ball in its field of view. Active vision is critical for humanoid decision-maker robots with a limited field of view. To deal with the active vision problem, several probabilistic entropy-based approaches have previously been proposed, which are highly dependent on the accuracy of the self-localisation model. In this research, however, we formulate the problem as an episodic reinforcement learning problem and employ a Deep Q-learning method to solve it. The proposed network only requires the raw images of the camera to move the robot's head toward the best viewpoint. The model shows a very competitive success rate of 80% in achieving the best viewpoint. We implemented the proposed method on a humanoid robot simulated in the Webots simulator. Our evaluations and experiments show that the proposed method outperforms the entropy-based methods in the RoboCup context in cases with high self-localisation errors.