2010, Robotics and Autonomous Systems
In real-world robotic applications, many factors, both at low level (e.g., vision, motion control and behaviors) and at high level (e.g., plans and strategies), determine the quality of the robot's performance. Consequently, fine tuning of the parameters, in the implementation of the basic functionalities as well as in the strategic decisions, is a key issue in robot software development. In recent years, machine learning techniques have been successfully used to find optimal parameters for typical robotic functionalities. However, one major drawback of learning techniques is time consumption: in practical applications, methods designed for physical robots must be effective with small amounts of data. In this paper, we present a method for concurrent learning of the best strategy and the optimal parameters using a policy gradient reinforcement learning algorithm. The results of our experimental work in a simulated environment and on a real robot show a very high convergence rate.
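The parameter tuning described above can be illustrated with a finite-difference policy gradient search over a parameter vector. This is a minimal sketch, not the authors' implementation: the toy `evaluate` objective (standing in for a robot trial), the perturbation size, the number of test policies, and the step length are all illustrative assumptions.

```python
# Finite-difference policy gradient sketch: estimate the gradient of a
# scalar fitness from randomly perturbed parameter vectors and step along it.
# `evaluate` is a stand-in for a (noisy) robot or simulator trial.
import random

def evaluate(params):
    # Placeholder objective with its maximum at params = [1.0, -2.0].
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)

def policy_gradient_step(params, epsilon=0.05, n_policies=24, step=0.02):
    """One finite-difference policy-gradient iteration over the parameters."""
    trials = []
    for _ in range(n_policies):
        # Perturb each parameter by -epsilon, 0, or +epsilon, then score.
        deltas = [random.choice((-epsilon, 0.0, epsilon)) for _ in params]
        score = evaluate([p + d for p, d in zip(params, deltas)])
        trials.append((deltas, score))
    adjustment = []
    for i in range(len(params)):
        plus = [s for d, s in trials if d[i] > 0]
        zero = [s for d, s in trials if d[i] == 0]
        minus = [s for d, s in trials if d[i] < 0]
        avg = lambda xs: sum(xs) / len(xs)
        if not plus or not minus:
            adjustment.append(0.0)   # not enough evidence this round
        elif zero and avg(zero) > avg(plus) and avg(zero) > avg(minus):
            adjustment.append(0.0)   # current value looks best: freeze it
        else:
            adjustment.append(avg(plus) - avg(minus))
    norm = sum(a * a for a in adjustment) ** 0.5
    if norm == 0.0:
        return params
    # Move a fixed distance along the estimated gradient direction.
    return [p + step * a / norm for p, a in zip(params, adjustment)]

random.seed(0)
params = [0.0, 0.0]
for _ in range(200):
    params = policy_gradient_step(params)
```

The normalised step keeps progress per iteration bounded, which matters when each evaluation is a physical trial and the fitness is noisy.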
In humanoid robotic soccer, many factors, both at low level (e.g., vision and motion control) and at high level (e.g., behaviors and game strategies), determine the quality of the robot's performance. In particular, the speed of individual robots, the precision of their trajectories, and the stability of their walking gaits have a high impact on the success of a team. Consequently, humanoid soccer robots require fine tuning, especially for the basic behaviors. In recent years, machine learning techniques have been used to find optimal parameter sets for various humanoid robot behaviors. However, a drawback of learning techniques is time consumption: a practical learning method for robotic applications must be effective with a small amount of data. In this article, we compare two learning methods for humanoid walking gaits based on the Policy Gradient algorithm. We demonstrate that an extension of the classic Policy Gradient algorithm that takes parameter relevance into account allows for better solutions when only a few experiments are available. The results of our experimental work show the effectiveness of the policy gradient learning method, as well as its higher convergence rate when the relevance of parameters is taken into account during learning.
A central problem in autonomous robotics is how to design programs that determine what the robot should do next. Behaviour-based control is a popular paradigm, but current approaches to behaviour design typically involve hand-coded behaviours. The aim of this work is to explore the use of reinforcement learning to develop autonomous robot behaviours automatically, and specifically to look at the performance of the resulting behaviours. This thesis examines the question of whether behaviours for a real behaviour-based, autonomous robot can be learnt under simulation using the Monte Carlo Exploring Starts, ε-soft On-Policy Monte Carlo, or linear, gradient-descent Sarsa algorithms. A further question is whether the increased performance of learnt behaviours carries through to increased performance on the real robot. In addition, this work looks at whether continuing to learn on the real robot causes further improvement in the performance of the behaviour. A novel method is developed, termed Policy Initialisation, that makes use of the domain knowledge in an existing, hand-coded behaviour by converting the behaviour into either a reinforcement learning policy or an action-value function, which is then used to bootstrap the learning process. The Markov Decision Process model is central to reinforcement learning algorithms. This work examines whether it is possible to construct an internal world model in the real robot that satisfies the requirements of the Markov Decision Process model. The methodology used to answer these questions is to take three realistic, non-trivial robotic tasks and attempt to learn behaviours for each. The learnt behaviours are then compared with hand-coded behaviours that have either been published or used in international competition. The tasks are based on real task requirements for robots used in a RoboCup Formula 2000 robot soccer team. The first is a generic movement behaviour that moves the robot to a target point.
The second requires the robot to dribble the ball in an arc so that the robot maintains possession and so that the final position is lined up with the goal. The third addresses the problem of kicking the ball away from the wall. The results show that for these three different types of behavioural problem, reinforcement learning on a simulator produced significantly better performance than hand-coded equivalents, not only under simulation but also on the real robot. In contrast to this, continuing the learning process on the real robot did not significantly improve performance. The Policy Initialisation technique is found to accelerate learning for tabular Monte Carlo methods, but makes minimal improvement and is, in fact, costly to use in conjunction with linear, gradient-descent Sarsa. This approach, unlike some other techniques for accelerating learning, does not appear to bias the solution. Finally, the evidence from this thesis is that internal world models that maintain the requirements of Markov Decision Processes can be constructed, and this appears to be a sound approach to avoiding problems connected with partial observability that have previously occurred in the use of reinforcement learning in robotic environments.
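The Policy Initialisation idea described above can be sketched as seeding a tabular action-value function from a hand-coded behaviour, so that greedy action selection initially reproduces it. The states, actions, and bonus value below are illustrative assumptions, not the thesis's actual task encoding.

```python
# Hedged sketch of Policy Initialisation: convert a hand-coded behaviour
# into initial action-values, so learning starts from the behaviour's
# domain knowledge rather than from scratch.

def hand_coded_behaviour(state):
    # Toy hand-coded movement behaviour: go forward when on target, else turn.
    return "forward" if state == "target_ahead" else "turn"

STATES = ["target_ahead", "target_left", "target_right"]
ACTIONS = ["forward", "turn"]

def initialise_q(bonus=1.0):
    """Give the hand-coded action in each state a head start in value."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for s in STATES:
        q[(s, hand_coded_behaviour(s))] = bonus
    return q

q = initialise_q()
# Before any learning, greedy selection matches the hand-coded behaviour.
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

Because the seed only shifts initial values, later value updates can still overturn it, which is consistent with the finding that the technique does not appear to bias the final solution.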
Mobile robots can benefit from machine learning approaches to improve their behaviors when performing complex activities. In recent years, these techniques have been used to find optimal parameter sets for many behaviors. In particular, layered learning has been proposed to improve the learning rate in robot learning tasks. In this paper we consider a layered learning approach for learning optimal parameters of basic control routines, behaviors, and strategy selection. We compare three different methods in the different layers: a genetic algorithm, Nelder-Mead, and policy gradient. Moreover, we study how to use a 3D simulator to speed up robot learning. The results of our experimental work on AIBO robots are useful not only to identify differences and similarities between the robot learning approaches used within the layered learning framework, but also to evaluate a more effective learning methodology that makes use of a simulator.
The RoboCup Soccer Domain, which was proposed to provide a new long-term challenge for Artificial Intelligence research, supports real experimentation and testing activities for the development of intelligent, autonomous robots. At the Centro Universitário da FEI we are developing a project to compete in the RoboCup Simulation League, aiming to test Reinforcement Learning algorithms in a multiagent domain. This text describes the team developed for the 2006 Robot Soccer Simulation competition, to be held in Campo Grande, MS, Brazil. We conclude that Reinforcement Learning algorithms perform well in this domain.
arXiv (Cornell University), 2020
This article introduces an open framework, called VSSS-RL, for studying Reinforcement Learning (RL) and sim-to-real in robot soccer, focusing on the IEEE Very Small Size Soccer (VSSS) league. We propose a simulated environment in which continuous or discrete control policies can be trained to control the complete behavior of soccer agents and a sim-to-real method based on domain adaptation to adapt the obtained policies to real robots. Our results show that the trained policies learned a broad repertoire of behaviors that are difficult to implement with handcrafted control policies. With VSSS-RL, we were able to beat human-designed policies in the 2019 Latin American Robotics Competition (LARC), achieving 4th place out of 21 teams, being the first to apply Reinforcement Learning (RL) successfully in this competition. Both environment and hardware specifications are available open-source to allow reproducibility of our results and further studies.
Neri, J.R.F.; Zatelli, M.R.; Fabro, J.A.; Santos, C.H.F. 2012, "A Proposal of Q-Learning to Control the Attack of a 2D Robot Soccer Simulation Team", Brazilian Robotics Symposium and Latin American Robotics Symposium
This document presents a novel approach to controlling the attack behavior of a team of simulated soccer-playing robots in the RoboCup 2D category. The presented approach modifies the behavior of each player only when in the state "controlling the ball". The approach is based on a modified Q-Learning algorithm that implements a continuous machine learning process. After an initial learning phase, each player uses its previous experience during the simulation of the game to decide whether it should dribble, pass, or kick the ball towards the adversary's goal. Simulation results show that the proposed approach is capable of learning how to score goals, even when facing the champion of the previous world championship.
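The decision step described above can be sketched as a tabular Q-learning update with epsilon-greedy selection over the dribble/pass/kick actions. The state names, reward, and learning constants below are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of tabular Q-learning for the "controlling the ball" state:
# epsilon-greedy action choice plus the standard one-step Q backup.
import random

ACTIONS = ["dribble", "pass", "kick"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def choose_action(q, state):
    # Epsilon-greedy selection keeps learning continuous during play:
    # mostly exploit past experience, occasionally explore.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state):
    # Standard tabular Q-learning backup toward reward + discounted best value.
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

q = {}
# One illustrative transition: a kick near the goal that scored (+1 reward).
q_update(q, "near_goal", "kick", 1.0, "terminal")
```

After this single update the value of kicking near the goal rises from 0.0 to 0.1 (old + ALPHA * reward), so the greedy choice in that state already favours kicking.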
2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE), 2019
At the current level of evolution of Soccer 3D, motion control is a key factor in a team's performance. Recent works take advantage of model-free approaches based on Machine Learning to exploit robot dynamics in order to obtain faster locomotion skills, achieving running policies and thereby opening a new research direction in the Soccer 3D environment. In this work, we present a methodology based on Deep Reinforcement Learning that learns running skills without any prior knowledge, using a neural network whose inputs are related to the robot's dynamics. Our results outperform the previous state-of-the-art sprint velocity reported in the Soccer 3D literature by a significant margin. The method also demonstrates improved sample efficiency, being able to learn how to run in just a few hours. We report our results by analyzing the training procedure and evaluating the policies in terms of speed, reliability, and human similarity. Finally, we present the key factors that led us to improve on previous results and share some ideas for future work.
2004
This paper presents a machine learning approach to optimizing a quadrupedal trot gait for forward speed. Given a parameterized walk designed for a specific robot, we propose using a form of policy gradient reinforcement learning to automatically search the set of possible parameters with the goal of finding the fastest possible walk. We implement and test our approach on a commercially available quadrupedal robot platform, namely the Sony Aibo robot.
Pagello, E. (Hrsg.); Menegatti, E. (…), 2007
This paper presents an approach for learning complex tasks on real robots, such as walking or kicking with a humanoid soccer robot, profiting as much as possible from the possibility of running simulations of a virtual model of the robot. This approach avoids damaging the real robot in the time-consuming trials needed to learn a correct behavior, and avoids overfitting to the virtual robot model. The basic idea is to run most of the learning steps in simulation and to use a few learning steps on the real robot to assess discrepancies between the simulation and reality. The calculated discrepancies are used to correct the fitness function used in simulation. Experiments on interleaving the learning between a real robot (Robovie-M by VStone) and its virtual model in USARSim are presented. They show that the proposed method is effective and significantly reduces learning time.
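The interleaved scheme above can be sketched as a simulated fitness plus a correction term estimated from a few real-robot trials. Everything below (the toy fitness functions, the constant-offset mismatch, the averaging of discrepancies) is an illustrative assumption about how such a correction might look, not the paper's actual formulation.

```python
# Hedged sketch of fitness correction for sim-to-real learning: learn mostly
# against a cheap simulated fitness, and use a handful of expensive
# real-robot evaluations to estimate and cancel the sim-to-real discrepancy.

def fitness_sim(params):
    # Simulated fitness: systematically optimistic (a toy model mismatch).
    return 10.0 - (params - 3.0) ** 2 + 2.0

def fitness_real(params):
    # Real-robot fitness: expensive to query, so it is sampled only rarely.
    return 10.0 - (params - 3.0) ** 2

class CorrectedFitness:
    """Simulated fitness plus a correction estimated from a few real trials."""
    def __init__(self):
        self.correction = 0.0

    def calibrate(self, probe_params):
        # A few real-robot evaluations estimate the average discrepancy.
        gaps = [fitness_real(p) - fitness_sim(p) for p in probe_params]
        self.correction = sum(gaps) / len(gaps)

    def __call__(self, params):
        # Learning in simulation then optimizes this corrected objective.
        return fitness_sim(params) + self.correction

f = CorrectedFitness()
f.calibrate([1.0, 2.0, 4.0])
```

With the toy constant offset, three probes recover the full discrepancy exactly; with a state-dependent mismatch the correction would only be an average, which is why occasional re-calibration on the real robot is interleaved with simulated learning.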
Robotic soccer belongs to the class of multi-agent systems and involves many challenging sub-problems. Teams of robotic players have to cooperate in order to put the ball into the opposing goal and, at the same time, defend their own goal. The paper is concerned with the problem of learning and implementing reactive behaviors for robotic agents playing soccer. It briefly presents the whole control system designed for the Cerberus'01 Sony legged robot team that participated in the RoboCup 2001 competitions in Seattle, USA, and then introduces the developed reactive behavior for intercepting a moving ball while avoiding collisions with other robotic players and the field walls. For the implementation of the above behavior, a fuzzy-neural trajectory generator (FNTG) has been developed and trained. A Genetic Algorithms (GAs) based approach has been employed to perform the learning process of the proposed FNTG.
2014 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), 2014
Robotic soccer provides a rich environment for the development of Reinforcement Learning controllers. The competitive environment imposes strong requirements on the performance of the developed controllers. RL offers a valuable alternative for the development of efficient controllers while avoiding the hassle of parameter-tuning a hand-coded policy. This paper presents the application of a recently proposed Batch RL update-rule to learn robotic soccer controllers in the context of the RoboCup Middle Size League. The Q-Batch update-rule exploits the episodic structure of the data collection phase of Batch RL to efficiently evaluate and improve the learned policy. Three learning tasks of increasing difficulty were developed and applied first in a simulated environment and later on the physical robot. The performance of the learned controllers is mostly compared to hand-tuned controllers, and some comparisons with other RL methods were also performed. Results show that the proposed approach is able to learn the tasks in a reduced amount of time, even outperforming existing hand-coded solutions.
Proc. of the 1st International …, 2001
Journal of Intelligent & Robotic Systems, 2015
Reinforcement Learning is increasingly becoming a valuable alternative for tackling many of the challenges of a semi-structured, nondeterministic and adversarial environment such as robotic soccer. Batch Reinforcement Learning is a class of Reinforcement Learning methods characterized by processing a batch of interactions. By storing all past interactions, Batch RL methods are extremely data-efficient, which makes this class of methods very appealing for robotics applications, especially when learning directly on physical robotic platforms. This paper presents the application of Batch Reinforcement Learning to obtain efficient robotic soccer controllers on physical platforms. To learn the controllers, we propose the application of Q-Batch, a novel update-rule that exploits the episodic nature of the interactions in Batch Reinforcement Learning. The approach was validated on three tasks of increasing difficulty. Results show the proposed approach is able to outperform hand-coded policies, for all the tasks, in a reduced amount of time.
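The batch idea above can be sketched as repeated value-update sweeps over stored episodes. Note this is a plain batched Q-learning sweep, not the actual Q-Batch update-rule from the paper; it only illustrates how a stored batch of episodic transitions can be replayed, backwards within each episode, to propagate values data-efficiently.

```python
# Hedged sketch of batch RL over stored episodic interactions. Each episode
# is a list of (state, action, reward, next_state) transitions; next_state
# is None at the end of an episode. All names and values are illustrative.

ACTIONS = ["accelerate", "brake"]
ALPHA, GAMMA = 0.5, 0.9

def batch_q_sweep(q, episodes, sweeps=50):
    """Replay the stored batch many times, sweeping each episode backwards."""
    for _ in range(sweeps):
        for episode in episodes:
            # Backward order lets terminal rewards propagate in one pass.
            for (s, a, r, s_next) in reversed(episode):
                best_next = 0.0 if s_next is None else max(
                    q.get((s_next, b), 0.0) for b in ACTIONS)
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)
    return q

# One stored episode: approach the ball, then a rewarded final action.
episode = [("far", "accelerate", 0.0, "near"),
           ("near", "brake", 1.0, None)]
q = batch_q_sweep({}, [episode])
```

Because the batch is replayed many times, a single stored episode is squeezed for far more value information than a single online pass would extract, which is the data-efficiency argument made in the abstract.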
International Journal of Social Robotics, 2020
Nowadays, humanoid soccer serves as a benchmark for artificial intelligence and robotics problems. Factors such as kicking speed and the number of kicks by robot soccer players are among the most significant aims that teams pursue in the RoboCup 3D Soccer Simulation League. The proposed method presents a kicking strategy during walking for humanoid soccer robots. Achieving an accurate and powerful kick while the robot is moving requires dynamic optimization of the robot's speed and motion parameters. In this paper, a curved motion path is designed based on the robot's position relative to the ball and the goal; the robot is then able to kick at the goal by walking along this curved path. The speed and angle of the walking robot are set towards the ball with regard to the robot's curved motion path. After the robot's final step, the accurate and effective adjustment of these two parameters ensures that the robot is located in the ideal position to perform the kick. Due to noise and the robot's walking conditions, it is essential that the speed and angle of motion be measured accurately. For this purpose, we use a reinforcement learning model to adjust the robot's step size and thereby achieve the optimal values of the two above-mentioned parameters. Using reinforcement learning, the robot learns to pursue an optimal policy to kick correctly towards designated points. The proposed method is therefore model-free and based on dynamic programming. The experiments reveal that the proposed method significantly improves the team's overall performance and the robots' ability to kick. Our proposed method has been 9.32% successful on average and outperformed the UTAustinVilla agent in terms of goal-scoring time in a non-opponent simulator.
In real-world robotic applications, many factors, both at low-level (e.g., vision and motion control parameters) and at high-level (e.g., the behaviors) determine the quality of the robot performance. Thus, for many tasks, robots require fine tuning of the parameters, in the implementation of behaviors and basic control actions, as well as in strategic decisional processes. In recent years, machine learning techniques have been used to find optimal parameter sets for different behaviors. However, a drawback of learning techniques is time consumption: in practical applications, methods designed for physical robots must be effective with small amounts of data. In this paper, we present a method for concurrent learning of best strategy and optimal parameters, by extending the policy gradient reinforcement learning algorithm. The results of our experimental work in a simulated environment and on a real robot show a very high convergence rate.
LatinX in AI at Neural Information Processing Systems Conference 2019
This work presents an application of Reinforcement Learning (RL) for the complete control of real soccer robots of the IEEE Very Small Size Soccer (VSSS), a traditional league in the Latin American Robotics Competition (LARC). In the VSSS league, two teams of three small robots play against each other. We propose a simulated environment in which continuous or discrete control policies can be trained, and a Sim-to-Real method to allow using the obtained policies to control a robot in the real world. The results show that the learned policies display a broad repertoire of behaviors that are difficult to specify by hand. This approach, called VSSS-RL, was able to beat the human-designed policy for the striker of the team ranked 3rd place in the 2018 LARC, in 1-vs-1 matches.
2nd International Conference on …, 2004
Robotic soccer is a complex multi-agent system in which agents play the role of soccer players. The characteristics of such systems are: real-time, noisy, collaborative, and adversarial. Because of the inherent complexity of this type of system, machine learning is used for training agents. Since the main purpose of a soccer game is to score goals, it is important for a robotic soccer agent to have a clear policy about whether it should attempt to score in a given situation. Many parameters affect the result of shooting at the goal; the UvA Trilearn simulation team considers two especially important parameters for this behavior. This paper describes the optimized policy used in the UvA team, obtained by choosing two additional important parameters as well as using a reinforcement learning method.
2006
In this paper, we apply Reinforcement Learning (RL) to a real-world task. While complex problems have been solved by RL in simulated worlds, the cost of obtaining enough training examples often prohibits the use of plain RL in real-world scenarios. We propose three approaches to reduce training expenses for real-world RL. Firstly, we replace the random exploration of the huge search space, which plain RL uses, by guided exploration that imitates a teacher. Secondly, we use experiences not only once but store and reuse them later, when their value is easier to assess. Finally, we utilize function approximators in order to represent the experience in a way that balances between generalization and discrimination. We evaluate the performance of the combined extensions of plain RL using a humanoid robot in the RoboCup soccer domain. As we show in simulation and real-world experiments, our approach enables the robot to quickly learn fundamental soccer skills.
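The first two cost-reduction ideas above can be sketched together: an action-selection rule that imitates a teacher with some probability instead of exploring at random, and a replay memory that stores experiences for later reuse. The class name, action set, teacher policy, and probability below are illustrative assumptions.

```python
# Hedged sketch of guided exploration plus experience storage for reuse.
import random
from collections import deque

class GuidedExplorer:
    def __init__(self, teacher, actions, p_teacher=0.7):
        self.teacher = teacher        # hand-crafted policy to imitate
        self.actions = actions
        self.p_teacher = p_teacher
        self.memory = deque(maxlen=10_000)  # stored experiences for reuse

    def act(self, state):
        # Guided exploration: with probability p_teacher imitate the teacher
        # rather than exploring the huge search space at random.
        if random.random() < self.p_teacher:
            return self.teacher(state)
        return random.choice(self.actions)

    def remember(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, k):
        # Experiences are reused later, when their value is easier to assess.
        return random.sample(self.memory, min(k, len(self.memory)))

teacher = lambda state: "kick" if state == "ball_close" else "walk"
agent = GuidedExplorer(teacher, ["walk", "kick", "turn"])
agent.remember("ball_close", "kick", 1.0, "scored")
```

The third ingredient, a function approximator over the stored experience, would then be trained on minibatches drawn via `sample`.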
Proceedings of the 13th International Conference on Agents and Artificial Intelligence
In this paper, we present an active vision method using a deep reinforcement learning approach for a humanoid soccer-playing robot. The proposed method adaptively optimises the viewpoint of the robot to acquire the most useful landmarks for self-localisation while keeping the ball in its field of view. Active vision is critical for humanoid decision-maker robots with a limited field of view. To deal with the active vision problem, several probabilistic entropy-based approaches have previously been proposed, which are highly dependent on the accuracy of the self-localisation model. In this research, however, we formulate the problem as an episodic reinforcement learning problem and employ a Deep Q-learning method to solve it. The proposed network only requires the raw images of the camera to move the robot's head toward the best viewpoint. The model shows a very competitive success rate of 80% in achieving the best viewpoint. We implemented the proposed method on a humanoid robot simulated in the Webots simulator. Our evaluations and experiments show that the proposed method outperforms the entropy-based methods in the RoboCup context in cases with high self-localisation errors.