2010
Abstract Recent trends in robot learning are to use trajectory-based optimal control techniques and reinforcement learning to scale to complex robotic systems. On the one hand, increased computational power and multiprocessing, and on the other hand, probabilistic reinforcement learning methods and function approximation, have contributed to a steadily increasing interest in robot learning. Imitation learning has helped significantly by providing reasonable initial behavior from which learning can start.
In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. We give a summary of the state-of-the-art of reinforcement learning in the context of robotics, in terms of both algorithms and policy representations. Numerous challenges faced by the policy representation in robotics are identified. Three recent examples of the application of reinforcement learning to real-world robots are described: a pancake flipping task, a bipedal walking energy minimization task and an archery-based aiming task. In all examples, a state-of-the-art expectation-maximization-based reinforcement learning algorithm is used, and different policy representations are proposed and evaluated for each task. The proposed policy representations offer viable solutions to six rarely-addressed challenges in policy representations: correlations, adaptability, multi-resolution, globality, multi-dimensionality and convergence. Both the successes and the practical difficulties encountered in these examples are discussed. Based on insights from these particular cases, conclusions are drawn about the state-of-the-art and future directions for reinforcement learning in robotics.
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning and notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free approaches, as well as between value function-based and policy search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.
Robotics
Reinforcement Learning (RL) is gaining much research attention because it allows the system to learn from interaction with the environment. Yet, despite these successful applications, the application of RL to direct joint-torque control without the help of an underlying dynamic model has not been reported in the literature. This study presents a split network structure that enables successful RL training of direct torque control for trajectory following with a six-axis articulated robot, without prior knowledge of the robot's dynamic model. Although training took a very long time to converge, we were able to show successful control of four different trajectories without needing an accurate dynamics model or complex inverse kinematics computation. To show the RL-based control's effectiveness, we also compare it with Model Predictive Control (MPC), another popular trajectory control method. Our results show that while the MPC achieves smoother and more accura...
2002
Learning robot control, a subclass of the field of learning control, refers to the process of acquiring a sensory-motor control strategy for a particular movement task and movement system by trial and error. Learning control is usually distinguished from adaptive control (see ADAPTIVE CONTROL) in that the learning system is permitted to fail during the process of learning, while adaptive control emphasizes single trial convergence without failure.
IEEE Transactions on Robotics
Most policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data," we refer to this challenge as "micro-data reinforcement learning." In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.
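The surrogate-model strategy described above can be sketched in a few lines: a Gaussian-process model of expected reward is fit to a handful of real rollouts, and an optimistic acquisition function chooses the next policy parameter to try on the physical system. The reward landscape, kernel length-scale, and acquisition coefficient below are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def true_reward(theta):
    # Stand-in for an expensive robot rollout (hypothetical reward landscape).
    return -(theta - 0.3) ** 2 + 0.1 * np.sin(8 * theta)

def rbf(a, b, ls=0.15):
    # Squared-exponential kernel between two 1-D parameter arrays.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    # Standard GP regression: posterior mean and variance at query points Xq.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    mu = Ks.T @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.sum(Ks * v, axis=0)  # prior variance of the RBF kernel is 1
    return mu, np.maximum(var, 1e-12)

def bayes_opt(n_iters=8):
    grid = np.linspace(0.0, 1.0, 201)
    X = np.array([0.0, 1.0])               # two initial real rollouts
    y = true_reward(X)
    for _ in range(n_iters):
        mu, var = gp_posterior(X, y, grid)
        ucb = mu + 2.0 * np.sqrt(var)      # optimistic acquisition (UCB)
        theta = grid[np.argmax(ucb)]       # query the surrogate, not the robot...
        X = np.append(X, theta)            # ...then spend one real rollout
        y = np.append(y, true_reward(theta))
    return X[np.argmax(y)]                 # best parameter found so far

best = bayes_opt()
print(round(float(best), 2))
```

With only ten rollouts in total, the optimizer concentrates its real-system queries where the surrogate is both promising and uncertain, which is the essence of the micro-data regime.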
Advanced Robotics, 1995
In this paper a learning method is described which enables a conventional industrial robot to accurately execute the teach-in path in the presence of dynamical effects and high speed. After training the system is capable of generating positional commands that, in combination with the standard robot controller, lead the robot along the desired trajectory. The mean path deviation is reduced by a factor of 20 for our test configuration. For low-speed motion the learned controller's accuracy is in the range of the resolution of the positional encoders. The learned controller does not depend on specific trajectories. It acts as a general controller that can be used for non-recurring tasks as well as for sensor-based planned paths. For repetitive control tasks accuracy can be increased even further. Such improvements are caused by a three-level structure estimating a simple process model, optimal a posteriori commands, and a suitable feedforward controller, the latter including neural networks for the representation of nonlinear behaviour. The learning system is demonstrated in experiments with a Manutec R2 industrial robot. After training with only two sample trajectories, the learned control system is applied to other, totally different paths, which are executed with high precision as well.
2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566)
This paper discusses a method to accelerate reinforcement learning. First, a concept is defined that reduces the state space while conserving the policy. An algorithm is then given that calculates the optimal cost-to-go and the optimal policy in the reduced space from those in the original space. Using the reduced state space, learning convergence is accelerated. Its usefulness for both DP (dynamic programming) iteration and Q-learning is compared through a maze example. Convergence of the optimal cost-to-go in the original state space requires approximately N times (or more) as long as in the reduced state space, where N is the ratio of the number of states in the original space to that in the reduced space. The acceleration effect for Q-learning is more remarkable than that for the DP iteration. The proposed technique is also applied to a robot manipulator working on a peg-in-hole task with geometric constraints. The state space reduction can be considered as a model of the change of observation, i.e., one of the cognitive actions. The obtained results suggest that the change of observation is reasonable in terms of learning efficiency.
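For reference, the baseline tabular Q-learning that such state-space reduction accelerates can be sketched on a small maze-like gridworld. The layout, rewards, and hyperparameters below are illustrative assumptions, not taken from the paper.

```python
import random

# Hypothetical 4x4 gridworld: start at (0, 0), goal at (3, 3).
# Actions: 0=up, 1=down, 2=left, 3=right (moves clipped at the walls).
SIZE, GOAL = 4, (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    new = (max(0, min(SIZE - 1, r + dr)), max(0, min(SIZE - 1, c + dc)))
    return new, (1.0 if new == GOAL else -0.04), new == GOAL

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        s, done = (0, 0), False
        while not done:
            # Epsilon-greedy action selection.
            a = rng.randrange(4) if rng.random() < eps else max(range(4), key=lambda i: Q[s][i])
            s2, reward, done = step(s, a)
            # One-step temporal-difference update toward the greedy bootstrap.
            Q[s][a] += alpha * (reward + gamma * max(Q[s2]) * (not done) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
best = max(range(4), key=lambda i: Q[(0, 0)][i])
print(best in (1, 3))  # True: greedy action at the start heads down or right
```

The convergence time of exactly this kind of loop grows with the number of states, which is why collapsing the state space while conserving the policy pays off.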
Learning Techniques In Robotics, 2019
The field of machine learning is one that is gathering much interest, and it has many areas of application, of which robotics is one. This paper discusses the learning techniques used in robotics, such as supervised learning, unsupervised learning, reinforcement learning, and deep learning.
Robust manipulation with tractability in unstructured environments is a prominent hurdle in robotics. Learning algorithms to control robotic arms have introduced elegant solutions to the complexities faced in such systems. A novel method of Reinforcement Learning (RL), Gaussian Process Dynamic Programming (GPDP), yields promising results for closed-loop control of a low-cost manipulator; however, research surrounding most RL techniques lacks a breadth of comparable experiments into the viability of particular learning techniques on equivalent environments. We introduce several model-based learning agents as mechanisms to control a noisy, low-cost robotic system. The agents were tested in a simulated domain for learning closed-loop policies of a simple task with no prior information. Then, the fidelity of the simulations is confirmed by applying GPDP to a physical system.
Revista Brasileira de Computação Aplicada, 2021
Since the establishment of robotics in industrial applications, industrial robot programming has involved the repetitive and time-consuming process of manually specifying a fixed trajectory, which results in machine idle time in production and the necessity of completely reprogramming the robot for different tasks. The increasing number of robotics applications in unstructured environments requires controllers that are not only intelligent but also reactive, due to the unpredictability of the environment and to safety requirements. This paper presents a comparative analysis of two classes of Reinforcement Learning algorithms, value iteration (Q-Learning/DQN) and policy iteration (REINFORCE), applied to the discretized task of positioning a robotic manipulator in an obstacle-filled simulated environment, with no previous knowledge of the obstacles' positions or of the robot arm dynamics. The agent's performance and algorithm convergence are analyzed under different reward functions and...
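The policy-iteration side of this comparison, REINFORCE, can be illustrated on a minimal discretized problem: a softmax policy updated along the score-function gradient with a running-mean baseline. The two-action setting and hyperparameters below are hypothetical stand-ins for the paper's manipulator-positioning task.

```python
import math
import random

# Expected reward per discrete action (hypothetical; action 1 is better).
REWARD = {0: 0.2, 1: 1.0}

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce(episodes=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # policy parameters (action preferences)
    baseline = 0.0       # running-mean baseline reduces gradient variance
    for _ in range(episodes):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        r = REWARD[a] + rng.gauss(0, 0.1)   # noisy one-step rollout return
        baseline += 0.01 * (r - baseline)
        # Score-function gradient of log softmax: indicator - probability.
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr * (r - baseline) * grad
    return softmax(prefs)

probs = reinforce()
print(round(probs[1], 2))
```

Unlike Q-Learning/DQN, which learns action values and derives the policy implicitly, this update adjusts the policy parameters directly from sampled returns.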
International Conference on Informatics in Control, Automation and Robotics, 2007
Research on robot techniques that are fast, user-friendly, and require little application-specific knowledge from the user is increasingly encouraged in a society where the demand for home-care and domestic-service robots is continuously increasing. In this context we propose a methodology which is able to achieve fast convergence towards good robot-control policies, and to reduce the random exploration the robot needs to carry out in order to find the solutions. The performance of our approach is due to the mutual influence that three different elements exert on each other: reinforcement learning, genetic algorithms, and a dynamic representation of the environment around the robot. The performance of our proposal is shown through its application to solving two common tasks in mobile robotics.
Policy learning approaches are among the methods best suited for high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper, we make two contributions: first, we present a unified perspective that allows us to derive several policy learning algorithms from a common point of view, i.e., policy gradient algorithms, natural-gradient algorithms, and EM-like policy learning. Second, we present several applications to both robot motor-primitive learning and robot control in task space. Results from simulation and from several different real robots are shown.
Journal of Dynamic Systems, Measurement, and Control, 1993
Learning control encompasses a class of control algorithms for programmable machines such as robots which attain, through an iterative process, the motor dexterity that enables the machine to execute complex tasks. In this paper we discuss the use of function identification and adaptive control algorithms in learning controllers for robot manipulators. In particular, we discuss the similarities and differences between betterment learning schemes, repetitive controllers and adaptive learning schemes based on integral transforms. The stability and convergence properties of adaptive learning algorithms based on integral transforms are highlighted and experimental results illustrating some of these properties are presented.
In real-world robotic applications, many factors, both at low-level (e.g., vision and motion control parameters) and at high-level (e.g., the behaviors) determine the quality of the robot performance. Thus, for many tasks, robots require fine tuning of the parameters, in the implementation of behaviors and basic control actions, as well as in strategic decisional processes. In recent years, machine learning techniques have been used to find optimal parameter sets for different behaviors. However, a drawback of learning techniques is time consumption: in practical applications, methods designed for physical robots must be effective with small amounts of data. In this paper, we present a method for concurrent learning of best strategy and optimal parameters, by extending the policy gradient reinforcement learning algorithm. The results of our experimental work in a simulated environment and on a real robot show a very high convergence rate.
Journal of Physical Agents (JoPha), 2012
This article describes a proposal to achieve fast robot learning from the robot's interaction with its environment. Our proposal is suitable for continuous learning procedures, as it tries to limit the instability that appears every time the robot encounters a new situation it has not seen before. Moreover, the user does not have to set a degree of exploration (as is usual in reinforcement learning), which would otherwise hinder continual learning. Our proposal uses an ensemble of learners able to combine dynamic programming and reinforcement learning to predict when the robot will make a mistake. This information is used to dynamically evolve a set of control policies that determine the robot's actions.
International Journal of Machine Learning and Computing, 2015
Autonomous Robots, 2022
This paper presents a learning-based method that uses simulation data to learn an object manipulation task with two model-free reinforcement learning (RL) algorithms. The learning performance is compared across on-policy and off-policy algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). In order to accelerate the learning process, a fine-tuning procedure is proposed that demonstrates the continuous adaptation of on-policy RL to new environments, allowing the learned policy to adapt to and execute the (partially) modified task. A dense reward function is designed for the task to enable efficient learning by the agent. A grasping task involving a Franka Emika Panda manipulator is considered as the reference task to be learned. The learned control policy is demonstrated to be generalizable across multiple object geometries and initial robot/parts configurations. The approach is finally tested on a real Franka Emika Panda robot, showing the possibility to tran...
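A dense reward of the kind mentioned above typically combines an always-informative distance term with staged bonuses for grasping and lifting. The terms and coefficients below are a hypothetical sketch, not the paper's actual reward function.

```python
import math

# Hypothetical dense reward for a reach-and-grasp task: the distance term
# gives a learning signal on every step, while the grasp and lift bonuses
# mark the staged sub-goals. All thresholds and weights are illustrative.
def dense_reward(ee_pos, obj_pos, gripper_closed, lifted):
    dist = math.dist(ee_pos, obj_pos)
    reward = -dist                         # dense shaping: always informative
    reward += 0.5 * math.exp(-10 * dist)   # sharp bonus for being very close
    if gripper_closed and dist < 0.02:
        reward += 1.0                      # grasp bonus
        if lifted:
            reward += 5.0                  # sparse success bonus
    return reward

far = dense_reward((0.5, 0.0, 0.3), (0.0, 0.0, 0.0), False, False)
near = dense_reward((0.0, 0.0, 0.01), (0.0, 0.0, 0.0), True, True)
print(near > far)  # True: the gradient of reward points toward the object
```

A purely sparse reward (the 5.0 term alone) would make the grasping task far harder to explore, which is why such shaping is common in manipulator RL.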
1992
We consider the problem of a robot manipulator operating in a noisy workspace. The manipulator is required to move from an initial position P_I to a final position P_F.
International Research Journal of Computer Science
Reinforcement learning (RL) is a subfield of machine learning being developed within Artificial Intelligence (AI). The technique is data-independent: the primary aim of systems of this kind is to maximize their reward signal, which drives the system toward its goal. Reinforcement learning differs from supervised and unsupervised techniques in that the RL agent builds its own insights and maps which action to perform in each situation, whereas supervised and unsupervised methods have the answers already embedded in their data. In the absence of new data, RL can learn from its own experience where other methods cannot. RL is used almost everywhere; its best applications are in robotics, specifically in motion control and planning, and it is also used in finance, gaming, etc. This paper demonstrates the development of navigation and motion control for a two-wheeled differential-drive robot with the help of a reinforcement learning topology. Traditionally, to design the behaviour of controllers in robots, we inevitably need models of how the robot actually behaves in the environment. Here, instead, we come up with an RL approach to design the control structure for the robot to navigate in an indoor environment.
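For context, the differential-drive kinematics that a traditional model-based controller would have to specify, and that the RL approach above learns to cope with implicitly, can be sketched as a simple pose-integration step (the wheel base and time step below are assumed values):

```python
import math

# Differential-drive kinematics: integrate the robot pose (x, y, heading)
# one time step forward given left/right wheel speeds.
# wheel_base and dt are hypothetical parameters, not from the paper.
def step_pose(x, y, theta, v_left, v_right, wheel_base=0.2, dt=0.05):
    v = (v_left + v_right) / 2.0             # forward velocity
    omega = (v_right - v_left) / wheel_base  # angular velocity
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

# Equal wheel speeds drive straight along the current heading.
x, y, th = step_pose(0.0, 0.0, 0.0, 0.5, 0.5)
print(round(x, 3), round(y, 3), round(th, 3))  # 0.025 0.0 0.0

# Opposite wheel speeds rotate in place.
x2, y2, th2 = step_pose(0.0, 0.0, 0.0, -0.5, 0.5)
```

An RL navigation agent sidesteps writing this model down: it learns a mapping from sensed state to wheel commands directly from interaction.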