Skip to main content

Gabriel A D Lopes

Followers

3

Following

2

Co-author

1

Public Views

Laura Cannavacciuolo

University of Southern California

Mohsen Amidzadeh

Souradip Chakraborty

Northeastern University

Interests

Uploads

Papers by Gabriel A D Lopes

IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS 1 A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Abstract—Policy gradient based actor-critic algorithms are amongst the most popular algorithms in... more Abstract—Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper therefore describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms in the past few years. A review of several ...

Approximate analytical solutions to the double-stance dynamics of the lossy spring-loaded inverted pendulum

Bioinspiration & biomimetics, Dec 5, 2016

This paper introduces approximate time-domain solutions to the otherwise non-integrable double-st... more This paper introduces approximate time-domain solutions to the otherwise non-integrable double-stance dynamics of the 'bipedal' spring-loaded inverted pendulum (B-SLIP) in the presence of non-negligible damping. We first introduce an auxiliary system whose behavior under certain conditions is approximately equivalent to the B-SLIP in double-stance. Then, we derive approximate solutions to the dynamics of the new system following two different methods: (i) updated-momentum approach that can deal with both the lossy and lossless B-SLIP models, and (ii) perturbation-based approach following which we only derive a solution to the lossless case. The prediction performance of each method is characterized via a comprehensive numerical analysis. The derived representations are computationally very efficient compared to numerical integrations, and, hence, are suitable for online planning, increasing the autonomy of walking robots. Two application examples of walking gait control are ...

A fast sampling method for estimating the domain of attraction

Nonlinear Dynamics, 2016

Most stabilizing controllers designed for nonlinear systems are valid only within a specific regi... more Most stabilizing controllers designed for nonlinear systems are valid only within a specific region of the state space, called the domain of attraction (DoA). Computation of the DoA is usually costly and time-consuming. This paper proposes a computationally effective sampling approach to estimate the DoAs of nonlinear systems in real time. This method is validated to approximate the DoAs of stable equilibria in several nonlinear systems. In addition, it is implemented for the passivity-based learning controller designed for a second-order dynamical system. Simulation and experimental results show that, in all cases studied, the proposed sampling technique quickly estimates the DoAs, corroborating its suitability for realtime applications.

Optimal model-free output synchronization of heterogeneous systems using off-policy reinforcement learning

Automatica, 2016

This paper considers optimal output synchronization of heterogeneous linear multi-agent systems. ... more This paper considers optimal output synchronization of heterogeneous linear multi-agent systems. Standard approaches to output synchronization of heterogeneous systems require either the solution of the output regulator equations or the incorporation of a p-copy of the leader's dynamics in the controller of each agent. By contrast, in this paper neither one is needed. Moreover, here both the leader's and the follower's dynamics are assumed to be unknown. First, a distributed adaptive observer is designed to estimate the leader's state for each agent. The output synchronization problem is then formulated as an optimal control problem and a novel model-free off-policy reinforcement learning algorithm is developed to solve the optimal output synchronization problem online in real time. It is shown that this optimal distributed approach implicitly solves the output regulation equations without actually doing so. Simulation results are provided to verify the effectiveness of the proposed approach

Three-Dimensional Reconstruction of the Human Motion Based on Images from a Single Camera

Isbs Conference Proceedings Archive, 1998

On the synchronization of cyclic discrete-event systems

2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012

Balancing a Legged Robot Using State-Dependent Riccati Equation Control

IFAC Proceedings Volumes, 2014

In this paper, we propose a nonlinear control approach for balancing underactuated legged robots.... more In this paper, we propose a nonlinear control approach for balancing underactuated legged robots. For the balancing task, the robot is modeled as a generalized version of a Segway. The control design is based on the State-Dependent Riccati Equation (SDRE) approach. The domain of attraction of the SDRE controller is compared to the domain of attraction of a linear quadratic controller. Using a simulation example of a four-legged robot balancing on its hind legs, we show that the SDRE controller gives a reasonably large domain of attraction, even with realistic level constraints on the control input, while the linear quadratic controller is unable to stabilize the system.

Learning Sequential Composition Control

IEEE transactions on cybernetics, Jan 14, 2015

Sequential composition is an effective supervisory control method for addressing control problems... more Sequential composition is an effective supervisory control method for addressing control problems in nonlinear dynamical systems. It executes a set of controllers sequentially to achieve a control specification that cannot be realized by a single controller. As these controllers are designed offline, sequential composition cannot address unmodeled situations that might occur during runtime. This paper proposes a learning approach to augment the standard sequential composition framework by using online learning to handle unforeseen situations. New controllers are acquired via learning and added to the existing supervisory control structure. In the proposed setting, learning experiments are restricted to take place within the domain of attraction (DOA) of the existing controllers. This guarantees that the learning process is safe (i.e., the closed loop system is always stable). In addition, the DOA of the new learned controller is approximated after each learning trial. This keeps the...

On the eigenstructure of a class of max-plus linear systems

IEEE Conference on Decision and Control and European Control Conference, 2011

Switching Max-Plus models for legged locomotion

2009 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2009

We present a new class of gait generation and control algorithms based on the Switching Max-Plus ... more We present a new class of gait generation and control algorithms based on the Switching Max-Plus modeling framework that allows for the synchronization of multiple legs of walking robots. Transitions between stance and swing phases of each leg are modeled as discrete events on a system described by max-plus-linear state equations. Different gaits and gait parameters can be interleaved by using different system matrices. Switching in max-plus-linear systems offers a powerful collection of modeling, analysis, and control tools that, in particular, allow for safe transitions between different locomotion gaits that may involve breaking/enforcing synchronization or changing the order of leg lift off events. Experimental validation of the proposed algorithms is presented by the implementation of various horse gaits on a simple quadruped robot.

Visual registration and navigation using planar features

2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422)

Synchronization of a class of cyclic discrete-event systems describing legged locomotion

Discrete Event Dynamic Systems, 2015

It has been shown that max-plus linear systems are well suited for applications in synchronizatio... more It has been shown that max-plus linear systems are well suited for applications in synchronization and scheduling, such as the generation of train timetables, manufacturing, or traffic. In this paper we show that the same is true for multi-legged locomotion. In this framework, the max-plus eigenvalue of the system matrix represents the total cycle time, whereas the max-plus eigenvector dictates the steady-state behavior. Uniqueness of the eigenstructure also indicates uniqueness of the resulting behavior. For the particular case of legged locomotion, the movement of each leg is abstracted to two-state circuits: swing and stance (leg in flight and on the ground, respectively). The generation of a gait (a manner of walking) for a multiple legged robot is then achieved by synchronizing the multiple discrete-event cycles via the max-plus framework. By construction, different gaits and gait parameters can be safely interleaved by using different system matrices. In this paper we address both the transient and steady-state behavior for a class of gaits by presenting closed-form expressions for the max-plus eigenvalue and max-plus eigenvector of the system matrix and the coupling time. The significance of this result is in showing guaranteed robustness to perturbations and gait switching, and also a systematic methodology for synthesizing controllers that allow for legged robots to change rhythms fast.

A modeling framework for model predictive scheduling using switching max-plus linear models

52nd IEEE Conference on Decision and Control, 2013

In this paper we discuss a modeling framework for model predictive scheduling of a class of semi-... more In this paper we discuss a modeling framework for model predictive scheduling of a class of semi-cyclic discrete event systems that can be described by switching max-plus linear models. We study the structure of the system matrices and derive how routing, ordering, and synchronization can be manipulated by a set of control variables. In addition, we show that this leads to a system matrix that is linear in the control variables. We define the model predictive scheduling design problem to optimize the schedule, and we show that the problem can be recast as a mixed integer linear programming (MILP) problem.

Navigation Functions for Dynamical, Nonholonomically Constrained Mechanical Systems

Advances in Robot Control

Reinforcement Learning for Port-Hamiltonian Systems

IEEE Transactions on Cybernetics, 2015

Motion estimation based on predator/prey vision

2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010

We present an unscented Kalman filter based state estimator for a fast moving rigid body (such as... more We present an unscented Kalman filter based state estimator for a fast moving rigid body (such as a mobile robot) endowed with two video cameras. We focus on forward velocity estimation towards the computation of standard energy cost functions for legged locomotion. Points are chosen as image features and the model of each camera is based on the traditional pinhole projection. The resulting filter's state is composed of the rigid body pose and velocities, together with a measure of depth for each tracked point. By taking inspiration from nature's large predatory and grazing mammals eye configuration, we suggest, via simulation results, a solution for the question of finding the best orientation of two cameras, between side and frontal facing, for velocity estimation in a forward moving robot.

Imitation learning with non-parametric regression

Proceedings of 2012 IEEE International Conference on Automation, Quality and Testing, Robotics, 2012

Humans are very fast learners. Yet, we rarely learn a task completely from scratch. Instead, we u... more Humans are very fast learners. Yet, we rarely learn a task completely from scratch. Instead, we usually start with a rough approximation of the desired behavior and take the learning from there. In this paper, we use imitation to quickly generate a rough solution to a robotic task from demonstrations, supplied as a collection of state-space trajectories. Appropriate control actions needed to steer the system along the trajectories are then automatically learned in the form of a (nonlinear) statefeedback control law. The learning scheme has two components: a dynamic reference model and an adaptive inverse process model, both based on a data-driven, non-parametric method called local linear regression. The reference model infers the desired behavior from the demonstration trajectories, while the inverse process model provides the control actions to achieve this behavior and is improved online using learning. Experimental results with a pendulum swing-up problem and a robotic arm demonstrate the practical usefulness of this approach. The resulting learned dynamics are not limited to single trajectories, but capture instead the overall dynamics of the motion, making the proposed approach a promising step towards versatile learning machines such as future household robots, or robots for autonomous missions.

Visual Servoing for Nonholonomically Constrained Three Degree of Freedom Kinematic Systems

The International Journal of Robotics Research, 2007

This paper addresses problems of robot navigation with nonholonomic motion constraints and percep... more This paper addresses problems of robot navigation with nonholonomic motion constraints and perceptual cues arising from onboard visual servoing in partially engineered environments. A general hybrid procedure is proposed that adapts to the constrained motion setting the standard feedback controller arising from a navigation function in the fully actuated case. This is accomplished by switching back and forth between moving “down” and “across” the associated gradient field toward the stable manifold it induces in the constrained dynamics. Guaranteed to avoid obstacles in all cases, conditions are provided under which the new procedure brings initial configurations to within an arbitrarily small neighborhood of the goal. Simulation results are given for a sample of visual servoing problems with a few different perceptual models. The empirical effectiveness of the proposed algorithm is documented by reporting results of its application to outdoor autonomous visual registration experime...

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2012

Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the rein... more Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper therefore describes the state of the art of actorcritic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms in the past few years. A review of several standard and natural actor-critic algorithms follows and the paper concludes with an overview of application areas and a discussion on open issues.

Modeling and Control of Legged Locomotion via Switching Max-Plus Models

IEEE Transactions on Robotics, 2014

IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS 1 A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Abstract—Policy gradient based actor-critic algorithms are amongst the most popular algorithms in... more Abstract—Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper therefore describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms in the past few years. A review of several ...

Approximate analytical solutions to the double-stance dynamics of the lossy spring-loaded inverted pendulum

Bioinspiration & biomimetics, Dec 5, 2016

This paper introduces approximate time-domain solutions to the otherwise non-integrable double-st... more This paper introduces approximate time-domain solutions to the otherwise non-integrable double-stance dynamics of the 'bipedal' spring-loaded inverted pendulum (B-SLIP) in the presence of non-negligible damping. We first introduce an auxiliary system whose behavior under certain conditions is approximately equivalent to the B-SLIP in double-stance. Then, we derive approximate solutions to the dynamics of the new system following two different methods: (i) updated-momentum approach that can deal with both the lossy and lossless B-SLIP models, and (ii) perturbation-based approach following which we only derive a solution to the lossless case. The prediction performance of each method is characterized via a comprehensive numerical analysis. The derived representations are computationally very efficient compared to numerical integrations, and, hence, are suitable for online planning, increasing the autonomy of walking robots. Two application examples of walking gait control are ...

A fast sampling method for estimating the domain of attraction

Nonlinear Dynamics, 2016

Most stabilizing controllers designed for nonlinear systems are valid only within a specific regi... more Most stabilizing controllers designed for nonlinear systems are valid only within a specific region of the state space, called the domain of attraction (DoA). Computation of the DoA is usually costly and time-consuming. This paper proposes a computationally effective sampling approach to estimate the DoAs of nonlinear systems in real time. This method is validated to approximate the DoAs of stable equilibria in several nonlinear systems. In addition, it is implemented for the passivity-based learning controller designed for a second-order dynamical system. Simulation and experimental results show that, in all cases studied, the proposed sampling technique quickly estimates the DoAs, corroborating its suitability for realtime applications.

Optimal model-free output synchronization of heterogeneous systems using off-policy reinforcement learning

Automatica, 2016

This paper considers optimal output synchronization of heterogeneous linear multi-agent systems. ... more This paper considers optimal output synchronization of heterogeneous linear multi-agent systems. Standard approaches to output synchronization of heterogeneous systems require either the solution of the output regulator equations or the incorporation of a p-copy of the leader's dynamics in the controller of each agent. By contrast, in this paper neither one is needed. Moreover, here both the leader's and the follower's dynamics are assumed to be unknown. First, a distributed adaptive observer is designed to estimate the leader's state for each agent. The output synchronization problem is then formulated as an optimal control problem and a novel model-free off-policy reinforcement learning algorithm is developed to solve the optimal output synchronization problem online in real time. It is shown that this optimal distributed approach implicitly solves the output regulation equations without actually doing so. Simulation results are provided to verify the effectiveness of the proposed approach

Three-Dimensional Reconstruction of the Human Motion Based on Images from a Single Camera

Isbs Conference Proceedings Archive, 1998

On the synchronization of cyclic discrete-event systems

2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012

Balancing a Legged Robot Using State-Dependent Riccati Equation Control

IFAC Proceedings Volumes, 2014

In this paper, we propose a nonlinear control approach for balancing underactuated legged robots.... more In this paper, we propose a nonlinear control approach for balancing underactuated legged robots. For the balancing task, the robot is modeled as a generalized version of a Segway. The control design is based on the State-Dependent Riccati Equation (SDRE) approach. The domain of attraction of the SDRE controller is compared to the domain of attraction of a linear quadratic controller. Using a simulation example of a four-legged robot balancing on its hind legs, we show that the SDRE controller gives a reasonably large domain of attraction, even with realistic level constraints on the control input, while the linear quadratic controller is unable to stabilize the system.

Learning Sequential Composition Control

IEEE transactions on cybernetics, Jan 14, 2015

Sequential composition is an effective supervisory control method for addressing control problems... more Sequential composition is an effective supervisory control method for addressing control problems in nonlinear dynamical systems. It executes a set of controllers sequentially to achieve a control specification that cannot be realized by a single controller. As these controllers are designed offline, sequential composition cannot address unmodeled situations that might occur during runtime. This paper proposes a learning approach to augment the standard sequential composition framework by using online learning to handle unforeseen situations. New controllers are acquired via learning and added to the existing supervisory control structure. In the proposed setting, learning experiments are restricted to take place within the domain of attraction (DOA) of the existing controllers. This guarantees that the learning process is safe (i.e., the closed loop system is always stable). In addition, the DOA of the new learned controller is approximated after each learning trial. This keeps the...

On the eigenstructure of a class of max-plus linear systems

IEEE Conference on Decision and Control and European Control Conference, 2011

Switching Max-Plus models for legged locomotion

2009 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2009

We present a new class of gait generation and control algorithms based on the Switching Max-Plus ... more We present a new class of gait generation and control algorithms based on the Switching Max-Plus modeling framework that allows for the synchronization of multiple legs of walking robots. Transitions between stance and swing phases of each leg are modeled as discrete events on a system described by max-plus-linear state equations. Different gaits and gait parameters can be interleaved by using different system matrices. Switching in max-plus-linear systems offers a powerful collection of modeling, analysis, and control tools that, in particular, allow for safe transitions between different locomotion gaits that may involve breaking/enforcing synchronization or changing the order of leg lift off events. Experimental validation of the proposed algorithms is presented by the implementation of various horse gaits on a simple quadruped robot.

Visual registration and navigation using planar features

2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422)

Synchronization of a class of cyclic discrete-event systems describing legged locomotion

Discrete Event Dynamic Systems, 2015

It has been shown that max-plus linear systems are well suited for applications in synchronizatio... more It has been shown that max-plus linear systems are well suited for applications in synchronization and scheduling, such as the generation of train timetables, manufacturing, or traffic. In this paper we show that the same is true for multi-legged locomotion. In this framework, the max-plus eigenvalue of the system matrix represents the total cycle time, whereas the max-plus eigenvector dictates the steady-state behavior. Uniqueness of the eigenstructure also indicates uniqueness of the resulting behavior. For the particular case of legged locomotion, the movement of each leg is abstracted to two-state circuits: swing and stance (leg in flight and on the ground, respectively). The generation of a gait (a manner of walking) for a multiple legged robot is then achieved by synchronizing the multiple discrete-event cycles via the max-plus framework. By construction, different gaits and gait parameters can be safely interleaved by using different system matrices. In this paper we address both the transient and steady-state behavior for a class of gaits by presenting closed-form expressions for the max-plus eigenvalue and max-plus eigenvector of the system matrix and the coupling time. The significance of this result is in showing guaranteed robustness to perturbations and gait switching, and also a systematic methodology for synthesizing controllers that allow for legged robots to change rhythms fast.

A modeling framework for model predictive scheduling using switching max-plus linear models

52nd IEEE Conference on Decision and Control, 2013

In this paper we discuss a modeling framework for model predictive scheduling of a class of semi-... more In this paper we discuss a modeling framework for model predictive scheduling of a class of semi-cyclic discrete event systems that can be described by switching max-plus linear models. We study the structure of the system matrices and derive how routing, ordering, and synchronization can be manipulated by a set of control variables. In addition, we show that this leads to a system matrix that is linear in the control variables. We define the model predictive scheduling design problem to optimize the schedule, and we show that the problem can be recast as a mixed integer linear programming (MILP) problem.

Navigation Functions for Dynamical, Nonholonomically Constrained Mechanical Systems

Advances in Robot Control

Reinforcement Learning for Port-Hamiltonian Systems

IEEE Transactions on Cybernetics, 2015

Motion estimation based on predator/prey vision

2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010

We present an unscented Kalman filter based state estimator for a fast moving rigid body (such as... more We present an unscented Kalman filter based state estimator for a fast moving rigid body (such as a mobile robot) endowed with two video cameras. We focus on forward velocity estimation towards the computation of standard energy cost functions for legged locomotion. Points are chosen as image features and the model of each camera is based on the traditional pinhole projection. The resulting filter's state is composed of the rigid body pose and velocities, together with a measure of depth for each tracked point. By taking inspiration from nature's large predatory and grazing mammals eye configuration, we suggest, via simulation results, a solution for the question of finding the best orientation of two cameras, between side and frontal facing, for velocity estimation in a forward moving robot.

Imitation learning with non-parametric regression

Proceedings of 2012 IEEE International Conference on Automation, Quality and Testing, Robotics, 2012

Humans are very fast learners. Yet, we rarely learn a task completely from scratch. Instead, we u... more Humans are very fast learners. Yet, we rarely learn a task completely from scratch. Instead, we usually start with a rough approximation of the desired behavior and take the learning from there. In this paper, we use imitation to quickly generate a rough solution to a robotic task from demonstrations, supplied as a collection of state-space trajectories. Appropriate control actions needed to steer the system along the trajectories are then automatically learned in the form of a (nonlinear) statefeedback control law. The learning scheme has two components: a dynamic reference model and an adaptive inverse process model, both based on a data-driven, non-parametric method called local linear regression. The reference model infers the desired behavior from the demonstration trajectories, while the inverse process model provides the control actions to achieve this behavior and is improved online using learning. Experimental results with a pendulum swing-up problem and a robotic arm demonstrate the practical usefulness of this approach. The resulting learned dynamics are not limited to single trajectories, but capture instead the overall dynamics of the motion, making the proposed approach a promising step towards versatile learning machines such as future household robots, or robots for autonomous missions.

Visual Servoing for Nonholonomically Constrained Three Degree of Freedom Kinematic Systems

The International Journal of Robotics Research, 2007

This paper addresses problems of robot navigation with nonholonomic motion constraints and percep... more This paper addresses problems of robot navigation with nonholonomic motion constraints and perceptual cues arising from onboard visual servoing in partially engineered environments. A general hybrid procedure is proposed that adapts to the constrained motion setting the standard feedback controller arising from a navigation function in the fully actuated case. This is accomplished by switching back and forth between moving “down” and “across” the associated gradient field toward the stable manifold it induces in the constrained dynamics. Guaranteed to avoid obstacles in all cases, conditions are provided under which the new procedure brings initial configurations to within an arbitrarily small neighborhood of the goal. Simulation results are given for a sample of visual servoing problems with a few different perceptual models. The empirical effectiveness of the proposed algorithm is documented by reporting results of its application to outdoor autonomous visual registration experime...

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2012

Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the rein... more Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper therefore describes the state of the art of actorcritic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms in the past few years. A review of several standard and natural actor-critic algorithms follows and the paper concludes with an overview of application areas and a discussion on open issues.

Modeling and Control of Legged Locomotion via Switching Max-Plus Models

IEEE Transactions on Robotics, 2014