Reinforcement Learning: Deep Deterministic Policy Gradient #3494

tareknaser · 2023-06-09T14:36:42Z

Description

This pull request implements the DDPG (Deep Deterministic Policy Gradient) algorithm, along with two test cases.

Implementation details

DDPG is an actor-critic algorithm designed for continuous action spaces. It combines deep neural networks with deterministic policy gradients to learn optimal policies in a continuous control setting.
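As a rough illustration of the critic side of this description, the sketch below computes the bootstrapped targets that DDPG, in its standard formulation, regresses the critic toward: y = r + discount * Q'(s', mu'(s')) for non-terminal transitions, and y = r for terminal ones. The function name `CriticTargets` and its parameters are hypothetical and are not taken from ddpg.hpp; the values Q'(s', mu'(s')) are assumed to have been computed already by the target actor and target critic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: the names here are hypothetical, not mlpack's API.
// Computes y_i = r_i + discount * Q'(s'_i, mu'(s'_i)) for non-terminal
// transitions and y_i = r_i for terminal ones. nextTargetQ[i] is assumed to
// already hold Q'(s'_i, mu'(s'_i)) as evaluated by the target networks.
std::vector<double> CriticTargets(const std::vector<double>& rewards,
                                  const std::vector<bool>& terminals,
                                  const std::vector<double>& nextTargetQ,
                                  const double discount)
{
  assert(rewards.size() == terminals.size());
  assert(rewards.size() == nextTargetQ.size());

  std::vector<double> targets(rewards.size());
  for (std::size_t i = 0; i < rewards.size(); ++i)
  {
    // A terminal transition has no future return to bootstrap from.
    targets[i] = terminals[i] ? rewards[i]
                              : rewards[i] + discount * nextTargetQ[i];
  }
  return targets;
}
```

The critic would then be trained to minimize the squared error between its own estimate Q(s, a) and these targets, while the actor is updated to increase the critic's estimate Q(s, mu(s)).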

Implemented four networks:

policyNetwork (actor network)
targetPNetwork (target actor network)
learningQNetwork (critic network)
targetQNetwork (target critic network)
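In the standard formulation of DDPG, each target network slowly tracks its learning counterpart through a soft (Polyak) update. The sketch below shows that update on flat parameter vectors. The function name `SoftUpdate` and the parameter `tau` are assumptions made for illustration; the source does not state how ddpg.hpp names or performs this step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: a hypothetical helper, not mlpack's API.
// Moves each target parameter a fraction tau of the way toward the
// corresponding learning-network parameter:
//   target <- tau * learning + (1 - tau) * target
void SoftUpdate(std::vector<double>& targetParams,
                const std::vector<double>& learningParams,
                const double tau)
{
  assert(targetParams.size() == learningParams.size());
  assert(tau >= 0.0 && tau <= 1.0);

  for (std::size_t i = 0; i < targetParams.size(); ++i)
  {
    targetParams[i] = tau * learningParams[i] + (1.0 - tau) * targetParams[i];
  }
}
```

Under this sketch, the same function would be applied twice per training step: once for the pair policyNetwork and targetPNetwork, and once for learningQNetwork and targetQNetwork. A small `tau` keeps the targets used by the critic stable between updates.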

How Has This Been Tested?

Included a Pendulum test that passes on my machine. The average reward achieved in the Pendulum environment is approximately -500, which indicates successful learning.
Also added a test for continuous action spaces, which passes as well.

The test configurations for DDPG are adapted from the SAC (Soft Actor-Critic) implementation, since both DDPG and SAC are off-policy policy gradient algorithms. This keeps the evaluation of the two algorithms consistent and comparable.

HISTORY.md

src/mlpack/methods/reinforcement_learning/ddpg.hpp

Signed-off-by: Tarek <[email protected]>

src/mlpack/tests/q_learning_test.cpp

Signed-off-by: Tarek <[email protected]>

zoq

No more comments from my side, awesome work.

mlpack-bot

Second approval provided automatically after 24 hours. 👍

mlpack-bot bot added the labels s: needs review, s: unanswered, and s: unlabeled on Jun 9, 2023

zoq added the labels c: methods and t: added feature, and removed the labels s: unanswered and s: unlabeled, on Jun 9, 2023

zoq reviewed Jun 11, 2023


HISTORY.md (outdated, resolved)

tareknaser force-pushed the ddpg branch from 5d9eb07 to 2ab4f6b on June 12, 2023 01:06

shubham1206agra reviewed Jun 13, 2023


src/mlpack/methods/reinforcement_learning/ddpg.hpp (resolved)

tareknaser added 4 commits June 13, 2023 16:06

feat(rl): implement ddpg and test on pendulum task

884134a

Signed-off-by: Tarek <[email protected]>

feat(rl): add a continuous action space test for ddpg

bd25f57

Signed-off-by: Tarek <[email protected]>

include ddpg in history

07e56a7

Signed-off-by: Tarek <[email protected]>

fix(rl): add tparam for critic

2187ba2

Signed-off-by: Tarek <[email protected]>

tareknaser force-pushed the ddpg branch from 2660f06 to 2187ba2 on June 13, 2023 13:10

zoq reviewed Jun 14, 2023


src/mlpack/tests/q_learning_test.cpp (outdated, resolved)

fix(rl): use smaller networks for testing

465f021

Signed-off-by: Tarek <[email protected]>

zoq approved these changes Jun 16, 2023


mlpack-bot bot approved these changes Jun 17, 2023


mlpack-bot bot removed the s: needs review label Jun 17, 2023

Merge branch 'master' into ddpg

355f667

shubham1206agra merged commit 590c1b1 into mlpack:master Jun 20, 2023

tareknaser deleted the ddpg branch July 8, 2023 12:41

rcurtin mentioned this pull request Sep 5, 2023

Release version 4.2.1 #3533

Merged
