This directory contains interactive examples that serve as a step-by-step tutorial showcasing the control capabilities of Neuromancer.

- Part 1: Learning to stabilize a linear dynamical system.
- Part 2: Learning to stabilize a nonlinear ordinary differential equation.
- Part 3: Learning a constrained neural control policy for reference tracking of an ordinary differential equation.
- Part 4: Learning a neural ODE system model and a control policy for a dynamical system.
- Part 5: Learning a neural Lyapunov function from trajectories of a nonlinear dynamical system.

The differentiable predictive control (DPC) method is a flagship capability of the Neuromancer library.
DPC allows us to learn control policy parameters directly by backpropagating the model predictive control (MPC) objective function and constraints through a differentiable model of a dynamical system. Examples of differentiable models include ordinary differential equations (ODEs), neural ODEs, universal differential equations (UDEs), and neural state space models (SSMs).
The conceptual methodology shown in the figures below consists of two main steps. In the first step, we perform system identification by learning the unknown parameters of differentiable digital twins. In the second step, we close the loop by combining the digital twin models with a control policy parametrized by neural networks, obtaining a differentiable closed-loop dynamics model. This closed-loop model allows us to use automatic differentiation (AD) to solve the parametric optimal control problem by computing the sensitivities of the objective function and constraints to changing problem parameters, such as initial conditions, boundary conditions, and parametric control tasks such as time-varying reference tracking.

*Conceptual methodology. Simulation of the differentiable closed-loop system dynamics
in the forward pass is followed by a backward pass computing direct policy gradients for policy optimization.*

Our recent development work in Neuromancer has given us the capability to learn a parametric control policy (parametrized by trainable weights W)
for a given dynamical system of the continuous-time form:
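
Written out from the symbol definitions in this paragraph (a reconstruction; the exact notation is an assumption), the continuous-time dynamics read:

$$
\frac{dx(t)}{dt} = f(x(t), u(t))
$$
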
where x(t) is the time-varying state of the considered system, u(t) are system control inputs, and f is the state transition dynamics.
or of the discrete-time form (e.g., obtained via an ODE solver or a state space model):
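
One way to write the discrete-time form, assuming a step index $k$ and using $\text{ODESolve}$ as shorthand for one step of a numerical integrator (both are assumed notation, not taken from this text):

$$
x_{k+1} = \text{ODESolve}(f(x_k, u_k))
$$
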
Formally, we can formulate the DPC problem as the following parametric
optimal control problem:
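
A sketch of such a formulation, under assumed notation: $\pi_W$ is the policy with weights $W$, $\ell$ the control objective, $p_h$ and $p_g$ penalties on state constraints $h$ and input constraints $g$, $\xi$ the problem parameters, $N$ the prediction horizon, and $m$ the number of sampled scenarios drawn from the training sets $X$ and $\Xi$. The text only fixes the ingredients (objective, constraint penalties, dynamics model, policy, sampled initial states and parameters), so the exact form below is a reconstruction rather than the authors' statement:

$$
\begin{aligned}
\min_{W} \quad & \frac{1}{m} \sum_{i=1}^{m} \sum_{k=0}^{N-1} \Big( \ell(x_k^i, u_k^i, \xi_k^i) + p_h\big(h(x_k^i, \xi_k^i)\big) + p_g\big(g(u_k^i, \xi_k^i)\big) \Big) \\
\text{s.t.} \quad & x_{k+1}^i = \text{ODESolve}(f(x_k^i, u_k^i)), \\
& u_k^i = \pi_W(x_k^i, \xi_k^i), \\
& x_0^i \in X, \quad \xi_k^i \in \Xi
\end{aligned}
$$
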
The main advantage of having a differentiable closed-loop dynamics model, control
objective function, and constraints in the DPC problem formulation
is that it allows us to use automatic
differentiation (backpropagation through time) to compute the policy gradient directly. In particular,
by representing the DPC problem as a computational graph and leveraging the chain rule, we can
compute the gradients of the loss function w.r.t. the policy parameters W as follows:
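
A sketch of the chain-rule expansion, assuming the loss $L$ is composed of a control objective $\ell$, a state-constraint penalty $p_h$, and an input-constraint penalty $p_g$ (these symbols are assumed notation). The policy weights $W$ influence the loss only through the control actions $u$, both directly and through their effect on the states $x$:

$$
\nabla_W L = \frac{\partial \ell}{\partial x}\frac{\partial x}{\partial u}\frac{\partial u}{\partial W}
+ \frac{\partial \ell}{\partial u}\frac{\partial u}{\partial W}
+ \frac{\partial p_h}{\partial x}\frac{\partial x}{\partial u}\frac{\partial u}{\partial W}
+ \frac{\partial p_g}{\partial u}\frac{\partial u}{\partial W}
$$

Here $\partial x / \partial u$ is the sensitivity of the simulated states to the control actions, which is available because the dynamics model is differentiable.
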
The forward pass of the DPC computational graph is conceptually
equivalent to a single-shooting formulation of the model predictive control (MPC) problem.
The resulting structural equivalence between DPC and the
constraints of classical implicit MPC in dense form is illustrated in the following figure.
As in MPC, in the
open-loop rollouts the explicit DPC policy generates future control action trajectories over an N-step prediction horizon
given the feedback from the system dynamics model. For closed-loop deployment, we then adopt the receding
horizon control (RHC) strategy by applying only the first time step of the computed control action trajectory.
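
The difference between the open-loop rollout and the receding horizon deployment can be sketched in a few lines of plain Python. This is a minimal illustration, not Neuromancer code: the scalar system `A`, `B`, the fixed feedback gain `K` standing in for a trained policy, and the function names are all assumptions made for the example.

```python
# Receding horizon control (RHC) with an explicit policy: plan N steps, apply one.
# The system (A, B) and the gain K are illustrative assumptions.

A, B = 1.2, 1.0      # open-loop unstable scalar system x_{k+1} = A*x_k + B*u_k
K = 0.9              # stand-in for a trained policy: u = -K*x
N = 5                # prediction horizon


def policy(x):
    """Hypothetical explicit policy; a trained neural network would go here."""
    return -K * x


def model(x, u):
    """System dynamics model used for prediction."""
    return A * x + B * u


def open_loop_rollout(x0, horizon):
    """Generate an N-step control trajectory using feedback from the model."""
    x, controls = x0, []
    for _ in range(horizon):
        u = policy(x)
        controls.append(u)
        x = model(x, u)
    return controls


def closed_loop(x0, steps):
    """Apply only the first planned action at each step, then re-plan."""
    x, states = x0, [x0]
    for _ in range(steps):
        plan = open_loop_rollout(x, N)
        x = model(x, plan[0])   # here the "real" system equals the model
        states.append(x)
    return states


states = closed_loop(x0=1.0, steps=20)
print(f"|x_20| = {abs(states[-1]):.2e}")
```

In this sketch the plant and the prediction model coincide, so re-planning changes nothing numerically; the value of the receding horizon strategy shows up once the real system deviates from the model and the policy is re-evaluated on the measured state.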

Structural equivalence of DPC architecture with MPC constraints.
The DPC policy optimization algorithm is summarized in the following figure.
The differentiable system dynamics model is required to instantiate the computational graph of the
DPC problem. The policy gradients ∇L are obtained by differentiating the DPC loss function L over
the distribution of initial state conditions and problem parameters sampled from the given training datasets
X and Ξ, respectively. The computed policy gradients then allow us to perform direct policy optimization via
a gradient-based optimizer O. The presented procedure thus provides a generic approach to the data-driven
solution of the model-based parametric optimal control problem with constrained neural control policies.
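
The loop described above (sample initial states, simulate the closed loop, differentiate the loss, update the policy) can be sketched without any library. The following is a toy illustration under loud assumptions: a scalar linear system, a one-parameter policy `u = -w*x` in place of a neural network, no constraints and no parameters Ξ, plain gradient descent as the optimizer, and a hand-written reverse pass instead of automatic differentiation so that the chain rule stays visible. None of the names come from Neuromancer.

```python
import random

# Assumed toy problem (not from the source): x_{k+1} = A*x_k + B*u_k
# with a one-parameter policy u_k = -w*x_k standing in for a neural network.
A, B = 1.2, 1.0        # open-loop unstable dynamics
Q, R = 1.0, 0.1        # state and input weights of the control objective
N = 5                  # prediction horizon


def loss_and_grad(w, x0s):
    """Return the mean loss over x0s and its gradient w.r.t. w."""
    total_loss, total_grad = 0.0, 0.0
    for x0 in x0s:
        # Forward pass: simulate the closed loop and store the trajectory.
        xs, us = [x0], []
        for _ in range(N):
            u = -w * xs[-1]
            us.append(u)
            xs.append(A * xs[-1] + B * u)
        total_loss += sum(Q * xs[k + 1] ** 2 + R * us[k] ** 2 for k in range(N))

        # Backward pass: propagate dL/dx_k backwards in time (chain rule).
        lam = 2.0 * Q * xs[N]                    # dL/dx_N
        for k in reversed(range(N)):
            dL_du = 2.0 * R * us[k] + B * lam    # input cost plus effect on x_{k+1}
            total_grad += dL_du * (-xs[k])       # du_k/dw = -x_k
            direct = 2.0 * Q * xs[k] if k > 0 else 0.0
            lam = direct + A * lam + dL_du * (-w)   # dL/dx_k
    m = len(x0s)
    return total_loss / m, total_grad / m


rng = random.Random(0)
x0s = [rng.uniform(-1.0, 1.0) for _ in range(32)]   # training set X of initial states

w, lr = 0.0, 0.02
initial_loss, _ = loss_and_grad(w, x0s)
for _ in range(300):
    _, grad = loss_and_grad(w, x0s)
    w -= lr * grad          # plain gradient descent as the optimizer O
final_loss, _ = loss_and_grad(w, x0s)

print(f"w = {w:.3f}, closed-loop pole = {A - B * w:.3f}")
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

With a neural policy and an automatic differentiation framework, the hand-written backward pass is replaced by a single call to the framework's gradient routine; the structure of the loop stays the same.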
From a reinforcement learning (RL) perspective, the DPC loss L can be seen as a (negative) reward function, with ∇L representing a deterministic policy gradient. The main difference from actor-critic RL algorithms is that in DPC the reward function is fully parametrized by a closed-loop system dynamics model, control objective, and constraint penalties. This model-based approach avoids approximation errors in the reward function, which makes DPC more sample efficient than model-free RL algorithms.
The following figures illustrate the DPC policy optimization algorithm on the Part 1 example.

Closed-loop trajectories of the stabilizing neural control policy learned using DPC policy optimization.

Evolution of the closed-loop trajectories and DPC neural policy during training.

Landscapes of the neural policy learned via the DPC policy optimization algorithm (right)
and of the explicit MPC policy computed using a parametric programming solver (left).

@misc{drgona2022_DPC,
title={Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees},
author={Jan Drgona and Aaron Tuor and Draguna Vrabie},
year={2022},
eprint={2004.11184},
archivePrefix={arXiv},
primaryClass={eess.SY}
}

@article{DRGONA202280,
title = {{Differentiable predictive control {:} Deep learning alternative to explicit model predictive control for unknown nonlinear systems}},
journal = {Journal of Process Control},
volume = {116},
pages = {80-92},
year = {2022},
issn = {0959-1524},
author = {Ján Drgoňa and Karol Kiš and Aaron Tuor and Draguna Vrabie and Martin Klaučo}
}

@misc{drgona2022_SDPC,
title={Learning Stochastic Parametric Differentiable Predictive Control Policies},
author={Jan Drgona and Sayak Mukherjee and Aaron Tuor and Mahantesh Halappanavar and Draguna Vrabie},
year={2022},
eprint={2203.01447},
archivePrefix={arXiv},
primaryClass={eess.SY}
}