A unified end-to-end learning and control framework that can learn a (neural) control objective function, dynamics equation, control policy, and/or optimal trajectory of a control system.
The Pontryagin-Differentiable-Programming (PDP) project establishes a unified end-to-end framework for solving a broad class of learning and control tasks. Please find more details in
Additional Note (updated in June 2022):
The current version of the PDP project consists of three folders:
PDP: an independent package implementing the PDP cores. It contains a core module called PDP.py, where four classes are defined, each providing the functionality described below.
Note: each class can be used independently; for example, you can use only OCSys to solve your own optimal control problem. Each of the above classes is easy to approach, and you can immediately tell what each method does from its name. All important lines are commented in great detail.
JinEnv: an independent package that provides environments/visualizations of some typical physical systems on which to run your algorithms. JinEnv includes environments ranging from simple (e.g., single inverted pendulum) to complex (e.g., 6-DoF rocket-powered landing). You can use these environments to test the performance of your learning/control methods. The dynamics and control objective functions of these physical systems work off-the-shelf by default, but you can also customize them through user-friendly interfaces. Each environment is defined as an independent class:
For each environment, you can freely customize its dynamics parameters and control cost function. Each environment is independent and has visualization methods for showcasing your results.
Examples: various examples of using PDP to solve different learning or control tasks, including inverse reinforcement learning, optimal control or model-based reinforcement learning, and system identification. The examples are organized by problem type:
Each learning or control task is tested in different environments: inverted pendulum, robot arm, cart-pole, quadrotor maneuvering, and rocket powered landing.
You can directly run each script under the Examples folder.
Please make sure that the following packages have been installed before using the PDP or JinEnv packages.
Note: before you try the PDP and JinEnv packages, we strongly recommend familiarizing yourself with CasADi symbolic programming, e.g., how to define a symbolic expression/function. Reading Sections 2, 3, and 4 of https://web.casadi.org/docs/ is enough (around 30 minutes) and will really help you debug your code when you test your own system with the PDP package. We also recommend reading the PDP paper (https://arxiv.org/abs/1912.12970), because all notations/steps in the code strictly follow the paper.
The code has been tested and runs smoothly with Python 3.7 on a macOS (10.15.7) machine.
First of all, relax: we have optimized the interfaces of the PDP and JinEnv packages, which hopefully minimizes the effort needed to understand and use them. All methods and variables are straightforward and carefully commented. In most cases, all you need to do is specify the symbolic expressions of your control system, i.e., its dynamics, policy, or control cost function; PDP will take care of the rest.
The quickest way to get a big picture of the codes is to examine and run each example:
To solve IRL/IOC problems, you will mainly need the following two classes from ./PDP/PDP.py module:
OCSys: which solves the optimal control problem in the forward pass and then constructs the auxiliary control system for the backward pass. The procedure to instantiate an OCSys object is fairly straightforward and includes nine steps:
Note: if you are only using OCSys to solve your optimal control problem (not for IOC/IRL), you can ignore Steps 3, 8, and 9, and you will not need the LQR class (next) to solve the auxiliary control system.
LQR: which solves the auxiliary control system in the backward pass and obtains the analytical derivative of the forward-pass trajectory with respect to the parameters in the dynamics and control cost function. The procedure to instantiate an LQR object is fairly straightforward and includes four steps:
Examples for IRL/IOC tasks: check and run all examples under ./Examples/IRL/ folder.
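As background for the backward pass, the auxiliary system solved by the LQR class is a finite-horizon linear-quadratic problem. The following is a minimal, self-contained NumPy sketch of the standard discrete-time Riccati recursion; it is illustrative only (the function name and signature are not PDP's actual LQR API).

```python
import numpy as np

def lqr_backward(A, B, Q, R, QT, horizon):
    """Standard finite-horizon discrete-time LQR: return feedback gains K_0..K_{T-1}."""
    P = QT                                   # terminal value matrix
    gains = []
    for _ in range(horizon):
        # K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A'P(A - BK)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]                       # reorder from t = 0 to t = T-1

# toy double integrator with dt = 0.1 (illustrative parameters)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, QT = np.eye(2), np.array([[1.0]]), 10.0 * np.eye(2)
Ks = lqr_backward(A, B, Q, R, QT, horizon=50)

# roll out the closed loop from x0; the state is driven toward zero
x = np.array([[1.0], [0.0]])
for K in Ks:
    x = (A - B @ K) @ x
```

The same backward recursion underlies how PDP propagates derivatives through the auxiliary control system.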
To solve optimal control or planning problems, you only need the ControlPlanning class from the ./PDP/PDP.py module:
ControlPlanning: the procedure to instantiate a ControlPlanning object is fairly straightforward and includes the following nine steps:
The user can also choose one of the following additional features to improve the performance of PDP:
Examples for control or planning tasks: check and run all examples under ./Examples/OC/ folder.
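To convey the idea behind gradient-based planning (in the spirit of ControlPlanning, but not its actual API), here is a hedged NumPy sketch: the cost is differentiated through a linear rollout with a backward adjoint pass, and plain gradient descent updates the control sequence. The dynamics, cost weights, step size, and iteration count are all illustrative choices.

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # double-integrator dynamics (illustrative)
B = np.array([[0.0], [dt]])
Q = 0.1 * np.eye(2)                      # running state cost
R = np.array([[0.01]])                   # control effort cost
QT = 10.0 * np.eye(2)                    # terminal cost
T = 50
x0 = np.array([1.0, 0.0])

def cost_and_grad(U):
    # forward rollout of the dynamics
    X = [x0]
    for t in range(T):
        X.append(A @ X[-1] + B @ U[t])
    J = sum(X[t] @ Q @ X[t] + U[t] @ R @ U[t] for t in range(T)) + X[T] @ QT @ X[T]
    # backward adjoint pass: lam is dJ/dx_t
    lam = 2.0 * QT @ X[T]
    grad = np.zeros_like(U)
    for t in reversed(range(T)):
        grad[t] = 2.0 * R @ U[t] + B.T @ lam
        lam = 2.0 * Q @ X[t] + A.T @ lam
    return J, grad

U = np.zeros((T, 1))
J0, _ = cost_and_grad(U)                 # cost of the zero-control guess
for _ in range(500):                     # plain gradient descent on the controls
    J, g = cost_and_grad(U)
    U -= 0.005 * g
```

PDP's actual machinery generalizes this pattern to nonlinear, parameterized systems by solving the auxiliary control system for the trajectory derivatives.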
To solve system identification problems, you will need the SysID class from the ./PDP/PDP.py module:
Examples for system identification tasks: check and run all examples under ./Examples/SysID/ folder.
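The essence of system identification can be seen in a tiny self-contained example (illustrative only; PDP's SysID handles general nonlinear, parameterized dynamics): recover the matrices of a linear system x_{k+1} = A x_k + B u_k from trajectory data by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])   # ground-truth dynamics (illustrative)
B_true = np.array([[0.0], [0.5]])

# collect a noise-free trajectory excited by random inputs
X, U, Xn = [], [], []
x = np.zeros((2, 1))
for _ in range(200):
    u = rng.normal(size=(1, 1))
    xn = A_true @ x + B_true @ u
    X.append(x.ravel()); U.append(u.ravel()); Xn.append(xn.ravel())
    x = xn

# stack regressors: x_{k+1}' = [x_k' u_k'] [A B]'
Z = np.hstack([np.array(X), np.array(U)])       # shape (200, 3)
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_est, B_est = Theta.T[:, :2], Theta.T[:, 2:]   # recovered dynamics matrices
```

With noise-free data the least-squares estimate matches the true matrices up to numerical precision; PDP's gradient-based identification plays the analogous role for general parameterized dynamics.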
Each environment is defined as a class, which contains the following methods:
initDyn: which is used to initialize the dynamics of a physical system. The input arguments are the parameter values of the dynamics. You can pass a specific value for each parameter; otherwise, the parameter is None (by default) and will become a learnable variable in your dynamics. Some variables within the initDyn method are
initCost: which is used to initialize the control cost function of a physical system. The cost function by default is a weighted distance to the goal state plus a control effort term, and the input arguments to initCost are the weights. You can pass a specific value for each weight; otherwise, the weight is None (by default) and will become a learnable variable in your cost function. Some attributes of the initCost method are
play_animation: which is used to visualize the motion of the physical system. The input is the state (control) trajectory.
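The None-means-learnable convention described above for initDyn and initCost can be sketched in plain Python (a hypothetical function; strings stand in for the CasADi symbols that JinEnv would actually create, and the real signatures differ per environment):

```python
# Hypothetical illustration of JinEnv's parameter convention:
# a parameter passed as None becomes a learnable symbol, a numeric value fixes it.
def init_dyn(mass=None, length=None):
    learnable = []
    if mass is None:
        mass = "mass_sym"        # would be e.g. casadi.SX.sym('mass') in JinEnv
        learnable.append(mass)
    if length is None:
        length = "length_sym"
        learnable.append(length)
    return {"mass": mass, "length": length, "learnable": learnable}

env = init_dyn(mass=1.0)         # mass is fixed to 1.0, length stays learnable
```

So to learn only a subset of parameters, pass values for the ones you want fixed and leave the rest as None.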
Examples for using each of the environments: check and run all examples under ./Examples/ folder.
If you encounter a bug in the code, please feel free to let me know.
If you also want the code for the other methods compared in our paper (https://arxiv.org/abs/1912.12970), e.g., inverse KKT, iterative LQR, GPS, or policy imitation, please also let me know.
Currently, I am working on a general control toolbox in Python that includes all these popular methods (it may also be published in the near future).
If you find this project helpful in your publications, please consider citing our paper (published at NeurIPS 2020).
@article{jin2020pontryagin,
title={Pontryagin differentiable programming: An end-to-end learning and control framework},
author={Jin, Wanxin and Wang, Zhaoran and Yang, Zhuoran and Mou, Shaoshuai},
journal={Advances in Neural Information Processing Systems},
volume={33},
pages={7979--7992},
year={2020}
}