Natural, Trust Region and Proximal Policy Optimization

In this TransferLab blog post, we present an overview of the theory behind three popular and related algorithms for gradient-based policy optimization: natural policy gradient descent, trust region policy optimization (TRPO), and proximal policy optimization (PPO). After reviewing some useful and well-established concepts from mathematical optimization theory, we introduce the three algorithms in a unified manner.

