Natural, Trust Region and Proximal Policy Optimization
In this TransferLab blog post we give an overview of the theory behind three popular, closely related algorithms for gradient-based policy optimization: natural policy gradient descent, trust region policy optimization (TRPO), and proximal policy optimization (PPO). After a review of some well-established concepts from mathematical optimization theory, the three algorithms can be introduced in a unified manner.