This week I learned about Exploration and Intrinsic Motivation.
This post extends my learning about Actor-Critic algorithms to the off-policy setting.
This post uses Deep Q-Networks to introduce off-policy algorithms.
This post introduces Actor-Critic algorithms as an extension of basic policy gradient algorithms such as REINFORCE.
This week begins my deep dive into Policy Gradient methods.