Clean baseline implementation of PPO using an episodic TransformerXL memory
Baseline implementation of recurrent PPO using truncated BPTT