A PyTorch library for building deep reinforcement learning agents.
This release contains some minor changes to several key APIs.
We added a new method to the `Agent` interface called `eval()`. `eval()` is the same as `act()`, except the agent does not perform any training updates. This is useful for measuring the performance of an agent at the end of a training run. Speaking of which...
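For instance, a final evaluation pass over a trained agent might look roughly like the sketch below. The environment wrapper and the exact calling convention are assumptions here (mirroring the `state`/`reward` form used elsewhere in these notes), not a documented recipe.

```python
def evaluate(agent, env, episodes=100):
    """Minimal sketch: measure a trained agent without training updates.

    Assumes a gym-style environment wrapper whose reset()/step() return the
    library's State objects (with state.mask == 0 at terminal states) plus a
    scalar reward; this env API is an assumption of the sketch, not part of
    the release.
    """
    returns = []
    for _ in range(episodes):
        state = env.reset()
        reward = 0.0
        episode_return = 0.0
        while state.mask:                       # mask == 0 marks a terminal state
            action = agent.eval(state, reward)  # like act(), but no training update
            state, reward = env.step(action)
            episode_return += reward
        returns.append(episode_return)
    return returns
```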
We completely refactored the `all.experiments` module. First of all, the primary public entry point is now a function called `run_experiment`. Under the hood, there is a new `Experiment` interface:
```python
from abc import ABC, abstractmethod

import numpy as np


class Experiment(ABC):
    '''An Experiment manages the basic train/test loop and logs results.'''

    @property
    @abstractmethod
    def frame(self):
        '''The index of the current training frame.'''

    @property
    @abstractmethod
    def episode(self):
        '''The index of the current training episode.'''

    @abstractmethod
    def train(self, frames=np.inf, episodes=np.inf):
        '''
        Train the agent for a certain number of frames or episodes.
        If both frames and episodes are specified, then the training loop will exit
        when either condition is satisfied.

        Args:
            frames (int): The maximum number of training frames.
            episodes (int): The maximum number of training episodes.
        '''

    @abstractmethod
    def test(self, episodes=100):
        '''
        Test the agent in eval mode for a certain number of episodes.

        Args:
            episodes (int): The number of test episodes.

        Returns:
            list(float): A list of all returns received during testing.
        '''
```
Notice the new method, `experiment.test()`. This method runs the agent in `eval` mode for a certain number of episodes and logs summary statistics (the mean and std of the returns).
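Driving this interface looks roughly like the sketch below. How a concrete `Experiment` is constructed is not shown here; `run_experiment` is the intended entry point and handles that for the common case. The helper name is purely illustrative.

```python
def train_and_evaluate(experiment, frames=1_000_000, test_episodes=100):
    # Train until the frame budget is exhausted (whichever of the frame or
    # episode limit is hit first), then run eval-mode test episodes.
    experiment.train(frames=frames)
    returns = experiment.test(episodes=test_episodes)
    print('frames trained:', experiment.frame)
    print('mean test return:', sum(returns) / len(returns))
    return returns
```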
Finally, we clarified the usage of `Approximation.eval(*inputs)` by adding an additional method, `Approximation.no_grad(*inputs)`. `eval()` both puts the network in evaluation mode and runs the forward pass with `torch.no_grad()`. `no_grad()` simply runs a forward pass in the current mode. The various `Policy` implementations were also adjusted to correctly execute the greedy behavior in `eval` mode.
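In plain PyTorch terms, the distinction is roughly the following. This is an illustration of the described semantics only, not the library's actual `Approximation` implementation.

```python
import torch


class ApproximationSketch:
    """Illustration of the eval()/no_grad() distinction described above."""

    def __init__(self, model):
        self.model = model

    def __call__(self, *inputs):
        # Ordinary forward pass: builds a computation graph for training.
        return self.model(*inputs)

    def eval(self, *inputs):
        # Put the network in evaluation mode (affects dropout, batchnorm, ...)
        # and run the forward pass with gradients disabled.
        self.model.eval()
        with torch.no_grad():
            return self.model(*inputs)

    def no_grad(self, *inputs):
        # Run the forward pass with gradients disabled, but leave the network
        # in whatever train/eval mode it is currently in.
        with torch.no_grad():
            return self.model(*inputs)
```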
The first public release of the library!
Small but important update!
- Added the `all.experiments.plot` module, with a `plot_returns_100` function that accepts a `runs` directory and plots the contained results.
- Adjusted the `a2c` Atari preset to better match the configuration of the other algorithms.

This release contains several usability enhancements! The biggest change, however, is a refactor. The policy classes now extend from `Approximation`. This means that things like target networks, learning rate schedulers, and model saving are all handled in one place!
The full list of changes is:

- Policy classes now extend from `Approximation` (#89)
- Added `eval()` and `target()`: the former runs a forward pass of the current network, the latter does so on the target network, each without creating a computation graph. (#94)
- Simplified the `AdvantageBuffer` API. Also fixed a minor bug in A2C (#95)

A bug in `SoftDeterministicPolicy` was slowing learning and causing numerical instability in some cases. This fixes that.
Added Soft Actor-Critic (SAC). SAC is a state-of-the-art algorithm for continuous control based on the max-entropy RL framework.
`PPO` and `Vanilla` release!

The `Vanilla` series agents are "vanilla" implementations of actor-critic, sarsa, q-learning, and REINFORCE. These algorithms are all prefixed with the letter "v" in the `agents` folder.

This release introduces `continuous` policies and agents, including `DDPG`. It also includes a number of quality-of-life improvements:
- `continuous` agent suite
- `Gaussian` policy
- `DeterministicPolicy`
- `Approximation` base class from which `QNetwork`, `VNetwork`, etc. are derived
- `layers` module moved to `all.nn`. It extends `torch.nn` with custom layers added, to make crafting unique networks easier.
- `DDPG` agent

The release contains a bunch of changes under the hood. The agent API was simplified down to a single method, `action = agent.act(state, reward)`. To accompany this change, `State` was added as a first-class object. Terminal states now have `state.mask` set to 0, whereas before terminal states were represented by `None`.
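As a toy sketch of what an agent looks like under the single-method API (the `action_space` argument with a gym-style `sample()` is an assumption of the example, not part of the release):

```python
class RandomAgent:
    """Toy illustration of the single-method agent API."""

    def __init__(self, action_space):
        self.action_space = action_space  # assumed gym-style space with sample()

    def act(self, state, reward):
        # A learning agent would perform its training update here, using the
        # reward earned by its previous action. Terminal states arrive as
        # ordinary State objects with state.mask == 0, so there is no need to
        # special-case None as before.
        return self.action_space.sample()
```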
Another major addition is `slurm` support. This is intended in particular to aid in running on `gypsum`. The `SlurmExperiment` API handles the creation of the appropriate `.sh` files, output, etc., so experiments can be run on `slurm` by writing a single Python script! No more writing `.sh` files by hand! Examples can be found in the `demos` folder.
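A submission script might look something like the hypothetical sketch below; the preset and environment names and the exact `SlurmExperiment` arguments are assumptions here, so refer to the `demos` folder for working examples.

```python
# Hypothetical sketch only: the argument order, preset names, environment
# wrapper, and sbatch options are assumptions; see the demos folder for
# real, working examples.
from all.environments import AtariEnvironment
from all.experiments import SlurmExperiment
from all.presets import atari

SlurmExperiment(
    [atari.a2c, atari.dqn],                    # agent presets to compare (assumed names)
    AtariEnvironment('Breakout'),              # environment to train on (assumed wrapper)
    10e6,                                      # training frame budget
    sbatch_args={'partition': '1080ti-long'},  # options forwarded to sbatch (assumed)
)
```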
There were a few other minor changes as well.
Change log:
- Simplified the agent API to `act()` #56
- Made `write_loss` togglable #64