Autonomous Learning Library Versions

A PyTorch library for building deep reinforcement learning agents.

v0.5.0

4 years ago

This release contains some minor changes to several key APIs.

Agent Evaluation Mode

We added a new method to the Agent interface called eval. eval is the same as act, except the agent does not perform any training updates. This is useful for measuring the performance of an agent at the end of a training run. Speaking of which...
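
As a rough illustration, here is a minimal sketch of how the two methods might sit side by side in a loop. The helper function is hypothetical, and it assumes eval shares act's (state, reward) signature, as described above:

def choose_action(agent, state, reward, training=True):
    # Sketch only: assumes eval() takes the same (state, reward) arguments as act(),
    # since the release note says "eval is the same as act".
    if training:
        return agent.act(state, reward)   # selects an action and performs training updates
    return agent.eval(state, reward)      # same policy, but no training updates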

Experiment Refactoring: Train/Test

We completely refactored the all.experiments module. First of all, the primary public entry point is now a function called run_experiment. Under the hood, there is a new Experiment interface:

from abc import ABC, abstractmethod

import numpy as np


class Experiment(ABC):
    '''An Experiment manages the basic train/test loop and logs results.'''

    @property
    @abstractmethod
    def frame(self):
        '''The index of the current training frame.'''

    @property
    @abstractmethod
    def episode(self):
        '''The index of the current training episode.'''

    @abstractmethod
    def train(self, frames=np.inf, episodes=np.inf):
        '''
        Train the agent for a certain number of frames or episodes.
        If both frames and episodes are specified, then the training loop will exit
        when either condition is satisfied.

        Args:
            frames (int): The maximum number of training frames.
            episodes (int): The maximum number of training episodes.
        '''

    @abstractmethod
    def test(self, episodes=100):
        '''
        Test the agent in eval mode for a certain number of episodes.

        Args:
            episodes (int): The number of test episodes.

        Returns:
            list(float): A list of all returns received during testing.
        '''

Notice the new method, experiment.test(). This method runs the agent in eval mode for a certain number of episodes and logs summary statistics (the mean and std of the returns).
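
As a rough usage sketch (the experiment object stands in for whatever concrete Experiment subclass the library constructs; the frame and episode counts are arbitrary):

def train_and_evaluate(experiment):
    # `experiment` is assumed to be a concrete Experiment as defined above.
    # Train until one million frames have been observed (no episode cap).
    experiment.train(frames=1_000_000)
    # Run 100 episodes in eval mode; test() returns the list of episode returns.
    returns = experiment.test(episodes=100)
    return sum(returns) / len(returns)    # mean test return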

Approximation: no_grad vs. eval

Finally, we clarified the usage of Approximation.eval(*inputs) by adding an additional method, Approximation.no_grad(*inputs). eval() both puts the network in evaluation mode and runs the forward pass with torch.no_grad(). no_grad() simply runs a forward pass with torch.no_grad() in the network's current mode. The various Policy implementations were also adjusted to correctly execute the greedy behavior in eval mode.
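
The distinction is easiest to see in plain PyTorch. The snippet below is an illustrative sketch of the semantics only, not the library's actual Approximation code:

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 2))
x = torch.randn(1, 4)

# eval(): switch the network to evaluation mode (affects dropout, batchnorm, etc.)
# and run the forward pass without building a computation graph.
net.eval()
with torch.no_grad():
    out_eval = net(x)
net.train()    # restore training mode

# no_grad(): leave the network in whatever mode it is currently in,
# and only skip building the computation graph.
with torch.no_grad():
    out_no_grad = net(x)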

v0.4.0

4 years ago

The first public release of the library!

v0.3.3

4 years ago

Small but important update!

  1. Added the all.experiments.plot module, with a plot_returns_100 function that accepts a runs directory and plots the contained results (see the usage sketch after this list).
  2. Tweaked the a2c Atari preset to better match the configuration of the other algorithms
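
A quick usage sketch; the runs directory path is an assumption, so point it at wherever your experiments wrote their results:

from all.experiments.plot import plot_returns_100

# Plot the results recorded in the given runs directory.
plot_returns_100('./runs')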

v0.3.1

4 years ago
  1. Add C51, a distributional RL agent
  2. Add double-dqn agent (ddqn)
  3. Update the Atari wrappers to exactly match DeepMind's

v0.3.0

4 years ago

This release contains several usability enhancements! The biggest change, however, is a refactor: the policy classes now extend Approximation. This means that things like target networks, learning rate schedulers, and model saving are all handled in one place!

The full list of changes is:

  • Refactored experiment API (#88)
  • Policies inherit from Approximation (#89)
  • Models now save themselves automatically every 200 updates. Also, you can load models and watch them play in each environment! (#90)
  • Automatically set the temperature in SAC (#91)
  • Schedule learning rates and other parameters (#92)
  • SAC bugfix
  • Refactor usage of target networks. Now there is a difference between eval() and target(): the former runs a forward pass of the current network, the latter does so on the target network, each without creating a computation graph (see the sketch after this list). (#94)
  • Tweak AdvantageBuffer API. Also fix a minor bug in A2C (#95)
  • Report the best returns so far in separate metric (#96)
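
As a rough sketch of the eval()/target() distinction described above (the class below is illustrative only, not the library's Approximation implementation):

import copy

import torch
import torch.nn as nn

class ApproximationSketch:
    def __init__(self, model):
        self.model = model
        self._target = copy.deepcopy(model)    # frozen copy, updated on some schedule

    def eval(self, *inputs):
        # Forward pass of the *current* network, without creating a computation graph.
        with torch.no_grad():
            return self.model(*inputs)

    def target(self, *inputs):
        # Forward pass of the *target* network, also without a computation graph.
        with torch.no_grad():
            return self._target(*inputs)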

v0.2.4

4 years ago

A bug in SoftDeterministicPolicy was slowing learning and causing numerical instability in some cases. This release fixes that.

v0.2.3

4 years ago

Added Soft-Actor Critic (SAC). SAC is a state-of-the-art algorithm for continuous control based on the max-entropy RL framework.

v0.2.2

4 years ago

PPO and Vanilla release!

  1. Add PPO, one of the most popular modern RL algorithms.
  2. Add Vanilla series agents: "vanilla" implementations of actor-critic, sarsa, q-learning, and REINFORCE. These algorithms are all prefixed with the letter "v" in the agents folder.

v0.2.1

4 years ago

This release introduces continuous policies and agents, including DDPG. It also includes a number of quality-of-life improvements:

  • Add continuous agent suite
  • Add Gaussian policy
  • Add DeterministicPolicy
  • Introduce Approximation base class from which QNetwork, VNetwork, etc. are derived
  • Convert the layers module to all.nn, which extends torch.nn with custom layers to make crafting unique networks easier
  • Introduce DDPG agent

v0.2.0

4 years ago

This release contains a bunch of changes under the hood. The agent API was simplified down to a single method, action = agent.act(state, reward). To accompany this change, State was added as a first-class object. Terminal states now have state.mask set to 0, whereas before terminal states were represented by None.
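
One reason for the mask convention is that it turns the usual "is this state terminal?" check into a multiplication, for example when computing a bootstrapped target. The helper below is a hypothetical sketch, not library code; only the mask-equals-zero convention comes from this release:

def td_target(reward, next_state, v, discount=0.99):
    # next_state.mask is 0 for terminal states, so the bootstrap term vanishes
    # at episode boundaries without any None checks.
    return reward + discount * next_state.mask * v(next_state)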

Another major addition is Slurm support, in particular to aid in running on gypsum. The SlurmExperiment API handles the creation of the appropriate .sh files, output, etc., so experiments can be run on Slurm by writing a single Python script! No more writing .sh files by hand! Examples can be found in the demos folder.

There were a few other minor changes as well.

Change log:

  • Simplified agent API to only include act #56
  • Added State object #51
  • Added SlurmExperiment for running on gypsum #53
  • Updated the local and release scripts, and added slurm demos #54
  • Tweaked parameter order in replay buffers #59
  • Improved shared feature handling #63
  • Made write_loss togglable #64
  • Tweaked default hyperparameters