AgileRL Versions

Streamlining reinforcement learning with RLOps. State-of-the-art RL algorithms and tools.

v0.1.21

2 months ago

AgileRL v0.1.21 introduces contextual multi-armed bandit algorithms to the framework. Train agents to solve complex optimisation problems with our two new evolvable bandit algorithms!

This release includes the following updates:

  • Two new evolvable contextual bandit algorithms: Neural Contextual Bandits with UCB-based Exploration and Neural Thompson Sampling
  • A new contextual bandit training function for fast, straightforward training
  • A new BanditEnv class for converting any labelled dataset into a bandit learning environment (see the sketch after this list)
  • Tutorials on using AgileRL bandit algorithms with evolvable hyperparameter optimisation for SOTA results
  • New demo and benchmarking scripts for bandit algorithms
  • And more!
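
The idea behind BanditEnv can be illustrated without any AgileRL-specific API. Below is a minimal, library-agnostic sketch (the class name and interface are ours, for illustration only, not AgileRL's) that recasts a labelled dataset as a contextual bandit: each class becomes an arm, and the reward is 1 when the chosen arm matches the row's label.

```python
import numpy as np

class LabelledDatasetBandit:
    """Recast a labelled dataset as a contextual bandit: each class becomes
    an arm, and the reward is 1 when the chosen arm matches the row's label."""

    def __init__(self, features, targets, seed=None):
        self.features = np.asarray(features, dtype=np.float32)
        self.targets = np.asarray(targets)
        self.arms = int(np.unique(self.targets).size)
        self.rng = np.random.default_rng(seed)
        self._idx = None

    def reset(self):
        return self._next_context()

    def step(self, action):
        reward = float(action == self.targets[self._idx])
        return self._next_context(), reward

    def _next_context(self):
        # Sample a random row; each arm sees a copy of the row's features.
        self._idx = self.rng.integers(len(self.features))
        return np.tile(self.features[self._idx], (self.arms, 1))

# Example: a random agent interacting with the bandit
X = np.random.rand(100, 4)
y = np.random.randint(0, 3, size=100)
env = LabelledDatasetBandit(X, y, seed=0)
context = env.reset()
total = 0.0
for _ in range(50):
    action = np.random.randint(env.arms)
    context, reward = env.step(action)
    total += reward
print(f"average reward: {total / 50:.2f}")
```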

More updates will be coming soon!

v0.1.20

3 months ago

AgileRL v0.1.20 focuses on making reinforcement learning implementations easier to debug. Easily figure out what's going on with our new probe environments, which quickly isolate and validate an agent's ability to solve each kind of problem.

This release includes:

  • 43 single- and multi-agent probe environments for image and vector observation spaces, and discrete and continuous action spaces (a minimal example is sketched after this list)
  • New functions that can automate testing with probe environments to quickly isolate your problem
  • A new Debugging Reinforcement Learning section of the docs, with examples and explanations
  • General improvements, including more stable learning for DDPG, TD3, MADDPG and MATD3 with image observations
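
To illustrate what a probe environment is (this is a hand-rolled example in the Gymnasium API, not one of the 43 environments shipped with the release), the simplest probe emits a constant observation and a constant reward, so a correct value estimator must converge to exactly 1:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ConstantRewardProbe(gym.Env):
    """One-step probe: a single fixed observation and a reward of +1
    regardless of the action. An agent whose value estimate does not
    converge to 1 here has a bug in its value loss or target updates."""

    def __init__(self):
        self.observation_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        # Terminates immediately, so the discounted return equals the reward.
        return np.zeros(1, dtype=np.float32), 1.0, True, False, {}
```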

More updates and algorithms coming soon!

v0.1.19

5 months ago

AgileRL v0.1.19 introduces hierarchical curriculum learning to the platform through the learning of Skills. Teach agents to solve complex problems by breaking tasks down into smaller, learnable sub-tasks. We have also collaborated further with the Farama Foundation to introduce more tutorials and improve our documentation.

This release includes the following:

  • New Skills wrapper is introduced to enable hierarchical curriculum learning with any algorithm. A tutorial is also provided to demonstrate how to use it (a minimal sketch of the idea follows this list).
  • Single-agent Gymnasium tutorials are introduced, demonstrating how to use PPO, TD3 and Rainbow DQN on a variety of environments.
  • Documentation site is improved. Check it out: https://docs.agilerl.com
  • General algorithm improvements throughout the framework
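
The Skills wrapper itself is documented in the tutorial above; the sketch below only conveys the underlying idea using a plain Gymnasium wrapper (the class and the example reward function are illustrative, not the AgileRL API): a skill is trained by swapping the task reward for a skill-specific one, and a hierarchical curriculum sequences or selects among the resulting skill policies.

```python
import gymnasium as gym

class SkillRewardWrapper(gym.Wrapper):
    """Train one sub-task ("skill") of a larger problem by replacing the
    environment's reward with a skill-specific reward computed from the
    observation."""

    def __init__(self, env, skill_reward_fn):
        super().__init__(env)
        self.skill_reward_fn = skill_reward_fn

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        return obs, self.skill_reward_fn(obs), terminated, truncated, info

# Example skill for CartPole: keep the cart near the centre of the track
# (observation index 0 is the cart position).
env = SkillRewardWrapper(
    gym.make("CartPole-v1"),
    skill_reward_fn=lambda obs: -abs(float(obs[0])),
)
```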

Stay tuned for more updates coming soon!

v0.1.14

6 months ago

AgileRL v0.1.14 introduces usability improvements to the framework with better warnings and error messages. This update also includes more robust unit tests across the library and general improvements. Multi-agent algorithms also receive updates to better handle discrete action spaces.

v0.1.13

6 months ago

AgileRL v0.1.13 introduces more flexibility, allowing users to define their own custom networks and use them with our algorithms and SOTA hyperparameter optimisation. Additionally, we have continued collaborating with the Farama Foundation to bring you another tutorial.

This release includes the following:

  • MakeEvolvable wrapper to make any sequential network evolvable - wrap any CNN or MLP to make it compatible with AgileRL algorithms and evolutionary hyperparameter optimisation! (A sketch of the underlying idea follows this list.)
  • Use pre-trained networks with AgileRL - load any PyTorch nn.Module network into AgileRL to automatically make it evolvable.
  • Self-play tutorial that harnesses curriculum learning to train a DQN agent to play Connect Four!
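
MakeEvolvable's exact interface is described in the documentation; the sketch below only illustrates the kind of architecture mutation an evolvable wrapper applies to a plain PyTorch MLP (the function is our own, written for illustration): widening a hidden layer while preserving the network's existing behaviour, so evolutionary HPO can explore architectures without discarding learned weights.

```python
import copy
import torch.nn as nn

def widen_hidden_layer(mlp, layer_idx, extra_units):
    """Return a copy of `mlp` (an nn.Sequential of Linear/ReLU layers) with
    `extra_units` added to the Linear at `layer_idx`. Existing weights are
    copied and the new units initially contribute zero, so the mutated
    network computes exactly the same function as the original."""
    new = copy.deepcopy(mlp)
    old_layer = new[layer_idx]
    next_idx = layer_idx + 2              # skip the activation in between
    old_next = new[next_idx]

    wider = nn.Linear(old_layer.in_features, old_layer.out_features + extra_units)
    wider.weight.data[: old_layer.out_features] = old_layer.weight.data
    wider.bias.data[: old_layer.out_features] = old_layer.bias.data

    next_layer = nn.Linear(wider.out_features, old_next.out_features)
    next_layer.weight.data[:, : old_layer.out_features] = old_next.weight.data
    next_layer.weight.data[:, old_layer.out_features:] = 0.0
    next_layer.bias.data = old_next.bias.data.clone()

    new[layer_idx], new[next_idx] = wider, next_layer
    return new

# Example: mutate a small pre-trained policy head
net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
mutant = widen_hidden_layer(net, layer_idx=0, extra_units=32)
print(mutant)
```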

Stay tuned for more updates coming soon!

v0.1.12

7 months ago

AgileRL v0.1.12 introduces two new, powerful algorithms to the framework among other features. We have collaborated with the Farama Foundation to introduce tutorials for multi-agent reinforcement learning, with more tutorials on the way.

This release includes the following updates:

  • Proximal Policy Optimization (PPO) is added to the framework - train on-policy efficiently.
  • Rainbow DQN is added to the framework - combines multiple improvements over DQN.
  • Prioritized experience replay and multi-step replay buffers are introduced to the framework (a compact sketch of prioritized sampling follows this list).
  • Tutorials for multi-agent algorithms are included, with more coming soon.
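
For reference, proportional prioritized experience replay (Schaul et al., 2016) can be sketched in a few lines of NumPy. This is a textbook-style illustration of the sampling scheme, not AgileRL's buffer implementation:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Transitions are sampled with probability proportional to
    |TD error|^alpha; importance-sampling weights correct the bias."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio        # new samples get max priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.data)] ** self.alpha
        probs = prios / prios.sum()
        idxs = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idxs]) ** (-beta)
        weights /= weights.max()                    # normalise for stability
        return [self.data[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        self.priorities[idxs] = np.abs(td_errors) + eps
```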

Stay tuned for more updates very soon!

v0.1.8

8 months ago

AgileRL v0.1.8 introduces multi-agent algorithms into the framework. Train multiple agents in co-operative or competitive PettingZoo-style (parallel API) environments, with significantly faster training and up to 4x improvement in total return when benchmarked against epymarl's equivalent offering!

This release includes the following updates:

  • MADDPG is added to the framework! Train multiple agents in competitive or co-operative environments.
  • MATD3 is added to the framework! Train multiple agents with greater stability.
  • Addition of a multi-agent replay buffer class and a multi-agent train function (a bare-bones rollout loop is sketched after this list).
  • Training config files. Configure training runs in one place.
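
For orientation, the parallel-API interaction pattern these algorithms build on looks roughly like this (a random-action rollout; it assumes a recent PettingZoo release in which reset returns both observations and infos). A multi-agent algorithm such as MADDPG replaces the random action choice with a per-agent actor and pushes the joint transition to a shared replay buffer:

```python
def random_rollout(parallel_env, seed=None):
    """Roll out one episode in a PettingZoo parallel-API environment with
    random actions, accumulating each agent's episode return."""
    observations, infos = parallel_env.reset(seed=seed)
    episode_returns = {agent: 0.0 for agent in parallel_env.agents}

    while parallel_env.agents:                      # empty once all agents are done
        actions = {
            agent: parallel_env.action_space(agent).sample()
            for agent in parallel_env.agents
        }
        observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
        for agent, reward in rewards.items():
            episode_returns[agent] += reward
    return episode_returns

# Usage (assumes a PettingZoo MPE environment is installed):
# from pettingzoo.mpe import simple_speaker_listener_v4
# returns = random_rollout(simple_speaker_listener_v4.parallel_env())
```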

Keep an eye out for further updates coming soon!

v0.1.7

10 months ago

AgileRL v0.1.7 introduces distributed training to the framework with HuggingFace Accelerate! Train even faster by taking full advantage of your entire compute stack.

This release includes the following updates:

  • Distributed training. Train across multiple GPUs to cut down your training time even further! (A minimal Accelerate pattern is sketched after this list.)
  • New Sampler class to handle both standard and distributed replay buffers.
  • TD3 is added to the framework! Train agents in continuous action spaces with greater stability.
  • New and expanded demo and benchmarking files for online, offline and distributed training.
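
The core Accelerate pattern is small; below is a minimal, generic sketch (the model, data and loss are placeholders, not AgileRL code) showing how Accelerator handles device placement and gradient synchronisation so the same script runs on one GPU or many via `accelerate launch`:

```python
import torch
import torch.nn as nn
from accelerate import Accelerator

accelerator = Accelerator()

model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(100):
    # Placeholder batch; in an RL loop this would come from the replay buffer.
    obs = torch.randn(32, 8, device=accelerator.device)
    target = torch.randn(32, 4, device=accelerator.device)

    loss = nn.functional.mse_loss(model(obs), target)
    optimizer.zero_grad()
    accelerator.backward(loss)          # replaces loss.backward()
    optimizer.step()
```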

Stay tuned for more features coming soon!

v0.1.6

11 months ago

AgileRL v0.1.6 introduces offline reinforcement learning to the framework. You can now easily train agents on static data, and use evolutionary hyperparameter optimisation to learn faster and better.

This release includes the following updates:

  • New general offline RL training function to learn from static data
  • Conservative Q-Learning (CQL) added (a sketch of the CQL loss follows this list)
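
For discrete actions, the CQL objective is the standard TD regression plus a conservative penalty that pushes Q-values down for all actions while pushing up the Q-values of actions actually present in the dataset. A minimal PyTorch sketch (our own illustration, not the library's implementation):

```python
import torch
import torch.nn.functional as F

def cql_loss(q_network, states, actions, td_targets, cql_alpha=1.0):
    """Discrete-action CQL: TD loss plus the conservative regulariser
    logsumexp_a Q(s, a) - Q(s, a_data), weighted by cql_alpha."""
    q_all = q_network(states)                                 # [batch, n_actions]
    q_taken = q_all.gather(1, actions.long().unsqueeze(1)).squeeze(1)

    td_loss = F.mse_loss(q_taken, td_targets)
    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_taken).mean()
    return td_loss + cql_alpha * conservative_penalty
```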

More new features coming soon!

v0.1.5

1 year ago

AgileRL v0.1.5 introduces evolvable transformers that can be used for language tasks, including Reinforcement Learning from Human Feedback (RLHF). Combining LLMs and transformer architectures with evolvable HPO can massively reduce the time taken to fine-tune these expensive models.

This release includes the following updates:

  • Evolvable GPT and BERT models, compatible with evolutionary HPO
  • Implicit Language Q-Learning (ILQL) added - an offline RLHF algorithm
  • Better mutation support

New features are continuously being added, stay tuned!