DI-engine Versions

OpenDILab Decision AI Engine

v0.5.1

3 months ago

Env

  1. add MADDPG pettingzoo example (#774)
  2. polish NGU Atari configs (#767)
  3. fix bug in cliffwalking env (#759)
  4. add PettingZoo replay video demo
  5. change default max retry in env manager from 5 to 1
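
Illustrative only: the env manager's default max_retry changed from 5 to 1; the nested cfg layout below is an assumption, so match your config's actual structure:

    from easydict import EasyDict

    # Restore the previous retry behavior if your envs fail transiently.
    main_config = EasyDict(dict(env=dict(manager=dict(max_retry=5))))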

Algorithm

  1. add QGPO diffusion-model related algorithm (#757)
  2. add HAPPO multi-agent algorithm (#717)
  3. add DreamerV3 + MiniGrid adaptation (#725)
  4. fix hppo entropy_weight to avoid nan error in log_prob (#761)
  5. fix structured action bug (#760)
  6. polish Decision Transformer entry (#754)
  7. fix EDAC policy/model bug

Fix

  1. fix env typos
  2. fix pynng requirements bug
  3. fix communication module unittest bug

Style

  1. polish policy API doc (#762) (#764) (#768)
  2. add agent API doc (#758)
  3. polish torch_utils/utils API doc (#745) (#747) (#752) (#755) (#763)

News

  1. AAAI 2024: SO2: A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
  2. LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Full Changelog: https://github.com/opendilab/DI-engine/compare/v0.5.0...v0.5.1

Contributors: @PaParaZz1 @zjowowen @nighood @kxzxvbk @puyuan1996 @Cloud-Pku @AltmanD @HarryXuancy

v0.5.0

5 months ago

Env

  1. add tabmwp env (#667)
  2. polish anytrading env issues (#731)

Algorithm

  1. add PromptPG algorithm (#667)
  2. add Plan Diffuser algorithm (#700) (#749)
  3. add new pipeline implementation of IMPALA algorithm (#713)
  4. add dropout layers to DQN-style algorithms (#712)

Enhancement

  1. add new pipeline agent for sac/ddpg/a2c/ppo and Hugging Face support (#637) (#730) (#737)
  2. add more unittest cases for model (#728)
  3. add collector logging in new pipeline (#735)

Fix

  1. fix logger middleware problems (#715)
  2. fix ppo parallel bug (#709)
  3. fix typo in optimizer_helper.py (#726)
  4. fix mlp dropout if condition bug
  5. fix drex collecting data unittest bugs

Style

  1. polish env manager/wrapper comments and API doc (#742)
  2. polish model comments and API doc (#722) (#729) (#734) (#736) (#741)
  3. polish policy comments and API doc (#732)
  4. polish rl_utils comments and API doc (#724)
  5. polish torch_utils comments and API doc (#738)
  6. update README.md and Colab demo (#733)
  7. update metaworld docker image

News

  1. NeurIPS 2023 Spotlight: LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
  2. OpenDILab + Hugging Face DRL Model Zoo link

Full Changelog: https://github.com/opendilab/DI-engine/compare/v0.4.9...v0.5.0

Contributors: @PaParaZz1 @zjowowen @AltmanD @puyuan1996 @kxzxvbk @Super1ce @nighood @Cloud-Pku @zhangpaipai @ruoyuGao @eltociear

v0.4.9

8 months ago

API Change

  1. refactor the implementation of Decision Transformer; DI-engine now supports both discrete and continuous DT outputs with multi-modal observations (example: ding/example/dt.py)
  2. update the multi-GPU Distributed Data Parallel (DDP) example (see the sketch after this list)
  3. change the return value of InteractionSerialEvaluator to simplify redundant results
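
A hedged sketch of the DDP entry pattern (DistContext from ding.utils and the multi-GPU learn flag are assumptions based on the DDP example; the config import path is illustrative):

    from ding.entry import serial_pipeline
    from ding.utils import DistContext
    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config

    # Launch with a torch distributed launcher (one process per GPU); the
    # learn config should enable multi-GPU training.
    if __name__ == '__main__':
        with DistContext():
            serial_pipeline((main_config, create_config), seed=0)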

Env

  1. add cliffwalking env (#677)
  2. add lunarlander ppo config and example

Algorithm

  1. add BCQ offline RL algorithm (#640)
  2. add DreamerV3 model-based RL algorithm (#652)
  3. add tensor stream merge network tools (#673)
  4. add scatter connection model (#680)
  5. refactor Decision Transformer in new pipeline and support img input and discrete output (#693)
  6. add three variants of Bilinear classes and a FiLM class (#703)

Enhancement

  1. polish offpolicy RL multi-gpu DDP training (#679)
  2. add middleware for Ape-X distributed pipeline (#696)
  3. add example for evaluating trained DQN (#706)

Fix

  1. fix to_ndarray fails to assign dtype for scalars (#708)
  2. fix evaluator return episode_info compatibility bug
  3. fix cql example entry wrong config bug
  4. fix enable_save_figure env interface
  5. fix redundant env info bug in evaluator
  6. fix to_item unittest bug

Style

  1. polish and simplify requirements (#672)
  2. add Hugging Face Model Zoo badge (#674)
  3. add openxlab Model Zoo badge (#675)
  4. fix py37 macos ci bug and update default pytorch from 1.7.1 to 1.12.1 (#678)
  5. fix mujoco-py compatibility issue for cython<3 (#711)
  6. fix type spell error (#704)
  7. fix pypi release actions ubuntu 18.04 bug
  8. update contact information (e.g. WeChat)
  9. polish algorithm doc tables

New Repo

  1. DOS: [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning

Full Changelog: https://github.com/opendilab/DI-engine/compare/v0.4.8...v0.4.9

Contributors: @PaParaZz1 @zjowowen @zhangpaipai @AltmanD @puyuan1996 @Cloud-Pku @Super1ce @kxzxvbk @jayyoung0802 @Mossforest @lxl2gf @Privilger

v0.4.8

11 months ago

API Change

  1. stop_value is no longer a required field in config; it defaults to math.inf, and users can specify max_env_step or max_train_iter in the training entry to run the program with a fixed termination condition (see the sketch below)
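
A minimal sketch of the new termination options, assuming the standard serial entry (the config import path is illustrative):

    from ding.entry import serial_pipeline
    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config

    # stop_value may now be omitted (it defaults to math.inf); instead, bound
    # the run explicitly by env steps or train iterations.
    if __name__ == '__main__':
        serial_pipeline((main_config, create_config), seed=0, max_env_step=int(1e5))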

Env

  1. fix gym hybrid reward dtype bug (#664)
  2. fix atari env id noframeskip bug (#655)
  3. fix typo in gym any_trading env (#654)
  4. update td3bc d4rl config (#659)
  5. polish bipedalwalker config

Algorithm

  1. add EDAC offline RL algorithm (#639)
  2. add LN and GN norm_type support in ResBlock (#660) (see the sketch after this list)
  3. add normal value norm baseline for PPOF (#658)
  4. polish last layer init/norm in MLP (#650)
  5. polish TD3 monitor variable
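
A minimal sketch of the new norm options (import path and argument names assumed from ding.torch_utils.network):

    import torch
    from ding.torch_utils.network import ResBlock

    # norm_type now accepts 'LN' and 'GN' in addition to 'BN' (exact accepted
    # strings are per the PR; check build_normalization for details).
    block = ResBlock(in_channels=64, norm_type='GN')
    y = block(torch.randn(4, 64, 8, 8))  # residual output, same shape as input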

Enhancement

  1. add MAPPO/MASAC task example (#661)
  2. add PPO example for complex env observation (#644)
  3. add barrier middleware (#570)

Fix

  1. fix abnormal collector log and add record_random_collect option (#662)
  2. fix to_item compatibility bug (#646)
  3. fix trainer dtype transform compatibility bug
  4. fix pettingzoo 1.23.0 compatibility bug
  5. fix ensemble head unittest bug

Style

  1. fix incompatible gym version bug in Dockerfile.env (#653)
  2. add more algorithm docs

New Repo

  1. LightZero: A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit.

Full Changelog: https://github.com/opendilab/DI-engine/compare/v0.4.7...v0.4.8

Contributors: @PaParaZz1 @zjowowen @puyuan1996 @SolenoidWGT @Super1ce @karroyan @zhangpaipai @eltociear

v0.4.7

1 year ago

API Change

  1. remove the requirement of sub-fields (learn/collect/eval) in the policy config, so users can define their own config formats (see the sketch after this list)
  2. use wandb as the default logger in task pipeline
  3. remove value_network config field and implementations in SAC and related algorithms
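
Illustrative only (field names are examples, not a mandated schema): with the sub-field requirement removed, a policy config no longer needs nested learn/collect/eval blocks:

    from easydict import EasyDict

    # A flat policy config sketch; previously, learning_rate would have lived
    # under policy.learn and n_sample under policy.collect.
    policy_cfg = EasyDict(dict(
        cuda=True,
        discount_factor=0.99,
        learning_rate=3e-4,
        n_sample=128,
    ))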

Env

  1. add dmc2gym env support and baseline (#451)
  2. update pettingzoo to the latest version (#597)
  3. polish icm/rnd+onppo config bugs and add app_door_to_key env (#564)
  4. add lunarlander continuous TD3/SAC config
  5. polish lunarlander discrete C51 config

Algorithm

  1. add Procedure Cloning (PC) imitation learning algorithm (#514)
  2. add Munchausen Reinforcement Learning (MDQN) algorithm (#590)
  3. add reward/value norm methods: popart & value rescale & symlog (#605)
  4. polish reward model config and training pipeline (#624)
  5. add PPOF reward space demo support (#608)
  6. add PPOF Atari demo support (#589)
  7. polish dqn default config and env examples (#611)
  8. polish comment and clean code about SAC

Enhancement

  1. add language model (e.g. GPT) training utils (#625)
  2. remove policy cfg sub fields requirements (#620)
  3. add full wandb support (#579)

Fix

  1. fix confusing shallow copy operation about next_obs (#641)
  2. fix unsqueeze action_args in PDQN when shape is 1 (#599)
  3. fix evaluator return_info tensor type bug (#592)
  4. fix deque buffer wrapper PER bug (#586)
  5. fix reward model save method compatibility bug
  6. fix logger assertion and unittest bug
  7. fix bfs test py3.9 compatibility bug
  8. fix zergling collector unittest bug

Style

  1. add DI-engine torch-rpc p2p communication docker (#628)
  2. add D4RL docker (#591)
  3. correct typo in task (#617)
  4. correct typo in time_helper (#602)
  5. polish readme and add treetensor example
  6. update contributing doc

New Plan

  • Call for contributors to DI-engine (#621)

Full Changelog: https://github.com/opendilab/DI-engine/compare/v0.4.6...v0.4.7

Contributors: @PaParaZz1 @karroyan @zjowowen @ruoyuGao @kxzxvbk @nighood @song2181 @SolenoidWGT @PSHarold @jimmydengpeng @eltociear

v0.4.6

1 year ago

API Change

  1. middleware: CkptSaver(cfg, policy, train_freq=100) -> CkptSaver(policy, cfg.exp_name, train_freq=100)
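
A minimal migration sketch in the task pipeline (assuming the usual ding.framework middleware usage; the policy object is elided):

    from easydict import EasyDict
    from ding.framework import task
    from ding.framework.middleware import CkptSaver

    cfg = EasyDict(exp_name='demo_exp')  # illustrative
    policy = ...  # your policy object (elided)

    with task.start():
        # before (<= v0.4.5): task.use(CkptSaver(cfg, policy, train_freq=100))
        task.use(CkptSaver(policy, cfg.exp_name, train_freq=100))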

Env

  1. add metadrive env and related ppo config (#574)
  2. add acrobot env and related dqn config (#577)
  3. add carracing in box2d (#575)
  4. add new gym hybrid viz (#563)
  5. update cartpole IL config (#578)

Algorithm

  1. add BDQ algorithm (#558)
  2. add procedure cloning model (#573)

Enhancement

  1. add simplified PPOF (PPO × Family) interface (#567) (#568) (#581) (#582)
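
A hedged sketch of the simplified interface (constructor and method argument names here are assumptions, not the confirmed API; consult the PPOF examples for the exact signature):

    from ding.bonus import PPOF

    # PPOF bundles env setup, policy, and the training loop behind one object.
    agent = PPOF(env='lunarlander_discrete', exp_name='lunarlander_ppof')  # assumed args
    agent.train(step=int(1e5))  # assumed method/parameter names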

Fix

  1. fix to_device and prev_state bug when using ttorch (#571)
  2. fix py38 and numpy unittest bugs (#565)
  3. fix typo in contrastive_loss.py (#572)
  4. fix dizoo envs pkg installation bugs
  5. fix multi_trainer middleware unittest bug

Style

  1. add evogym docker (#580)
  2. fix metaworld docker bug
  3. fix setuptools high version incompatibility bug
  4. extend treetensor lowest version

New Paper

  1. GoBigger: [ICLR 2023] A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation

Contributors: @PaParaZz1 @puyuan1996 @timothijoe @Cloud-Pku @ruoyuGao @Super1ce @karroyan @kxzxvbk @eltociear

v0.4.5

1 year ago

API Change

  1. move the default examples for adding a new env from extending BaseEnv to utilizing DingEnvWrapper (see the sketch after this list)
  2. rename final_eval_reward to eval_episode_return in all related code (including envs and evaluators)
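
A minimal sketch of the DingEnvWrapper route, assuming a standard gym env:

    import gym
    from ding.envs import DingEnvWrapper

    # Wrap an existing gym env instead of subclassing BaseEnv.
    env = DingEnvWrapper(gym.make('CartPole-v0'))
    obs = env.reset()
    timestep = env.step(env.random_action())
    # When an episode ends, timestep.info carries 'eval_episode_return'
    # (formerly 'final_eval_reward').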

Env

  1. add beergame supply chain optimization env (#512)
  2. add env gym_pybullet_drones (#526)
  3. rename eval reward to episode return (#536)

Algorithm

  1. add policy gradient algo implementation (#544)
  2. add MADDPG algo implementation (#550)
  3. add IMPALA continuous algo implementation (#551)
  4. add MADQN algo implementation (#540)

Enhancement

  1. add new task IMPALA-type distributed training scheme (#321)
  2. add load and save method for replaybuffer (#542)
  3. add more DingEnvWrapper examples (#525)
  4. add more evaluator info viz support (#538)
  5. add traceback log for subprocess env manager (#534)

Fix

  1. fix halfcheetah td3 config file (#537)
  2. fix mujoco action_clip args compatibility bug (#535)
  3. fix atari a2c config entry bug
  4. fix drex unittest compatibility bug

Style

  1. add Roadmap issue of DI-engine (#548)
  2. update related project link and new env doc

New Project

  1. PPOxFamily: PPO x Family DRL Tutorial Course
  2. ACE: [AAAI 2023] Official PyTorch implementation of paper "ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency".

Contributors: @PaParaZz1 @sailxjx @zjowowen @hiha3456 @Weiyuhong-1998 @kxzxvbk @song2181 @zerlinwang

v0.4.4

1 year ago

API Change

  1. the context in the new task pipeline is now implemented as a dataclass rather than a dict (see the sketch after this list)
  2. the recommended visualization tool is now wandb, rather than tensorboard
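
An illustrative sketch of the dataclass-style context (field names are examples; see ding.framework for the actual Context definition):

    from dataclasses import dataclass

    @dataclass
    class MyContext:
        train_iter: int = 0
        env_step: int = 0

    ctx = MyContext()
    ctx.train_iter += 1  # attribute access replaces dict-style ctx['train_iter']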

Env

  1. add modified gym-hybrid including moving, sliding and hardmove (#505) (#519)
  2. add evogym support (#495) (#527)
  3. add save_replay_gif option (#506)
  4. adapt minigrid_env and related config to latest MiniGrid v2.0.0 (#500)

Algorithm

  1. add pcgrad optimizer (#489)
  2. add some features in MLP and ResBlock (#511)
  3. delete mcts related modules (#518) (we will release an MCTS repo in the future)

Enhancement

  1. add wandb middleware and demo (#488) (#523) (#528)
  2. add new properties in Context (#499)
  3. add single env policy wrapper for policy deployment (demo)
  4. add custom model demo and doc

Fix

  1. fix build logger args and unittests (#522)
  2. fix total_loss calculation in PDQN (#504)
  3. fix save gif function bug
  4. fix level sample unittest bug

Style

  1. update contact email address (#503)
  2. polish env log and resblock name
  3. add details button in readme

New Repo

  • DI-1024: Deep Reinforcement Learning + 1024 Game

Contributors: @PaParaZz1 @puyuan1996 @karroyan @hiha3456 @davide97l @Weiyuhong-1998 @zjowowen @norman26625

v0.4.3

1 year ago

Env

  1. add rule-based gomoku expert (#465)

Algorithm

  1. fix a2c policy batch size bug (#481)
  2. enable activation option in CollaQ attention and mixer
  3. minor fix about IBC (#477)

Enhancement

  1. add IGM support (#486)
  2. add tb logger middleware and demo

Fix

  1. fix type conversion in ding_env_wrapper (#483)
  2. fix di-orchestrator version bug in unittest (#479)
  3. fix data collection errors caused by shallow copies (#475)
  4. fix gym==0.26.0 seed args bug

Style

  1. add readme tutorial link (environment & algorithm) (#490) (#493)
  2. adjust location of the default_model method in policy (#453)

New Repo

  • DI-sheep: Deep Reinforcement Learning + 3 Tiles Game

Contributors: @PaParaZz1 @nighood @norman26625 @ZHZisZZ @cpwan @mahuangxu

v0.4.2

1 year ago

API Change

  1. config will be deepcopied by default in the compile_config function
  2. after calling the compile_config function, the current repo's git log and git diff information will be saved in the exp_name directory (see the sketch below)
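
A minimal sketch of the typical call (the config import path is illustrative):

    from ding.config import compile_config
    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config

    # compile_config now deepcopies the input config by default and saves the
    # repo's git log / git diff under the exp_name directory.
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)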

Env

  1. add rocket env (#449)
  2. update pettingzoo env and improve related performance (#457)
  3. add mario env demo (#443)
  4. add MAPPO multi-agent config (#464)
  5. add mountain car (discrete action) environment (#452)
  6. fix multi-agent mujoco gym compatibility bug
  7. fix gfootball env save_replay variable init bug

Algorithm

  1. add IBC (Implicit Behaviour Cloning) algorithm (#401)
  2. add BCO (Behaviour Cloning from Observation) algorithm (#270)
  3. add continuous PPOPG algorithm (#414)
  4. add PER in CollaQ (#472)
  5. add activation option in QMIX and CollaQ

Enhancement

  1. update ctx to dataclass (#467)

Fix

  1. fix base_env FinalMeta bug with gym 0.25.0-0.25.1
  2. fix config in-place modification bug
  3. fix ding CLI no-argument problem
  4. fix import errors after running setup.py (jinja2, markupsafe)
  5. fix conda py3.6 and cross-platform build bug

Style

  1. add project state and datetime in log dir (#455)
  2. polish notes for q-learning model (#427)
  3. revise mujoco dockerfile and validation (#474)
  4. add dockerfile for cityflow env
  5. polish default output log format

Contributors: @PaParaZz1 @ZHZisZZ @zjowowen @song2181 @zerlinwang @i-am-tc @hiha3456 @nighood @kxzxvbk @Weiyuhong-1998 @RobinC94