This repository contains reproductions of several state-of-the-art (SOTA) deep learning algorithms.
pip install transformers
ViT-B_16
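Pretrained model + ViT model
DT (Decision Transformer) treats RL as a sequence modeling problem: instead of using traditional RL methods, it has a network output actions directly to make decisions.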
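The sequence-modeling view above can be illustrated with a minimal sketch of how one trajectory might be flattened into a token sequence. It assumes the (return-to-go, state, action) layout described in the Decision Transformer paper; the helper names `returns_to_go` and `to_token_sequence` are hypothetical and are not taken from this repo, whose data pipeline may differ.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Return-to-go at step t: the sum of rewards from t to the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_token_sequence(
    states: Sequence[object],
    actions: Sequence[object],
    rewards: Sequence[float],
) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, state, action) per timestep into one flat sequence.

    Assumed layout (not confirmed by this repo): R_0, s_0, a_0, R_1, s_1, a_1, ...
    A sequence model trained on such data predicts each action token from the
    tokens that precede it.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have the same length")
    sequence: List[Tuple[str, object]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("rtg", g), ("state", s), ("action", a)])
    return sequence


# Toy three-step trajectory with placeholder states and actions.
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]
rewards = [1.0, 0.0, 2.0]
tokens = to_token_sequence(states, actions, rewards)
```

At inference time, a model trained on such sequences would be given a target return and the current state, and asked to produce the next action token, which is how the network outputs actions directly.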
Batch-Constrained deep Q-learning (BCQ)
Key points:
Distributed Distributional Deterministic Policy Gradient (D4PG)
D4PG separates the Actor, which collects experience, from the Learner, which learns the policy: