An Easy-to-use, Scalable and High-performance RLHF Framework (Support 70...
A simulation framework for RLHF and alternatives. Develop your RLHF meth...
Okapi: Instruction-tuned Large Language Models in Multiple Languages wit...
Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning f...