Reproduce Alpaca
A collection of finetuning scripts for all kinds of LLMs
Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning f...
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer h...
Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with o...