Recipes for training reward models for RLHF.
A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer ha...
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise...
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer h...