pykoi: Active learning in one unified interface
🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, etc...
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, a...
Chain-of-Hindsight, A Scalable RLHF Method
Code accompanying the paper Pretraining Language Models with Human Prefe...
The open source implementation of ChatGPT, Alpaca, Vicuna and RLHF Pipe...
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer ha...
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement L...
A recipe to train reward models for RLHF.
Okapi: Instruction-tuned Large Language Models in Multiple Languages wit...
Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction
Python client library for improving your LLM app accuracy
Reproduce Alpaca
A collection of fine-tuning scripts for all kinds of LLMs