LLaMA Efficient Tuning Versions

Unify Efficient Fine-Tuning of 100+ LLMs

v0.7.0

2 weeks ago

Congratulations on 20k stars 🎉 We were No. 1 on GitHub Trending on Apr. 23rd 🔥 Follow us on X

New features

  • Support SFT/PPO/DPO/ORPO for the LLaVA-1.5 model by @BUAADreamer in #3450
  • Support inferring the LLaVA-1.5 model with both native Transformers and vLLM by @hiyouga in #3454
  • Support vLLM+LoRA inference for selected models (see the support list)
  • Support 2x faster generation for QLoRA models based on UnslothAI's optimization
  • Support adding new special tokens to the tokenizer via the new_special_tokens argument
  • Support choosing the device to merge LoRA in LlamaBoard via the export_device argument
  • Add a Colab notebook for getting started with fine-tuning the Llama-3 model on a free T4 GPU
  • Automatically enable SDPA attention and fast tokenizer for higher performance
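The `new_special_tokens` flow can be sketched in plain Python. This is an illustrative stand-in with made-up token names; the real implementation goes through `tokenizer.add_special_tokens` and a matching `model.resize_token_embeddings` call:

```python
# Sketch: registering new special tokens and growing the vocab mapping.
# Hypothetical helper; the real flow also resizes the embedding matrix.

def add_special_tokens(vocab, new_tokens):
    """Assign the next free ids to tokens not already in the vocab."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added  # number of rows the embedding matrix must grow by

vocab = {"<s>": 0, "</s>": 1, "hello": 2}
n_added = add_special_tokens(vocab, ["<|tool|>", "<|observation|>"])
print(n_added, vocab["<|tool|>"])  # 2 3
```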

New models

  • Base models
    • OLMo-1.7-7B
    • Jamba-v0.1-51B
    • Qwen1.5-110B
    • DBRX-132B-Base
  • Instruct/Chat models
    • Phi-3-mini-3.8B-instruct (4k/128k)
    • LLaVA-1.5-7B
    • LLaVA-1.5-13B
    • Qwen1.5-110B-Chat
    • DBRX-132B-Instruct

New datasets

  • Supervised fine-tuning datasets
    • LLaVA mixed (en&zh) by @BUAADreamer in #3471
  • Preference datasets
    • DPO mixed (en&zh) by @hiyouga

Bug fix

  • Fix #2093 #3333 #3347 #3374 #3387

v0.6.3

3 weeks ago

New features

  • Support Meta Llama-3 (8B/70B) models
  • Support UnslothAI's long-context QLoRA optimization (56,000 context length for Llama-2 7B in 24GB)
  • Support previewing local datasets in directories in LlamaBoard by @codemayq in #3291

New models

  • Base models
    • CodeGemma (2B/7B)
    • CodeQwen1.5-7B
    • Llama-3 (8B/70B)
    • Mixtral-8x22B-v0.1
  • Instruct/Chat models
    • CodeGemma-7B-it
    • CodeQwen1.5-7B-Chat
    • Llama-3-Instruct (8B/70B)
    • Command R (35B) by @marko1616 in #3254
    • Command R+ (104B) by @marko1616 in #3254
    • Mixtral-8x22B-Instruct-v0.1

v0.6.2

1 month ago

New features

  • Support ORPO algorithm by @hiyouga in #3066
  • Support inferring BNB 4-bit models on multiple GPUs via the quantization_device_map argument
  • Reorganize README files, move example scripts to the examples folder
  • Support saving & loading arguments quickly in LlamaBoard by @hiyouga and @marko1616 in #3046
  • Support loading alpaca-format datasets from the Hub without dataset_info.json by specifying --dataset_dir ONLINE
  • Add the moe_aux_loss_coef argument to control the coefficient of the auxiliary loss in MoE models
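The auxiliary loss that this coefficient scales can be sketched as a Switch-Transformer-style load-balancing term. This is an illustrative sketch, not LLaMA-Factory's actual implementation, and the numbers are made up:

```python
# Sketch of a load-balancing auxiliary loss scaled by a coefficient
# like moe_aux_loss_coef (illustrative only).

def aux_loss(token_counts, router_probs, coef):
    """token_counts[i]: tokens routed to expert i; router_probs[i]:
    mean router probability assigned to expert i over the batch."""
    n = len(token_counts)
    total = sum(token_counts)
    frac = [c / total for c in token_counts]
    return coef * n * sum(f * p for f, p in zip(frac, router_probs))

# Perfectly balanced routing over 4 experts:
balanced = aux_loss([25, 25, 25, 25], [0.25] * 4, coef=0.001)
# Collapsed routing (one expert takes everything) is penalized harder:
collapsed = aux_loss([100, 0, 0, 0], [0.97, 0.01, 0.01, 0.01], coef=0.001)
print(balanced < collapsed)  # True
```

A larger coefficient pushes the router harder toward uniform expert utilization at the cost of task loss.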

New models

  • Base models
    • Breeze-7B-Base
    • Qwen1.5-MoE-A2.7B (14B)
    • Qwen1.5-32B
  • Instruct/Chat models
    • Breeze-7B-Instruct
    • Qwen1.5-MoE-A2.7B-Chat (14B)
    • Qwen1.5-32B-Chat

Bug fix

  • Fix pile dataset download config by @lealaxy in #3053
  • Fix model generation config by @marko1616 in #3057
  • Fix qwen1.5 models DPO training by @changingivan and @hiyouga in #3083
  • Support Qwen1.5-32B by @sliderSun in #3160
  • Support Breeze-7B by @codemayq in #3161
  • Fix additional_target in unsloth by @kno10 in #3201
  • Fix #2807 #3022 #3023 #3046 #3077 #3085 #3116 #3200 #3225

v0.6.1

1 month ago

This patch mainly fixes #2983

In commit 9bec3c98a22c91b1c28fda757db51eb780291641, we built the optimizer and scheduler inside the trainers, which inadvertently introduced a bug: when DeepSpeed was enabled, the trainers in transformers would build an optimizer and scheduler before calling the create_optimizer_and_scheduler method [1]; the optimizer created by our method would then overwrite the original one, while the scheduler would not. Consequently, the scheduler no longer affected the learning rate in the optimizer, leading to a regression in the training results. We have fixed this bug in 3bcd41b639899e72bcabc51d59bac8967af19899 and 8c77b1091296e204dc3c8c1f157c288ca5b236bd. Thanks to @HideLord for helping us identify this critical bug.

[1] https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/trainer.py#L1877-L1881
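The failure mode can be reduced to a few lines of plain Python. These are dummy classes standing in for the real transformers/DeepSpeed objects, shown only to illustrate the stale-reference pattern:

```python
# Minimal reproduction of the bug pattern: a scheduler holds a reference
# to the optimizer it was built with, so replacing the optimizer later
# silently disconnects learning-rate scheduling.

class Optimizer:
    def __init__(self, lr):
        self.lr = lr

class Scheduler:
    """Holds a reference to ONE optimizer and decays its lr."""
    def __init__(self, optimizer):
        self.optimizer = optimizer
    def step(self):
        self.optimizer.lr *= 0.5

original = Optimizer(lr=1.0)
scheduler = Scheduler(original)   # built by the upstream trainer

optimizer = Optimizer(lr=1.0)     # custom optimizer overwrites the original,
                                  # but the scheduler still points at the old one
scheduler.step()
print(original.lr, optimizer.lr)  # 0.5 1.0 -> the live optimizer's lr never decays
```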

We have also fixed #2961 #2981 #2982 #2983 #2991 #3010

v0.6.0

1 month ago

We released our paper on arXiv! Thanks to all co-authors and to AK for the recommendation

New features

  • Support GaLore algorithm, allowing full-parameter learning of a 7B model using less than 24GB VRAM
  • Support FSDP+QLoRA that allows QLoRA fine-tuning of a 70B model on 2x24GB GPUs
  • Support LoRA+ algorithm for better LoRA fine-tuning by @qibaoyuan in #2830
  • LLaMA Factory 🤝 vLLM, enjoy 270% inference speed with --infer_backend vllm
  • Add a Colab notebook for getting started easily
  • Support pushing fine-tuned models to Hugging Face Hub in web UI
  • Support apply_chat_template by adding a chat template to the tokenizer after fine-tuning
  • Add Docker support by @S3Studio in #2743 #2849
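GaLore's core idea, full-parameter learning with low-rank optimizer state, can be sketched in a few lines of NumPy. This is an illustrative sketch of the technique, not LLaMA-Factory's actual integration; shapes and rank are made up:

```python
import numpy as np

# Sketch of GaLore: project the gradient into a low-rank subspace,
# keep optimizer state there, then project the update back.

rng = np.random.default_rng(0)
m, n, rank = 64, 32, 4
grad = rng.standard_normal((m, n))

# Projection matrix from the top-r left singular vectors of the gradient.
U, _, _ = np.linalg.svd(grad, full_matrices=False)
P = U[:, :rank]                      # (m, r)

low_rank_grad = P.T @ grad           # (r, n): optimizer state lives here
update = P @ low_rank_grad           # (m, n): projected back onto the weights

print(low_rank_grad.shape, update.shape)
```

Because the Adam moments are kept at the (r, n) shape instead of (m, n), the optimizer memory shrinks roughly by a factor of m/r, which is what makes full-parameter 7B training fit in under 24GB.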

New models

  • Base models
    • OLMo (1B/7B)
    • StarCoder2 (3B/7B/15B)
    • Yi-9B
  • Instruct/Chat models
    • OLMo-7B-Instruct

New datasets

  • Supervised fine-tuning datasets
    • Cosmopedia (en)
  • Preference datasets
    • Orca DPO (en)

Bug fix

  • Fix flash_attn in web UI by @cx2333-gt in #2730
  • Fix deepspeed runtime error in PPO by @stephen-nju in #2746
  • Fix readme ddp instruction by @khazic in #2903
  • Fix environment variable in datasets by @SirlyDreamer in #2905
  • Fix readme information by @0xez in #2919
  • Fix generation config validation by @marko1616 in #2945
  • Fix requirements by @rkinas in #2963
  • Fix bitsandbytes windows version by @Tsumugii24 in #2967
  • Fix #2346 #2642 #2649 #2732 #2735 #2756 #2766 #2775 #2777 #2782 #2798 #2802 #2803 #2817 #2895 #2928 #2936 #2941

v0.5.3

2 months ago

New models

  • Base models
    • Gemma (2B/7B)
  • Instruct/Chat models
    • Gemma-it (2B/7B)

Bug fix

  • Add flash-attn package for Windows user by @codemayq in #2514
  • Fix ppo trainer #1163 by @stephen-nju in #2525
  • Support atom models by @Rayrtfr in #2531
  • Support role in webui by @lungothrin in #2575
  • Bump accelerate to 0.27.2 and fix #2552 by @Katehuuh in #2608
  • Fix #2512 #2516 #2532 #2533 #2629

v0.5.2

2 months ago

New features

  • Support block expansion in LLaMA Pro, see tests/llama_pro.py for usage
  • Add use_rslora option for the LoRA method
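The change behind `use_rslora` is a one-line scaling tweak: vanilla LoRA scales the adapter output by alpha / r, while rank-stabilized LoRA (rsLoRA) uses alpha / sqrt(r), which keeps the update magnitude stable as the rank grows. A minimal sketch (hypothetical helper, illustrative values):

```python
import math

# Sketch of the LoRA vs. rsLoRA scaling factor.

def lora_scaling(alpha, r, use_rslora=False):
    return alpha / math.sqrt(r) if use_rslora else alpha / r

alpha = 16
for r in (8, 64):
    print(r, lora_scaling(alpha, r), lora_scaling(alpha, r, use_rslora=True))
```

At r=64 the vanilla scaling collapses to 0.25 while the rsLoRA scaling stays at 2.0, so higher-rank adapters keep contributing meaningfully to the forward pass.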

New models

  • Base models
    • Qwen1.5 (0.5B/1.8B/4B/7B/14B/72B)
    • DeepSeekMath-7B-Base
    • DeepSeekCoder-7B-Base-v1.5
    • Orion-14B-Base
  • Instruct/Chat models
    • Qwen1.5-Chat (0.5B/1.8B/4B/7B/14B/72B)
    • MiniCPM-2B-SFT/DPO
    • DeepSeekMath-7B-Instruct
    • DeepSeekCoder-7B-Instruct-v1.5
    • Orion-14B-Chat
    • Orion-14B-Long-Chat
    • Orion-14B-RAG-Chat
    • Orion-14B-Plugin-Chat

New datasets

  • Supervised fine-tuning datasets
    • SlimOrca (en)
    • Dolly (de)
    • Dolphin (de)
    • Airoboros (de)
  • Preference datasets
    • Orca DPO (de)

Bug fix

  • Fix torch_dtype check in export model by @fenglui in #2262
  • Add Russian locale to LLaMA Board by @seoeaa in #2264
  • Remove manually set use_cache in export model by @yhyu13 in #2266
  • Fix DeepSpeed Zero3 training with MoE models by @A-Cepheus in #2283
  • Add a patch for full training of the Mixtral model using DeepSpeed Zero3 by @ftgreat in #2319
  • Fix bug in data pre-processing by @lxsyz in #2411
  • Add German sft and dpo datasets by @johannhartmann in #2423
  • Add version checking in test_toolcall.py by @mini-tiger in #2435
  • Enable parsing of SlimOrca dataset by @mnmueller in #2462
  • Add tags for models when pushing to hf hub by @younesbelkada in #2474
  • Fix #2189 #2268 #2282 #2320 #2338 #2376 #2388 #2394 #2397 #2404 #2412 #2420 #2421 #2436 #2438 #2471 #2481

v0.5.0

3 months ago

Congratulations on 10k stars 🎉 Make LLM fine-tuning easier and faster together with LLaMA-Factory ✨

New features

  • Support agent tuning for most models; fine-tune any LLM with --dataset glaive_toolcall for tool use #2226
  • Support function calling in both API and Web mode with fine-tuned models, matching OpenAI's format
  • LLaMA Factory 🤝 Unsloth, enjoy 170% LoRA training speed with --use_unsloth, see benchmarking here
  • Support fine-tuning models on the MPS device #2090
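The OpenAI-compatible function-call payload that a fine-tuned model returns in API mode looks roughly like the sketch below. The field names follow OpenAI's chat format; the weather function itself is a made-up example:

```python
import json

# Sketch of an OpenAI-style function-call response message.
# get_current_weather is a hypothetical function for illustration.

response_message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_current_weather",
        "arguments": json.dumps({"location": "Berlin", "unit": "celsius"}),
    },
}

# The caller parses the JSON-encoded arguments and dispatches the tool.
args = json.loads(response_message["function_call"]["arguments"])
print(response_message["function_call"]["name"], args["location"])
```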

New models

  • Base models
    • Phi-2 (2.7B)
    • InternLM2 (7B/20B)
    • SOLAR-10.7B
    • DeepseekMoE-16B-Base
    • XVERSE-65B-2
  • Instruct/Chat models
    • InternLM2-Chat (7B/20B)
    • SOLAR-10.7B-Instruct
    • DeepseekMoE-16B-Chat
    • Yuan (2B/51B/102B)

New datasets

  • Supervised fine-tuning datasets
    • deepctrl dataset
    • Glaive function calling dataset v2

Core updates

  • Refactor data engine: clearer dataset alignment, easier templating and tool formatting
  • Refactor saving logic for models with value head #1789
  • Use the ruff code formatter for a consistent code style

Bug fix

  • Bump transformers version to 4.36.2 by @ShaneTian in #1932
  • Fix requirements by @dasdristanta13 in #2117
  • Add Machine-Mindset project by @JessyTsu1 in #2163
  • Fix typo in readme file by @junuMoon in #2194
  • Support resize token embeddings with ZeRO3 by @liu-zichen in #2201
  • Fix #1073 #1462 #1617 #1735 #1742 #1789 #1821 #1875 #1895 #1900 #1908 #1907 #1909 #1923 #2014 #2067 #2081 #2090 #2098 #2125 #2127 #2147 #2161 #2164 #2183 #2195 #2249 #2260

v0.4.0

4 months ago

🚨🚨 Core refactor

  • Deprecate checkpoint_dir and use adapter_name_or_path instead
  • Replace resume_lora_training with create_new_adapter
  • Move the patches in model loading to llmtuner.model.patcher
  • Bump to Transformers 4.36.1 to adapt to the Mixtral models
  • Wide adaptation for FlashAttention2 (LLaMA, Falcon, Mistral)
  • Temporarily disable LongLoRA due to breaking changes; it will be supported again later

The above changes were made by @hiyouga in #1864

New features

  • Add DPO-ftx: mixing fine-tuning gradients into DPO via the dpo_ftx argument, suggested by @lylcst in https://github.com/hiyouga/LLaMA-Factory/issues/1347#issuecomment-1846943606
  • Integrate AutoGPTQ into the model export via the export_quantization_bit and export_quantization_dataset arguments
  • Support loading datasets from ModelScope Hub by @tastelikefeet and @wangxingjun778 in #1802
  • Support resizing token embeddings with the noisy mean initialization by @hiyouga in a66186b8724ffd0351a32593ab52d8a2312f339b
  • Support the system column in both the alpaca and sharegpt dataset formats
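The noisy mean initialization for resized token embeddings can be sketched as follows: each new embedding row starts from the mean of the existing rows plus small Gaussian noise. This is an illustrative sketch with made-up shapes, not the exact upstream code:

```python
import numpy as np

# Sketch of noisy mean initialization for new token embeddings.

rng = np.random.default_rng(0)
old_embed = rng.standard_normal((100, 8))   # (vocab_size, hidden_dim)
num_new = 3

mean = old_embed.mean(axis=0, keepdims=True)
# Small noise keeps new tokens distinguishable from each other
# while starting near the center of the existing embedding cloud.
noise = rng.standard_normal((num_new, old_embed.shape[1])) / np.sqrt(old_embed.shape[1])
new_rows = mean + noise

embed = np.vstack([old_embed, new_rows])
print(embed.shape)  # (103, 8)
```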

New models

  • Base models
    • Mixtral-8x7B-v0.1
  • Instruct/Chat models
    • Mixtral-8x7B-Instruct-v0.1
    • Mistral-7B-Instruct-v0.2
    • XVERSE-65B-Chat
    • Yi-6B-Chat

Bug fix

  • Improve logging for unknown arguments by @yhyu13 in #1868
  • Fix an overflow issue in LLaMA2 PPO training #1742
  • Fix #246 #1561 #1715 #1764 #1765 #1770 #1771 #1784 #1786 #1795 #1815 #1819 #1831

v0.3.3

5 months ago

New features

  • Support loading pre-trained models from ModelScope Hub by @tastelikefeet in #1700
  • Support launching a reward model server in the demo API by specifying --stage=rm in api_demo.py
  • Support using a reward model server in PPO training by specifying --reward_model_type api
  • Support adjusting the shard size of exported models via the export_size argument
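How a shard-size limit like `export_size` works can be sketched as greedy packing: tensors are appended to the current shard until adding the next one would exceed the limit. Names and sizes below are made up for illustration; the real exporter works in gigabytes over actual state-dict tensors:

```python
# Sketch of greedy checkpoint sharding under a per-shard size limit.

def shard(tensors, max_bytes):
    """Pack (name, nbytes) pairs into shards no larger than max_bytes."""
    shards, current, used = [], [], 0
    for name, nbytes in tensors:
        if current and used + nbytes > max_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(name)
        used += nbytes
    if current:
        shards.append(current)
    return shards

tensors = [("embed", 40), ("layer0", 30), ("layer1", 30), ("head", 20)]
print(shard(tensors, max_bytes=60))  # [['embed'], ['layer0', 'layer1'], ['head']]
```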

New models

  • Base models
    • DeepseekLLM-Base (7B/67B)
    • Qwen (1.8B/72B)
  • Instruct/Chat models
    • DeepseekLLM-Chat (7B/67B)
    • Qwen-Chat (1.8B/72B)
    • Yi-34B-Chat

New datasets

  • Supervised fine-tuning datasets
    • Nectar dataset by @mlinmg in #1689
  • Preference datasets
    • Nectar dataset by @mlinmg in #1689

Bug fix

  • Improve get_current_device by @billvsme in #1690
  • Improve web UI preview by @Samge0 in #1695
  • Fix #1543 #1597 #1657 #1658 #1659 #1668 #1682 #1696 #1699 #1703 #1707 #1710