LLaMA Efficient Tuning Versions

Unify Efficient Fine-Tuning of 100+ LLMs

v0.7.0

2 weeks ago

Congratulations on 20k stars 🎉 We were No. 1 on GitHub Trending on Apr. 23rd 🔥 Follow us on X

New features

  • Support SFT/PPO/DPO/ORPO for the LLaVA-1.5 model by @BUAADreamer in #3450
  • Support inferring the LLaVA-1.5 model with both native Transformers and vLLM by @hiyouga in #3454
  • Support vLLM+LoRA inference for selected models (see the support list)
  • Support 2x faster generation for QLoRA models based on UnslothAI's optimization
  • Support adding new special tokens to the tokenizer via the new_special_tokens argument
  • Support choosing the device to merge LoRA in LlamaBoard via the export_device argument
  • Add a Colab notebook for getting started with fine-tuning the Llama-3 model on a free T4 GPU
  • Automatically enable SDPA attention and fast tokenizer for higher performance
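The `new_special_tokens` flow can be sketched in plain Python. This is an illustrative stand-in with made-up token names; the real implementation goes through `tokenizer.add_special_tokens` and a matching `model.resize_token_embeddings` call:

```python
# Sketch: registering new special tokens and growing the vocab mapping.
# Hypothetical helper; the real flow also resizes the embedding matrix.

def add_special_tokens(vocab, new_tokens):
    """Assign the next free ids to tokens not already in the vocab."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added  # number of rows the embedding matrix must grow by

vocab = {"<s>": 0, "</s>": 1, "hello": 2}
n_added = add_special_tokens(vocab, ["<|tool|>", "<|observation|>"])
print(n_added, vocab["<|tool|>"])  # 2 3
```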

New models

  • Base models
    • OLMo-1.7-7B
    • Jamba-v0.1-51B
    • Qwen1.5-110B
    • DBRX-132B-Base
  • Instruct/Chat models
    • Phi-3-mini-3.8B-instruct (4k/128k)
    • LLaVA-1.5-7B
    • LLaVA-1.5-13B
    • Qwen1.5-110B-Chat
    • DBRX-132B-Instruct

New datasets

  • Supervised fine-tuning datasets
    • LLaVA mixed (en&zh) by @BUAADreamer in #3471
  • Preference datasets
    • DPO mixed (en&zh) by @hiyouga

Bug fix

  • Fix #2093 #3333 #3347 #3374 #3387

v0.6.3

3 weeks ago

New features

  • Support Meta Llama-3 (8B/70B) models
  • Support UnslothAI's long-context QLoRA optimization (56,000 context length for Llama-2 7B in 24GB)
  • Support previewing local datasets in directories in LlamaBoard by @codemayq in #3291

New models

  • Base models
    • CodeGemma (2B/7B)
    • CodeQwen1.5-7B
    • Llama-3 (8B/70B)
    • Mixtral-8x22B-v0.1
  • Instruct/Chat models
    • CodeGemma-7B-it
    • CodeQwen1.5-7B-Chat
    • Llama-3-Instruct (8B/70B)
    • Command R (35B) by @marko1616 in #3254
    • Command R+ (104B) by @marko1616 in #3254
    • Mixtral-8x22B-Instruct-v0.1

v0.6.2

1 month ago

New features

  • Support ORPO algorithm by @hiyouga in #3066
  • Support inferring BNB 4-bit models on multiple GPUs via the quantization_device_map argument
  • Reorganize README files, move example scripts to the examples folder
  • Support saving & loading arguments quickly in LlamaBoard by @hiyouga and @marko1616 in #3046
  • Support loading alpaca-format datasets from the Hub without dataset_info.json by specifying --dataset_dir ONLINE
  • Add the moe_aux_loss_coef argument to control the coefficient of the auxiliary loss in MoE models
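The auxiliary loss that this coefficient scales can be sketched as a Switch-Transformer-style load-balancing term. This is an illustrative sketch, not LLaMA-Factory's actual implementation, and the numbers are made up:

```python
# Sketch of a load-balancing auxiliary loss scaled by a coefficient
# like moe_aux_loss_coef (illustrative only).

def aux_loss(token_counts, router_probs, coef):
    """token_counts[i]: tokens routed to expert i; router_probs[i]:
    mean router probability assigned to expert i over the batch."""
    n = len(token_counts)
    total = sum(token_counts)
    frac = [c / total for c in token_counts]
    return coef * n * sum(f * p for f, p in zip(frac, router_probs))

# Perfectly balanced routing over 4 experts:
balanced = aux_loss([25, 25, 25, 25], [0.25] * 4, coef=0.001)
# Collapsed routing (one expert takes everything) is penalized harder:
collapsed = aux_loss([100, 0, 0, 0], [0.97, 0.01, 0.01, 0.01], coef=0.001)
print(balanced < collapsed)  # True
```

A larger coefficient pushes the router harder toward uniform expert utilization at the cost of task loss.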

New models

  • Base models
    • Breeze-7B-Base
    • Qwen1.5-MoE-A2.7B (14B)
    • Qwen1.5-32B
  • Instruct/Chat models
    • Breeze-7B-Instruct
    • Qwen1.5-MoE-A2.7B-Chat (14B)
    • Qwen1.5-32B-Chat

Bug fix

  • Fix pile dataset download config by @lealaxy in #3053
  • Fix model generation config by @marko1616 in #3057
  • Fix qwen1.5 models DPO training by @changingivan and @hiyouga in #3083
  • Support Qwen1.5-32B by @sliderSun in #3160
  • Support Breeze-7B by @codemayq in #3161
  • Fix additional_target in unsloth by @kno10 in #3201
  • Fix #2807 #3022 #3023 #3046 #3077 #3085 #3116 #3200 #3225

v0.6.1

1 month ago

This patch mainly fixes #2983

In commit 9bec3c98a22c91b1c28fda757db51eb780291641, we built the optimizer and scheduler inside the trainers, which inadvertently introduced a bug: when DeepSpeed was enabled, the trainers in transformers would build an optimizer and scheduler before calling the create_optimizer_and_scheduler method [1]; the optimizer created by our method would then overwrite the original one, while the scheduler would not. Consequently, the scheduler no longer affected the learning rate in the optimizer, leading to a regression in the training results. We have fixed this bug in 3bcd41b639899e72bcabc51d59bac8967af19899 and 8c77b1091296e204dc3c8c1f157c288ca5b236bd. Thanks to @HideLord for helping us identify this critical bug.

[1] https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/trainer.py#L1877-L1881
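The failure mode can be reduced to a few lines of plain Python. These are dummy classes standing in for the real transformers/DeepSpeed objects, shown only to illustrate the stale-reference pattern:

```python
# Minimal reproduction of the bug pattern: a scheduler holds a reference
# to the optimizer it was built with, so replacing the optimizer later
# silently disconnects learning-rate scheduling.

class Optimizer:
    def __init__(self, lr):
        self.lr = lr

class Scheduler:
    """Holds a reference to ONE optimizer and decays its lr."""
    def __init__(self, optimizer):
        self.optimizer = optimizer
    def step(self):
        self.optimizer.lr *= 0.5

original = Optimizer(lr=1.0)
scheduler = Scheduler(original)   # built by the upstream trainer

optimizer = Optimizer(lr=1.0)     # custom optimizer overwrites the original,
                                  # but the scheduler still points at the old one
scheduler.step()
print(original.lr, optimizer.lr)  # 0.5 1.0 -> the live optimizer's lr never decays
```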

We have also fixed #2961 #2981 #2982 #2983 #2991 #3010

v0.6.0

1 month ago

We released our paper on arXiv! Thanks to all co-authors and to AK for the recommendation

New features

  • Support GaLore algorithm, allowing full-parameter learning of a 7B model using less than 24GB VRAM
  • Support FSDP+QLoRA that allows QLoRA fine-tuning of a 70B model on 2x24GB GPUs
  • Support LoRA+ algorithm for better LoRA fine-tuning by @qibaoyuan in #2830
  • LLaMA Factory 🤝 vLLM, enjoy 270% inference speed with --infer_backend vllm
  • Add a Colab notebook for getting started easily
  • Support pushing fine-tuned models to Hugging Face Hub in web UI
  • Support apply_chat_template by adding a chat template to the tokenizer after fine-tuning
  • Add Docker support by @S3Studio in #2743 #2849
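GaLore's core idea, full-parameter learning with low-rank optimizer state, can be sketched in a few lines of NumPy. This is an illustrative sketch of the technique, not LLaMA-Factory's actual integration; shapes and rank are made up:

```python
import numpy as np

# Sketch of GaLore: project the gradient into a low-rank subspace,
# keep optimizer state there, then project the update back.

rng = np.random.default_rng(0)
m, n, rank = 64, 32, 4
grad = rng.standard_normal((m, n))

# Projection matrix from the top-r left singular vectors of the gradient.
U, _, _ = np.linalg.svd(grad, full_matrices=False)
P = U[:, :rank]                      # (m, r)

low_rank_grad = P.T @ grad           # (r, n): optimizer state lives here
update = P @ low_rank_grad           # (m, n): projected back onto the weights

print(low_rank_grad.shape, update.shape)
```

Because the Adam moments are kept at the (r, n) shape instead of (m, n), the optimizer memory shrinks roughly by a factor of m/r, which is what makes full-parameter 7B training fit in under 24GB.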

New models

  • Base models
    • OLMo (1B/7B)
    • StarCoder2 (3B/7B/15B)
    • Yi-9B
  • Instruct/Chat models
    • OLMo-7B-Instruct

New datasets

  • Supervised fine-tuning datasets
    • Cosmopedia (en)
  • Preference datasets
    • Orca DPO (en)

Bug fix

  • Fix flash_attn in web UI by @cx2333-gt in #2730
  • Fix deepspeed runtime error in PPO by @stephen-nju in #2746
  • Fix readme ddp instruction by @khazic in #2903
  • Fix environment variable in datasets by @SirlyDreamer in #2905
  • Fix readme information by @0xez in #2919
  • Fix generation config validation by @marko1616 in #2945
  • Fix requirements by @rkinas in #2963
  • Fix bitsandbytes windows version by @Tsumugii24 in #2967
  • Fix #2346 #2642 #2649 #2732 #2735 #2756 #2766 #2775 #2777 #2782 #2798 #2802 #2803 #2817 #2895 #2928 #2936 #2941

v0.5.3

2 months ago

New models

  • Base models
    • Gemma (2B/7B)
  • Instruct/Chat models
    • Gemma-it (2B/7B)

Bug fix

  • Add flash-attn package for Windows user by @codemayq in #2514
  • Fix ppo trainer #1163 by @stephen-nju in #2525
  • Support atom models by @Rayrtfr in #2531
  • Support role in webui by @lungothrin in #2575
  • Bump accelerate to 0.27.2 and fix #2552 by @Katehuuh in #2608
  • Fix #2512 #2516 #2532 #2533 #2629

v0.5.2

2 months ago

New features

  • Support block expansion in LLaMA Pro, see tests/llama_pro.py for usage
  • Add use_rslora option for the LoRA method
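The change behind `use_rslora` is a one-line scaling tweak: vanilla LoRA scales the adapter output by alpha / r, while rank-stabilized LoRA (rsLoRA) uses alpha / sqrt(r), which keeps the update magnitude stable as the rank grows. A minimal sketch (hypothetical helper, illustrative values):

```python
import math

# Sketch of the LoRA vs. rsLoRA scaling factor.

def lora_scaling(alpha, r, use_rslora=False):
    return alpha / math.sqrt(r) if use_rslora else alpha / r

alpha = 16
for r in (8, 64):
    print(r, lora_scaling(alpha, r), lora_scaling(alpha, r, use_rslora=True))
```

At r=64 the vanilla scaling collapses to 0.25 while the rsLoRA scaling stays at 2.0, so higher-rank adapters keep contributing meaningfully to the forward pass.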

New models

  • Base models
    • Qwen1.5 (0.5B/1.8B/4B/7B/14B/72B)
    • DeepSeekMath-7B-Base
    • DeepSeekCoder-7B-Base-v1.5
    • Orion-14B-Base
  • Instruct/Chat models
    • Qwen1.5-Chat (0.5B/1.8B/4B/7B/14B/72B)
    • MiniCPM-2B-SFT/DPO
    • DeepSeekMath-7B-Instruct
    • DeepSeekCoder-7B-Instruct-v1.5
    • Orion-14B-Chat
    • Orion-14B-Long-Chat
    • Orion-14B-RAG-Chat
    • Orion-14B-Plugin-Chat

New datasets

  • Supervised fine-tuning datasets
    • SlimOrca (en)
    • Dolly (de)
    • Dolphin (de)
    • Airoboros (de)
  • Preference datasets
    • Orca DPO (de)

Bug fix

  • Fix torch_dtype check in export model by @fenglui in #2262
  • Add Russian locale to LLaMA Board by @seoeaa in #2264
  • Remove manually set use_cache in export model by @yhyu13 in #2266
  • Fix DeepSpeed Zero3 training with MoE models by @A-Cepheus in #2283
  • Add a patch for full training of the Mixtral model using DeepSpeed Zero3 by @ftgreat in #2319
  • Fix bug in data pre-processing by @lxsyz in #2411
  • Add German sft and dpo datasets by @johannhartmann in #2423
  • Add version checking in test_toolcall.py by @mini-tiger in #2435
  • Enable parsing of SlimOrca dataset by @mnmueller in #2462
  • Add tags for models when pushing to hf hub by @younesbelkada in #2474
  • Fix #2189 #2268 #2282 #2320 #2338 #2376 #2388 #2394 #2397 #2404 #2412 #2420 #2421 #2436 #2438 #2471 #2481

v0.5.0

3 months ago

Congratulations on 10k stars 🎉 Make LLM fine-tuning easier and faster together with LLaMA-Factory ✨

New features

  • Support agent tuning for most models; fine-tune any LLM with --dataset glaive_toolcall for tool use #2226
  • Support function calling in both API and Web mode with fine-tuned models, matching OpenAI's format
  • LLaMA Factory 🤝 Unsloth, enjoy 170% LoRA training speed with --use_unsloth, see benchmarking here
  • Support fine-tuning models on the MPS device #2090
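The OpenAI-compatible function-call payload that a fine-tuned model returns in API mode looks roughly like the sketch below. The field names follow OpenAI's chat format; the weather function itself is a made-up example:

```python
import json

# Sketch of an OpenAI-style function-call response message.
# get_current_weather is a hypothetical function for illustration.

response_message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_current_weather",
        "arguments": json.dumps({"location": "Berlin", "unit": "celsius"}),
    },
}

# The caller parses the JSON-encoded arguments and dispatches the tool.
args = json.loads(response_message["function_call"]["arguments"])
print(response_message["function_call"]["name"], args["location"])
```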

New models

  • Base models
    • Phi-2 (2.7B)
    • InternLM2 (7B/20B)
    • SOLAR-10.7B
    • DeepseekMoE-16B-Base
    • XVERSE-65B-2
  • Instruct/Chat models
    • InternLM2-Chat (7B/20B)
    • SOLAR-10.7B-Instruct
    • DeepseekMoE-16B-Chat
    • Yuan (2B/51B/102B)

New datasets

  • Supervised fine-tuning datasets
    • deepctrl dataset
    • Glaive function calling dataset v2

Core updates

  • Refactor data engine: clearer dataset alignment, easier templating and tool formatting
  • Refactor saving logic for models with value head #1789
  • Use the ruff code formatter for a consistent code style

Bug fix

  • Bump transformers version to 4.36.2 by @ShaneTian in #1932
  • Fix requirements by @dasdristanta13 in #2117
  • Add Machine-Mindset project by @JessyTsu1 in #2163
  • Fix typo in readme file by @junuMoon in #2194
  • Support resize token embeddings with ZeRO3 by @liu-zichen in #2201
  • Fix #1073 #1462 #1617 #1735 #1742 #1789 #1821 #1875 #1895 #1900 #1908 #1907 #1909 #1923 #2014 #2067 #2081 #2090 #2098 #2125 #2127 #2147 #2161 #2164 #2183 #2195 #2249 #2260

v0.4.0

4 months ago

🚨🚨 Core refactor

  • Deprecate checkpoint_dir and use adapter_name_or_path instead
  • Replace resume_lora_training with create_new_adapter
  • Move the patches in model loading to llmtuner.model.patcher
  • Bump to Transformers 4.36.1 to adapt to the Mixtral models
  • Wide adaptation for FlashAttention2 (LLaMA, Falcon, Mistral)
  • Temporarily disable LongLoRA due to breaking changes; it will be supported again later

The above changes were made by @hiyouga in #1864

New features

  • Add DPO-ftx: mixing fine-tuning gradients into DPO via the dpo_ftx argument, suggested by @lylcst in https://github.com/hiyouga/LLaMA-Factory/issues/1347#issuecomment-1846943606
  • Integrate AutoGPTQ into the model export via the export_quantization_bit and export_quantization_dataset arguments
  • Support loading datasets from ModelScope Hub by @tastelikefeet and @wangxingjun778 in #1802
  • Support resizing token embeddings with the noisy mean initialization by @hiyouga in a66186b8724ffd0351a32593ab52d8a2312f339b
  • Support the system column in both the alpaca and sharegpt dataset formats
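The noisy mean initialization for resized token embeddings can be sketched as follows: each new embedding row starts from the mean of the existing rows plus small Gaussian noise. This is an illustrative sketch with made-up shapes, not the exact upstream code:

```python
import numpy as np

# Sketch of noisy mean initialization for new token embeddings.

rng = np.random.default_rng(0)
old_embed = rng.standard_normal((100, 8))   # (vocab_size, hidden_dim)
num_new = 3

mean = old_embed.mean(axis=0, keepdims=True)
# Small noise keeps new tokens distinguishable from each other
# while starting near the center of the existing embedding cloud.
noise = rng.standard_normal((num_new, old_embed.shape[1])) / np.sqrt(old_embed.shape[1])
new_rows = mean + noise

embed = np.vstack([old_embed, new_rows])
print(embed.shape)  # (103, 8)
```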

New models

  • Base models
    • Mixtral-8x7B-v0.1
  • Instruct/Chat models
    • Mixtral-8x7B-Instruct-v0.1
    • Mistral-7B-Instruct-v0.2
    • XVERSE-65B-Chat
    • Yi-6B-Chat

Bug fix

  • Improve logging for unknown arguments by @yhyu13 in #1868
  • Fix an overflow issue in LLaMA2 PPO training #1742
  • Fix #246 #1561 #1715 #1764 #1765 #1770 #1771 #1784 #1786 #1795 #1815 #1819 #1831

v0.3.3

5 months ago

New features

  • Support loading pre-trained models from ModelScope Hub by @tastelikefeet in #1700
  • Support launching a reward model server in the demo API by specifying --stage=rm in api_demo.py
  • Support using a reward model server in PPO training by specifying --reward_model_type api
  • Support adjusting the shard size of exported models via the export_size argument
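How a shard-size limit like `export_size` works can be sketched as greedy packing: tensors are appended to the current shard until adding the next one would exceed the limit. Names and sizes below are made up for illustration; the real exporter works in gigabytes over actual state-dict tensors:

```python
# Sketch of greedy checkpoint sharding under a per-shard size limit.

def shard(tensors, max_bytes):
    """Pack (name, nbytes) pairs into shards no larger than max_bytes."""
    shards, current, used = [], [], 0
    for name, nbytes in tensors:
        if current and used + nbytes > max_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(name)
        used += nbytes
    if current:
        shards.append(current)
    return shards

tensors = [("embed", 40), ("layer0", 30), ("layer1", 30), ("head", 20)]
print(shard(tensors, max_bytes=60))  # [['embed'], ['layer0', 'layer1'], ['head']]
```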

New models

  • Base models
    • DeepseekLLM-Base (7B/67B)
    • Qwen (1.8B/72B)
  • Instruct/Chat models
    • DeepseekLLM-Chat (7B/67B)
    • Qwen-Chat (1.8B/72B)
    • Yi-34B-Chat

New datasets

  • Supervised fine-tuning datasets
    • Nectar dataset by @mlinmg in #1689
  • Preference datasets
    • Nectar dataset by @mlinmg in #1689

Bug fix

  • Improve get_current_device by @billvsme in #1690
  • Improve web UI preview by @Samge0 in #1695
  • Fix #1543 #1597 #1657 #1658 #1659 #1668 #1682 #1696 #1699 #1703 #1707 #1710