Use PEFT or full-parameter training to fine-tune LLMs or MLLMs
SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by PEFT, we also provide a complete Adapters library to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts.
To facilitate use by users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners.
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
`ruozhiba`: use this dataset as described in this documentation to begin training.
`swift export`: quantize models using AWQ/GPTQ and push them to ModelScope Hub. See documentation: LLM Quantization.
`--train_dataset_mix_ratio 2.0`: use this to enable training. We also open-sourced the general knowledge dataset ms-bench.
`--merge_lora`: parameter supported in AnimateDiff training.
`--deepspeed default-zero3`: enables DeepSpeed ZeRO3 training.
`swift web-ui`: run this after installing ms-swift to start the web-ui.
`freeze_parameters`: parameter offering a compromise between LoRA and full-parameter training. The corresponding sh script can be found in full_freeze_ddp.
`disable_tqdm`, `lazy_tokenize`, `preprocess_num_proc`: supported parameters; see command line arguments for details.
`use_flash_attn`: supported parameter.
`Swift.prepare_model(model, NEFTuneConfig())`: call this to enable NEFTune.
See the Usage with Swift CLI section below for details.
SWIFT runs in the Python environment. Please ensure your Python version is higher than 3.8.
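One way to confirm the interpreter meets this requirement before installing is a small check like the one below. This is only a sketch: `python_is_supported` is an illustrative helper, not part of SWIFT, and it assumes that 3.8 itself is accepted as the minimum.

```python
import sys

MIN_PYTHON = (3, 8)  # assumption: 3.8 itself counts as the minimum version


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("Python OK" if python_is_supported() else "Python too old for SWIFT")
```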
# Full capabilities
pip install 'ms-swift[all]' -U
# LLM only
pip install 'ms-swift[llm]' -U
# AIGC only
pip install 'ms-swift[aigc]' -U
# Adapters only
pip install ms-swift -U
Or install from source:
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
SWIFT depends on torch>=1.13; torch>=2.0.0 is recommended.
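The torch requirement can be checked without importing torch itself. The sketch below reads the installed version from package metadata and classifies it against the two thresholds stated above; `parse_major_minor` and `torch_status` are hypothetical helper names, not SWIFT APIs.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Extract (major, minor) from strings such as '2.1.2+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_status(version=None):
    """Classify a torch version as 'missing', 'too old', 'supported' or 'recommended'."""
    if version is None:
        try:
            version = metadata.version("torch")
        except metadata.PackageNotFoundError:
            return "missing"
    major_minor = parse_major_minor(version)
    if major_minor < (1, 13):
        return "too old"      # below the stated minimum
    if major_minor < (2, 0):
        return "supported"    # meets torch>=1.13
    return "recommended"      # meets torch>=2.0.0


print(torch_status())
```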
# China-Hangzhou image
docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1
# US-west image
docker pull registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1
This section introduces basic usage; see the Documentation section for more options.
Launch the web-ui:
swift web-ui
You can refer to the following scripts to customize your own training script.
Training Process | Training Method |
---|---|
Pretraining | Text Generation |
Fine-tuning | Single-turn/Multi-turn, Agent Training/Self-cognition, Multi-modal Vision/Multi-modal Speech |
Human Alignment | DPO |
Text-to-Image | DreamBooth, etc. |
Text-to-Video | - |
Start single GPU fine-tuning with the following command:
LoRA:
# Experimental Environment: A100
# GPU Memory Requirement: 20GB
# Runtime: 3.1 hours
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output \
--eval_steps 200
Full-parameter:
# Experimental Environment: A100
# GPU Memory Requirement: 80GB
# Runtime: 2.5 hours
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type full \
--output_dir output \
--eval_steps 500
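The two single-GPU commands above differ only in `--sft_type` and `--eval_steps`, so they can be assembled programmatically when launching runs from a script. The following is a sketch under that assumption: `build_sft_command` and `build_env` are hypothetical helpers, and only the flags shown in the commands above are used.

```python
import os
import shlex


def build_sft_command(sft_type="lora", eval_steps=200,
                      model_type="qwen1half-7b-chat",
                      dataset="blossom-math-zh",
                      num_train_epochs=5, output_dir="output"):
    """Assemble the `swift sft` argv list from the flags used in the commands above."""
    return [
        "swift", "sft",
        "--model_type", model_type,
        "--dataset", dataset,
        "--num_train_epochs", str(num_train_epochs),
        "--sft_type", sft_type,
        "--output_dir", output_dir,
        "--eval_steps", str(eval_steps),
    ]


def build_env(gpu_ids=(0,)):
    """Copy the current environment and pin the visible GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


cmd = build_sft_command(sft_type="full", eval_steps=500)
print(shlex.join(cmd))
# To launch it (requires ms-swift and a GPU):
#   import subprocess; subprocess.run(cmd, env=build_env(), check=True)
```

Passing the command as a list avoids shell quoting issues and the trailing line continuations needed in the shell form.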
Model parallel training:
# Experimental Environment: 2 * A100
# GPU Memory Requirement: 10GB + 13GB
# Runtime: 3.4 hours
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output
Data parallel training:
# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 30GB
# Runtime: 0.8 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output
Combining Model Parallelism and Data Parallelism:
# Experimental Environment: 4 * A100
# GPU Memory Requirement: 2*14GB + 2*18GB
# Runtime: 1.7 hours
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output
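The multi-GPU commands above differ only in how many processes `NPROC_PER_NODE` starts over the GPUs listed in `CUDA_VISIBLE_DEVICES`. The sketch below works through that arithmetic. It assumes that an unset `NPROC_PER_NODE` means a single process and that the visible GPUs are divided evenly among processes; this is consistent with the memory figures quoted above but is not taken from SWIFT's source. `parallel_layout` is a hypothetical helper.

```python
def parallel_layout(visible_devices, nproc_per_node=1):
    """Return how many processes run and how many GPUs each one spans."""
    devices = [d.strip() for d in visible_devices.split(",") if d.strip()]
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    if not devices or len(devices) % nproc_per_node != 0:
        raise ValueError("GPUs must divide evenly among processes")
    return {
        "processes": nproc_per_node,                       # data-parallel replicas
        "gpus_per_process": len(devices) // nproc_per_node,  # model-parallel width
    }


# Model parallel: one process spread over two GPUs.
print(parallel_layout("0,1"))
# Data parallel: four processes, one GPU each.
print(parallel_layout("0,1,2,3", nproc_per_node=4))
# Combined: two processes, each spread over two GPUs.
print(parallel_layout("0,1,2,3", nproc_per_node=2))
```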
ZeRO2:
# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 21GB
# Runtime: 0.9 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output \
--deepspeed default-zero2
ZeRO3:
# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 19GB
# Runtime: 3.2 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output \
--deepspeed default-zero3
Inference with the original model:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
# use VLLM
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat \
--infer_backend vllm --max_model_len 8192
LoRA fine-tuned:
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true
# use VLLM
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir xxx/checkpoint-xxx --load_dataset_config true \
--merge_lora true --infer_backend vllm --max_model_len 8192
Evaluation:
CUDA_VISIBLE_DEVICES=0 swift eval --model_type qwen1half-7b-chat --eval_dataset mmlu ceval
Export (quantization) of the original model:
CUDA_VISIBLE_DEVICES=0 swift export --model_type qwen1half-7b-chat \
--quant_bits 4 --quant_method awq
LoRA fine-tuned:
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir xxx/checkpoint-xxx --load_dataset_config true \
--quant_method awq --quant_bits 4 \
--merge_lora true
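To get a feel for what `--quant_bits 4` buys, the weight storage of a model can be estimated from its parameter count. The figures below are a rough back-of-the-envelope sketch: the 7e9 parameter count is a round assumption for a "7B" model, and the estimate ignores quantization metadata (scales, zero points), activations and the KV cache.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 7e9  # assumption: round figure for a "7B" model
fp16_gb = weight_memory_gb(params, 16)
int4_gb = weight_memory_gb(params, 4)
print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

Under these assumptions, 4-bit weights take about a quarter of the space of 16-bit weights.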
Deployment of the original model:
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen1half-7b-chat
# Use vLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen1half-7b-chat \
--infer_backend vllm --max_model_len 8192
LoRA fine-tuned:
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir xxx/checkpoint-xxx
# Use vLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift deploy \
--ckpt_dir xxx/checkpoint-xxx --merge_lora true \
--infer_backend vllm --max_model_len 8192
Model Series | Model Introduction | Language | Model Size | Model Type |
---|---|---|---|---|
Qwen, Qwen1.5 | Tongyi Qwen 1.0 and 1.5 series models | Chinese, English | 0.5B-72B, including quantized versions | base model, chat model, MoE model |
ChatGLM2, ChatGLM3, Codegeex2 | Zhipu ChatGLM series models | Chinese, English | 6B | base model, chat model, code model |
Baichuan/Baichuan2 | Baichuan 1 and Baichuan 2 | Chinese, English | 7B-13B, including quantized versions | base model, chat model |
Yuan2 | Langchao Yuan series models | Chinese, English | 2B-102B | instruct model |
XVerse | XVerse series models | Chinese, English | 7B-65B | base model, chat model, long text model, MoE model |
LLaMA2 | LLaMA2 series models | English | 7B-70B, including quantized versions | base model, chat model |
Mistral, Mixtral | Mistral series models | English | 7B | base model, instruct model, MoE model |
YI | 01AI's YI series models | Chinese, English | 6B-34B | base model, chat model, long text model |
InternLM, InternLM2, InternLM2-Math | Pujiang AI Lab InternLM series models | Chinese, English | 1.8B-20B | base model, chat model, math model |
DeepSeek, DeepSeek-MoE, DeepSeek-Coder, DeepSeek-Math | DeepSeek series models | Chinese, English | 1.3B-67B | base model, chat model, MoE model, code model, math model |
MAMBA | MAMBA state space model | English | 130M-2.8B | base model |
Gemma | Google Gemma series models | English | 2B-7B | base model, instruct model |
MiniCPM | OpenBmB MiniCPM series models | Chinese, English | 2B-3B | chat model |
OpenBuddy | OpenBuddy series models | Chinese, English | 7B-67B | base model, chat model |
Orion | OrionStar AI series models | Chinese, English | 14B | base model, chat model |
BlueLM | VIVO BlueLM large model | Chinese, English | 7B | base model, chat model |
Ziya2 | Fengshenbang series models | Chinese, English | 13B | base model, chat model |
Skywork | Skywork series models | Chinese, English | 13B | base model, chat model |
Zephyr | Zephyr series models based on Mistral | English | 7B | chat model |
PolyLM | Tongyi Lab self-developed PolyLM series models | Multilingual | 13B | base model |
SeqGPT | Tongyi Lab self-developed text understanding model for information extraction and text classification | Chinese | 560M | semantic understanding model |
SUS | Southern University of Science and Technology model fine-tuned on YI | Chinese, English | 34B | chat model |
Tongyi-Finance | Tongyi finance series models | Chinese, English | 14B | base model, chat model, financial model |
CodeFuse-CodeLLaMA, CodeFuse-Codegeex2, CodeFuse-Qwen | Ant CodeFuse series models | Chinese, English | 6B-34B | chat model, code model |
phi2 | Microsoft's PHI2 model | English | 3B | base model, code model |
Grok | X-ai | English | 300B | base model |
TeleChat | Tele-AI | Chinese, English | 7B-12B | chat model |
dbrx | databricks | English | 132B | base model, chat model |
mengzi3 | Langboat | Chinese, English | 13B | base model |
c4ai-command-r | c4ai | Multilingual | 35B-104B | chat model |
Model Series | Model Introduction | Language | Model Size | Model Type |
---|---|---|---|---|
Qwen-VL | Tongyi Qwen vision model | Chinese, English | 7B, including quantized versions | base model, chat model |
Qwen-Audio | Tongyi Qwen speech model | Chinese, English | 7B | base model, chat model |
YI-VL | 01AI's YI series vision models | Chinese, English | 6B-34B | chat model |
XComposer2 | Pujiang AI Lab InternLM vision model | Chinese, English | 7B | chat model |
DeepSeek-VL | DeepSeek series vision models | Chinese, English | 1.3B-7B | chat model |
MiniCPM-V | OpenBmB MiniCPM vision model | Chinese, English | 3B | chat model |
CogVLM, CogAgent | Zhipu ChatGLM visual QA and Agent model | English | 17B-18B | chat model |
Llava | Llava series models | English | 7B | chat model |
Model Series | Model Introduction | Language | Model Type |
---|---|---|---|
AnimateDiff | AnimateDiff animation model | English | text-to-video |
SD1.5/SD2.0/SDXL | StabilityAI series diffusion models | English | text-to-image |
Dataset Type | Training Task | Datasets |
---|---|---|
General | Fine-tuning | 🔥ruozhiba, 🔥ms-bench, 🔥ms-bench-mini, 🔥alpaca-en(gpt4), 🔥alpaca-zh(gpt4), multi-alpaca-all, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, instruct-en, gpt4all-en, sharegpt-en, sharegpt-zh, tulu-v2-sft-mixture, wikipedia-zh, open-orca, open-orca-gpt4, sharegpt-gpt4, 🔥sharegpt-gpt4-mini. |
Agent | Fine-tuning | 🔥ms-agent, ms-agent-for-agentfabric-default, ms-agent-for-agentfabric-addition, damo-mini-agent-zh, damo-agent-zh, agent-instruct-all-en. |
General | Human Alignment | 🔥hh-rlhf-cn, stack-exchange-paired, hh-rlhf-harmless-base, hh-rlhf-helpful-base, hh-rlhf-helpful-online, hh-rlhf-helpful-rejection-sampled, hh-rlhf-red-team-attempts, hh-rlhf-cn-harmless-base-cn, hh-rlhf-cn-helpful-base-cn, hh-rlhf-cn-harmless-base-en, hh-rlhf-cn-helpful-base-en. |
Code | Fine-tuning | code-alpaca-en, 🔥leetcode-python-en, 🔥codefuse-python-en, 🔥codefuse-evol-instruction-zh. |
Medical | Fine-tuning | medical-en, medical-zh, medical-mini-zh, 🔥disc-med-sft-zh. |
Legal | Fine-tuning | lawyer-llama-zh, tigerbot-law-zh, 🔥disc-law-sft-zh. |
Math | Fine-tuning | 🔥blossom-math-zh, school-math-zh, open-platypus-en. |
SQL | Fine-tuning | text2sql-en, 🔥sql-create-context-en. |
Text Generation | Fine-tuning | 🔥advertise-gen-zh, 🔥dureader-robust-zh. |
Classification | Fine-tuning | cmnli-zh, 🔥cmnli-mini-zh, 🔥jd-sentiment-zh, 🔥hc3-zh, 🔥hc3-en. |
Quantization Assist | Quantization | pileval. |
Other | Fine-tuning | finance-en, poetry-zh, webnovel-zh, generated-chat-zh, cls-fudan-news-zh, ner-jave-zh. |
Vision | Fine-tuning | coco-en, 🔥coco-mini-en, coco-mini-en-2, capcha-images. |
Audio | Fine-tuning | aishell1-zh, 🔥aishell1-mini-zh. |
Technology Name |
---|
🔥LoRA: LoRA: Low-Rank Adaptation of Large Language Models |
🔥LoRA+: LoRA+: Efficient Low Rank Adaptation of Large Models |
🔥LLaMA PRO: LLaMA Pro: Progressive LLaMA with Block Expansion |
🔥SCEdit: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing (arXiv, Project Page) |
🔥NEFTune: Noisy Embeddings Improve Instruction Finetuning |
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models |
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models |
ROME: Rank-One Editing of Encoder-Decoder Models |
Adapter: Parameter-Efficient Transfer Learning for NLP |
Prompt Tuning: Visual Prompt Tuning |
Side: Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks |
Res-Tuning: Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone (arXiv, Project Page, Usage) |
Tuners provided by PEFT, such as IA3, AdaLoRA, etc. |
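To make the LoRA entry above concrete, the sketch below shows the low-rank update in plain Python: a frozen weight matrix W of shape d_out x d_in is adapted by two small matrices B (d_out x r) and A (r x d_in), and merging folds the scaled product back into W, which is the general idea behind a merge step such as `--merge_lora`. This is an illustration of the technique from the LoRA paper, not SWIFT's implementation; both function names are hypothetical.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_merge(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A using plain nested lists."""
    scale = alpha / rank
    d_out, d_in = len(weight), len(weight[0])
    merged = [row[:] for row in weight]  # leave the original weights untouched
    for i in range(d_out):
        for j in range(d_in):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


# Parameter savings for one 4096 x 4096 projection at rank 8.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=8)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4%}")

# Tiny merge example with rank 1.
merged = lora_merge([[1.0, 0.0], [0.0, 1.0]], [[1.0], [2.0]], [[3.0, 4.0]],
                    alpha=2, rank=1)
print(merged)
```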
Hardware Environment | Notes |
---|---|
CPU | |
RTX 20/30/40 series, etc. | BF16 and FlashAttn are supported from the 30 series onward |
Computing cards T4/V100, etc. | BF16 and FlashAttn not supported |
Computing cards A10/A100, etc. | Support BF16 and FlashAttn |
Huawei Ascend NPU | |
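The BF16/FlashAttn notes in the table line up with NVIDIA's CUDA compute capability: the cards marked as supported are 8.0 or newer. The sketch below encodes that observation for a few example cards. The threshold is an assumption that mirrors the table rather than a rule taken from SWIFT, and the specific RTX models are illustrative picks from the 20/30/40 series.

```python
# Compute capability (major, minor) for a few example cards,
# taken from NVIDIA's published specifications.
COMPUTE_CAPABILITY = {
    "V100": (7, 0),
    "T4": (7, 5),
    "RTX 2080": (7, 5),
    "A100": (8, 0),
    "A10": (8, 6),
    "RTX 3090": (8, 6),
    "RTX 4090": (8, 9),
}

# Assumption mirroring the table: BF16 and FlashAttn need capability 8.0+.
MIN_CAPABILITY = (8, 0)


def supports_bf16_and_flash_attn(card):
    """Return True if the named card meets the assumed capability threshold."""
    return COMPUTE_CAPABILITY[card] >= MIN_CAPABILITY


for card in sorted(COMPUTE_CAPABILITY):
    status = "yes" if supports_bf16_and_flash_attn(card) else "no"
    print(f"{card}: {status}")
```

On a machine with PyTorch and a GPU, `torch.cuda.get_device_capability()` returns the same kind of (major, minor) tuple for the installed card.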
make docs
# Check docs/build/html/index.html in web-browser
Document Name |
---|
Using Web-UI |
Using Tuners |
LLM Fine-tuning |
LLM Inference |
LLM Quantization |
LLM Deployment |
DPO Human Alignment Training |
AnimateDiff Training |
Document Name |
---|
Command Line Arguments |
Customizing New Models and Datasets |
Supported Models and Datasets List |
Runtime Speed and Memory Benchmark |
Best Practices Name |
---|
Agent Fine-Tuning Best Practice |
Self-Cognition Fine-Tuning Best Practice |
Qwen1.5 Best Practice |
Multi-Modal Model Training Best Practice |
This framework is licensed under the Apache License (Version 2.0). For models and datasets, please refer to the original resource page and follow the corresponding License.
@Misc{swift,
title = {SWIFT: Scalable lightWeight Infrastructure for Fine-Tuning},
author = {The ModelScope Team},
howpublished = {\url{https://github.com/modelscope/swift}},
year = {2024}
}
You can contact us and communicate with us by joining our WeChat group.