Mergoo

mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With mergoo, you can efficiently integrate the knowledge of different generic or domain-based LLM experts.

🚀 Features

Supports several merging methods: Mixture-of-Experts, Mixture-of-Adapters, and layer-wise merging
Flexible merging for each layer
Base models supported: Llama, Mistral, and BERT
Trainers supported: 🤗 Trainer, SFTTrainer, PEFT
Devices supported: CPU, MPS, GPU
Training choices: train only the routers of the MoE layers, or fully fine-tune the merged LLM

If you like the project, consider leaving a ⭐️

Installation

Install by pip:

pip install mergoo

Install the latest unstable version from GitHub:

pip install git+https://github.com/Leeroo-AI/mergoo

Install from source:

git clone https://github.com/Leeroo-AI/mergoo
cd mergoo
pip install -e .

Quick Start

Configuration Setup

Specify the config for merging:

model_type: type of the base model. Choices: mistral, llama, or bert.
num_experts_per_tok: number of experts used for each token in the MoE layers.
experts: configuration of the experts to merge. Each entry includes an expert_name and a Hugging Face 🤗 model_id.
router_layers: layers to which Mixture-of-Experts is applied.
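To make the role of num_experts_per_tok concrete, here is a minimal sketch of standard top-k Mixture-of-Experts routing for a single token, in plain Python. It illustrates the general technique only; mergoo's own router may differ in its details. The function name route_token and the toy numbers are assumptions made for this example.

```python
import math


def route_token(router_logits, expert_outputs, num_experts_per_tok):
    """Mix the outputs of the top-k experts for one token.

    router_logits: one score per expert.
    expert_outputs: one output vector (list of floats) per expert.
    """
    # Softmax over the router scores (shifted by the max for stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only the k most probable experts for this token.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = order[:num_experts_per_tok]

    # Renormalize the kept probabilities so the mixing weights sum to 1.
    norm = sum(probs[i] for i in top)
    weights = {i: probs[i] / norm for i in top}

    # Weighted sum of the selected experts' outputs.
    hidden = len(expert_outputs[0])
    mixed = [sum(weights[i] * expert_outputs[i][d] for i in top)
             for d in range(hidden)]
    return top, mixed


# Toy example: three experts, two of them used per token.
top, mixed = route_token(
    router_logits=[0.0, 1.0, 1.0],
    expert_outputs=[[10.0, 10.0], [1.0, 2.0], [3.0, 4.0]],
    num_experts_per_tok=2,
)
print(top, mixed)
```

With num_experts_per_tok set to 2, only two experts contribute to each token, so the per-token compute stays roughly constant as more experts are merged.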

Fully Fine-tuned Experts

This is a sample config for merging fully fine-tuned LLM experts:

config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
        {"expert_name": "expert_2", "model_id": "ajibawa-2023/Code-Mistral-7B"}
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"]
}

The example above merges Mistral-based math and code experts. Please refer to this notebook for further details!
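A config like the one above can be sanity-checked before any weights are downloaded. The helper below is a minimal sketch: check_merge_config and ALLOWED_MODEL_TYPES are hypothetical names, not part of mergoo. The checks come from the field descriptions in this README, plus the assumption that num_experts_per_tok must lie between 1 and the number of experts.

```python
# Hypothetical helper, not part of the mergoo API.
ALLOWED_MODEL_TYPES = {"mistral", "llama", "bert"}  # choices listed in this README


def check_merge_config(config):
    """Return a list of problems found in a merge config (empty if none)."""
    problems = []
    for key in ("model_type", "num_experts_per_tok", "experts"):
        if key not in config:
            problems.append(f"missing key: {key}")
    if problems:
        return problems

    if config["model_type"] not in ALLOWED_MODEL_TYPES:
        problems.append(f"unsupported model_type: {config['model_type']}")

    experts = config["experts"]
    for i, expert in enumerate(experts):
        for field in ("expert_name", "model_id"):
            if field not in expert:
                problems.append(f"experts[{i}] is missing {field}")

    # Assumption: a token cannot be routed to more experts than exist.
    k = config["num_experts_per_tok"]
    if not 1 <= k <= len(experts):
        problems.append(
            f"num_experts_per_tok={k} is outside 1..{len(experts)}"
        )
    return problems


config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"],
}
print(check_merge_config(config))
```

The helper returns a list rather than raising, so every problem in a config can be reported at once.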

Mixture of Adapters (MoE on LoRA)

This is a sample config for merging LoRA fine-tuned LLM experts. mergoo builds a routing layer on top of the LoRAs, resulting in a mixture of adapters.

config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "base_model": "mistralai/Mistral-7B-v0.1",
    "experts": [
        {"expert_name": "adapter_1", "model_id": "predibase/customer_support"},
        {"expert_name": "adapter_2", "model_id": "predibase/customer_support_accounts"},
        {"expert_name": "adapter_3", "model_id": "predibase/customer_support_orders"},
        {"expert_name": "adapter_4", "model_id": "predibase/customer_support_payments"}
    ],
}

Note that here expert_name starts with adapter instead of expert. Please refer to this notebook for further details!

Merge Experts

Once the config is set up, mergoo creates the merged LLM as follows:

import torch
from mergoo.compose_experts import ComposeExperts

# create checkpoint
model_id = "data/mistral_lora_moe"
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint(model_id)

Load / Finetune Merged Expert

Now, you can easily train the merged LLM with Hugging Face Trainer:

from transformers import Trainer
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe")
# NOTE: the 'gate' / router layers are untrained, so a weight-loading warning will appear for them

trainer = Trainer( ... )
trainer.train()
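For the router-only training choice listed under Features, everything except the router weights would be frozen before the trainer is built. The sketch below assumes that router parameters have a dotted name containing a component that is exactly gate. That naming is an assumption based on the note above, so inspect model.named_parameters() for the real names. is_router_param is a hypothetical helper, not part of mergoo, and the parameter names are made up for illustration.

```python
def is_router_param(name: str) -> bool:
    """True if a dotted parameter name has a component that is exactly 'gate'."""
    # Splitting on "." keeps 'gate_proj' from being mistaken for a router.
    return "gate" in name.split(".")


# Hypothetical parameter names, for illustration only.
example_names = [
    "model.layers.0.mlp.gate_proj.gate.weight",
    "model.layers.0.mlp.gate_proj.experts.0.weight",
    "model.layers.0.self_attn.q_proj.weight",
]
trainable = [n for n in example_names if is_router_param(n)]
print(trainable)

# With a loaded PyTorch model, the same predicate could be applied as:
# for name, param in model.named_parameters():
#     param.requires_grad_(is_router_param(name))
```

Matching on whole name components rather than substrings matters here, because a plain substring test for gate would also match the gate_proj feed-forward weights.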

📚 Learn More:

After finishing the Quick Start guide, you can explore the tutorials below to further familiarize yourself with mergoo.

Notebook	Details
MoE with fully fine-tuned LLM experts	Build a unified Mixture-of-Experts model with fully fine-tuned experts. Inspired by BTX Research (Meta AI).
MoE with LoRA fine-tuned experts	Build a Mixture-of-Adapters expert. Inspired by xlora, Mixture-of-LoRAs, MoLE, PHATGOOSE, and MoELoRA.
Hugging Face Blog	Deep dive into the research behind the merging methods of the mergoo library.

Mergoo Roadmap and Contributing

As an open-source library in a fast-evolving domain, we welcome contributions, whether that means introducing new features, enhancing infrastructure, or improving documentation.

Here is the mergoo roadmap:

Support MoE for Transformer Block
Compatibility with Hugging Face 🤗
Support Trainer, SFTTrainer
Loading Unified Checkpoint in BTX
Feature: Convertible QKV linear layers
Feature: Convertible FF linear layers
Feature: Routers only for a list of decoder layers indexes
Sharded Safetensor Saving
Support experts based on Llama and Mistral
Router Load balancing loss
Lazy loading of tensors for low memory usage in Merging
Support Mixture of LoRA Experts (Mixture of Adapters)
Support other Layer-wise merging methods, including Mergekit
Support experts based on Gemma and Mamba
Support flash-attention
Support Mixture of Depths Transformer

Feel free to suggest new features and/or contribute to the mergoo roadmap!

Join our community!

🚀 We love to hear your feedback. Please join the Leeroo community:

Have a question not listed here? Open a GitHub Issue or send us an email!

Open Source Agenda is not affiliated with "Mergoo" Project. README Source: Leeroo-AI/mergoo

Repository

Leeroo-AI/mergoo

Homepage

https://www.leeroo.com/
