Chinese Mixtral Versions Save

中文Mixtral混合专家大模型（Chinese Mixtral MoE LLMs）

v1.2

1 month ago

本次更新添加了仿OpenAI API Demo。教程：https://github.com/ymcui/Chinese-Mixtral/wiki/openai_api_zh

This release adds OpenAI API Demo. Tutorial: https://github.com/ymcui/Chinese-Mixtral/wiki/openai_api_en

What's Changed

Add OpenAI API Demo by @ymcui in https://github.com/ymcui/Chinese-Mixtral/pull/25

Full Changelog: https://github.com/ymcui/Chinese-Mixtral/compare/v1.1...v1.2

v1.1

2 months ago

本次更新主要有以下两点：

添加中文Mixtral技术报告，介绍了模型训练方法和相关实验分析
- 论文地址：https://arxiv.org/abs/2403.01851
添加了预训练和指令精调训练脚本
- 预训练：https://github.com/ymcui/Chinese-Mixtral/wiki/pt_scripts_zh
- 指令精调：https://github.com/ymcui/Chinese-Mixtral/wiki/sft_scripts_zh

What's Changed

Add eval scripts by @iMountTai in https://github.com/ymcui/Chinese-Mixtral/pull/5
Update readme and add requirements by @iMountTai in https://github.com/ymcui/Chinese-Mixtral/pull/7
llama.cpp: add IQ3_XXS quantization models by @ymcui in https://github.com/ymcui/Chinese-Mixtral/pull/8
Add training scripts by @iMountTai in https://github.com/ymcui/Chinese-Mixtral/pull/18
Add Chinese Mixtral paper by @ymcui in https://github.com/ymcui/Chinese-Mixtral/pull/20

Full Changelog: https://github.com/ymcui/Chinese-Mixtral/compare/v1.0...v1.1

v1.0

3 months ago

发布中文Mixtral, Mixtral-Instruct大模型已正式发布。

Chinese-Mixtral：基座模型，使用20G语料增量训练
Chinese-Mixtral-Instruct：指令/chat模型，在Chinese-Mixtral的基础上进一步通过指令精调（500万条指令）获得

模型特点

📖 稀疏混合专家模型

Mixtral是一个稀疏混合专家模型。该模型与以往的LLaMA等主流大模型结构具有显著差异，主要体现在以下几点：

每个FFN层包含8个不同的"专家"（全连接层），根据门控值选取最优的2个进行激活
输入序列中的每个token都会独立地选取专家，而不是整个序列对应一组专家
实际参数量约为46.7B，在推理时激活的参数量约为13B

🚄 原生支持32K上下文（实测支持128K）

Mixtral模型原生支持32K上下文（实测可达128K）。用户可使用单一模型来解决不同长度的各类任务。

模型效果

大模型竞技场：http://llm-arena.ymcui.com/
生成效果：https://github.com/ymcui/Chinese-Mixtral#生成效果评测
客观效果：https://github.com/ymcui/Chinese-Mixtral#客观效果评测