Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development. Amphion offers a unique feature: visualizations of classic models or architectures. We believe these visualizations help junior researchers and engineers gain a better understanding of the models.
The North-Star objective of Amphion is to offer a platform for studying the conversion of any input into audio. Amphion is designed to support individual generation tasks, including but not limited to:
In addition to the specific generation tasks, Amphion also includes several vocoders and evaluation metrics. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for ensuring consistent evaluation across generation tasks.
Here is the Amphion v0.1 demo, whose voice, audio effects, and singing voice are all generated by our models. Enjoy!
Amphion provides a comprehensive objective evaluation of the generated audio. The evaluation metrics include:
Amphion unifies the data preprocessing of open-source datasets, including AudioCaps, LibriTTS, LJSpeech, M4Singer, Opencpop, OpenSinger, SVCC, VCTK, and more. The list of supported datasets can be seen here (updating).
Amphion provides visualization tools that interactively illustrate the internal processing mechanisms of classic models. They are an invaluable resource for education and for making research easier to understand.
Currently, Amphion supports SingVisio, a tool for visualizing the diffusion model used in singing voice conversion.
Amphion can be installed using either the Setup Installer or the Docker Image.
git clone https://github.com/open-mmlab/Amphion.git
cd Amphion
# Install Python Environment
conda create --name amphion python=3.9.15
conda activate amphion
# Install Python Packages Dependencies
sh env.sh
Install Docker, the NVIDIA Driver, the NVIDIA Container Toolkit, and CUDA.
Run the following commands:
git clone https://github.com/open-mmlab/Amphion.git
cd Amphion
docker pull realamphion/amphion
docker run --runtime=nvidia --gpus all -it -v .:/app realamphion/amphion
Mounting the dataset with the -v argument is necessary when using Docker. Please refer to Mount dataset in Docker container and Docker Docs for more details.
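A dataset mount can be sketched as follows. This is a minimal example, not an official Amphion script: it assumes the dataset lives in a host directory given by a hypothetical DATASET_DIR variable and mounts it at a hypothetical container path /mnt/dataset, so use whatever path your recipe is actually configured to read. Each -v flag takes host_path:container_path. The repository mount uses $(pwd), which expands to the absolute path of the current directory and plays the same role as the . in the command above. The script prints the assembled command and launches the container only when a hypothetical RUN variable is set to 1.

```shell
#!/bin/sh
# Minimal sketch: start the Amphion container with the repository and a
# dataset mounted. Run it from the root of the cloned Amphion repository.
set -eu

# Hypothetical host directory holding the dataset; override via the environment.
DATASET_DIR="${DATASET_DIR:-/path/to/your/dataset}"
# Hypothetical mount point for the dataset inside the container.
CONTAINER_DATASET_DIR="/mnt/dataset"

# Collect the docker arguments as positional parameters (POSIX sh has no arrays).
# Each -v takes host_path:container_path.
set -- run --runtime=nvidia --gpus all -it \
  -v "$(pwd):/app" \
  -v "${DATASET_DIR}:${CONTAINER_DATASET_DIR}" \
  realamphion/amphion

# Print the command so the mounts can be checked before launching.
echo "docker $*"

# Launch only when RUN=1 is set (hypothetical opt-in switch).
if [ "${RUN:-0}" = "1" ]; then
  docker "$@"
fi
```

For example, saving the sketch as a hypothetical run_amphion_docker.sh and running DATASET_DIR=/data/LJSpeech RUN=1 sh run_amphion_docker.sh would mount that directory at /mnt/dataset inside the container. Printing the command first makes it easy to confirm the host and container paths before starting a long session.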
We provide detailed instructions for different tasks in the following recipes:
We appreciate all contributions to improve Amphion. Please refer to CONTRIBUTING.md for the contribution guidelines.
Amphion is released under the MIT License. It is free for both research and commercial use.
@article{zhang2023amphion,
title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},
author={Xueyao Zhang and Liumeng Xue and Yicheng Gu and Yuancheng Wang and Haorui He and Chaoren Wang and Xi Chen and Zihao Fang and Haopeng Chen and Junan Zhang and Tze Ying Tang and Lexiao Zou and Mingxuan Wang and Jun Han and Kai Chen and Haizhou Li and Zhizheng Wu},
journal={arXiv},
year={2024},
volume={abs/2312.09911}
}