# TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
Download || Upgrading || Manual installation
Note: Not all models support all platforms. For example, MusicGen and AudioGen are not yet supported on macOS.
https://rsxdalv.github.io/bark-speaker-directory/
https://github.com/rsxdalv/tts-generation-webui/discussions/186#discussioncomment-7291274
A prebuilt Docker image is also available:

docker pull ghcr.io/rsxdalv/tts-generation-webui:main
In case of issues, feel free to contact the developers.
Not exactly: the dependencies clash, especially between conda and Python (and the dependencies are already in a critical state, so moving them to conda is a long way off). While it might be possible to simply replace the old installer with the new one and run the update, the resulting problems would be unpredictable and unfixable. Updating the installer requires a lot of testing, so it is not done lightly.
1. Install conda or another virtual environment manager (Python 3.10 is highly recommended).
2. Install git: `conda install git`
3. Install ffmpeg: `conda install -y -c pytorch ffmpeg`
4. Set up PyTorch with CUDA or CPU support: https://pytorch.org/audio/stable/build.windows.html#install-pytorch
5. Clone the repo: `git clone https://github.com/rsxdalv/tts-generation-webui.git`
6. Install the root requirements: `pip install -r requirements.txt`
7. Clone the repos in the `./models/` directory and install the requirements under each of them.
8. Run (inside the venv): `python server.py`

You may also need to install build tools (without Visual Studio): https://visualstudio.microsoft.com/visual-cpp-build-tools/
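After finishing the steps above, a small sanity check can confirm that the basic tools are in place. This script is only an illustration, not part of the repo:

```python
import importlib.util
import shutil
import sys

def environment_report() -> dict:
    """Check that the tools from the installation steps are available."""
    return {
        # Step 1: Python 3.10 is the recommended interpreter.
        "python": sys.version.split()[0],
        # Steps 2-3: git and ffmpeg should be on PATH after the conda installs.
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Step 4: torch should be importable (either the CUDA or CPU build).
        "torch": importlib.util.find_spec("torch") is not None,
    }

print(environment_report())
```

If `git`, `ffmpeg`, or `torch` come back `False`, revisit the corresponding step before running the server.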
To build and run the React UI:

npm install
npm run build
npm start

To run the Python server:

python server.py

or use the start_(platform) script.

tts-generation-webui can also be run inside a Docker container. To get started, first build the Docker image while in the root directory:
docker build -t rsxdalv/tts-generation-webui .
Once the image has been built, it can be started with Docker Compose:
docker compose up -d
The container will take some time to generate the first output while models are downloaded in the background. The status of this download can be verified by checking the container logs:
docker logs tts-generation-webui
This project utilizes the following open source libraries:
suno-ai/bark - MIT License
tortoise-tts - Apache-2.0 License
ffmpeg - LGPL License
ffmpeg-python - Apache-2.0 License
audiocraft - MIT License
vocos - MIT License
RVC - MIT License
This technology is intended for enablement and creativity, not for harm.
By engaging with this AI model, you acknowledge and agree to abide by these guidelines, employing the AI model in a responsible, ethical, and legal manner.
The codebase is licensed under MIT. However, it's important to note that when installing the dependencies, you will also be subject to their respective licenses. Although most of these licenses are permissive, there may be some that are not. Therefore, it's essential to understand that the permissive license only applies to the codebase itself, not the entire project.
That being said, the goal is to maintain MIT compatibility throughout the project. If you come across a dependency that is not compatible with the MIT license, please feel free to open an issue and bring it to our attention.
Known non-permissive dependencies:
Library | License | Notes |
---|---|---|
encodec | CC BY-NC 4.0 | Newer versions are MIT, but need to be installed manually |
diffq | CC BY-NC 4.0 | Optional in the future, not necessary to run, can be uninstalled, should be updated with demucs |
lameenc | GPL License | Future versions will make it LGPL, but need to be installed manually |
unidecode | GPL License | Not mission critical, can be replaced with another library, issue: https://github.com/neonbjb/tortoise-tts/issues/494 |
Model weights have different licenses, please pay attention to the license of the model you are using.
Audiocraft is currently only compatible with Linux and Windows. macOS support has not arrived yet, although it might be possible to install it manually.
Due to limitations of the Python package manager (pip), torch can get reinstalled several times. This is a wide-ranging issue affecting pip and torch.
Messages like:

---- requires ----, but you have ---- which is incompatible.

are completely normal. They stem both from a limitation of pip and from the fact that this web UI combines many different AI projects. Since those projects are not always compatible with each other, they complain about one another being installed. This is normal and expected, and despite the warnings/errors the projects will work together in the end. It is not clear whether this situation will ever be resolvable, but that is the hope.
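To see which conflicts actually exist on your machine, `pip check` gives the authoritative report. As a rough illustration of what it does, the standard library alone can approximate part of it; this helper is hypothetical and deliberately ignores version ranges, extras, and environment markers:

```python
import re
from importlib.metadata import PackageNotFoundError, distributions, version

def missing_requirements():
    """List (package, requirement) pairs where a required package is not
    installed at all. Version-range conflicts, the kind the pip messages
    above complain about, are ignored for brevity."""
    problems = []
    for dist in distributions():
        for req in dist.requires or []:
            if ";" in req:  # skip conditional requirements (markers/extras)
                continue
            # First token of the requirement string is the package name.
            name = re.split(r"[\s\[\](<>=!~]", req, maxsplit=1)[0]
            try:
                version(name)
            except PackageNotFoundError:
                problems.append((dist.metadata["Name"], req))
    return problems

print(missing_requirements()[:5])
```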
You can configure the interface through the "Settings" tab or, for advanced users, via the config.json file in the root directory (not recommended). Below is a detailed explanation of each setting:
Argument | Default Value | Description |
---|---|---|
text_use_gpu | true | Determines whether the GPU should be used for text processing. |
text_use_small | true | Determines whether a "small" or reduced version of the text model should be used. |
coarse_use_gpu | true | Determines whether the GPU should be used for "coarse" processing. |
coarse_use_small | true | Determines whether a "small" or reduced version of the "coarse" model should be used. |
fine_use_gpu | true | Determines whether the GPU should be used for "fine" processing. |
fine_use_small | true | Determines whether a "small" or reduced version of the "fine" model should be used. |
codec_use_gpu | true | Determines whether the GPU should be used for codec processing. |
load_models_on_startup | false | Determines whether the models should be loaded during application startup. |
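As a sketch of how these settings look on disk, here is a hypothetical config.json fragment built from the keys in the table above. The real file may nest them under a section name, so treat the layout as an assumption:

```python
import json
from pathlib import Path

# Hypothetical fragment using the Bark keys documented above;
# the actual config.json in your install may nest these differently.
bark_settings = {
    "text_use_gpu": True,
    "text_use_small": True,
    "coarse_use_gpu": True,
    "coarse_use_small": True,
    "fine_use_gpu": True,
    "fine_use_small": True,
    "codec_use_gpu": True,
    "load_models_on_startup": False,
}

def use_full_models(settings: dict) -> dict:
    """Return a copy with every *_use_small flag disabled,
    i.e. load the full-size models instead of the reduced ones."""
    return {k: False if k.endswith("_use_small") else v
            for k, v in settings.items()}

# Write an example file rather than touching a real config.json.
Path("config.example.json").write_text(
    json.dumps(use_full_models(bark_settings), indent=2))
```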
Gradio interface options:

Argument | Default Value | Description |
---|---|---|
inline | false | Display inline in an iframe. |
inbrowser | true | Automatically launch in a new tab. |
share | false | Create a publicly shareable link. |
debug | false | Block the main thread from running. |
enable_queue | true | Serve inference requests through a queue. |
max_threads | 40 | Maximum number of total threads. |
auth | null | Username and password required to access the interface, in the format username:password. |
auth_message | null | HTML message shown on the login page. |
prevent_thread_lock | false | If true, do not block the main thread while the server is running. |
show_error | false | Display errors in an alert modal. |
server_name | 0.0.0.0 | Make the app accessible on the local network. |
server_port | null | Start the Gradio app on this port. |
show_tips | false | Show tips about new Gradio features. |
height | 500 | Height in pixels of the iframe element. |
width | 100% | Width of the iframe element (in pixels or as a percentage). |
favicon_path | null | Path to a file (.png, .gif, or .ico) to use as the favicon. |
ssl_keyfile | null | Path to the private key file for running a local server over HTTPS. |
ssl_certfile | null | Path to the signed certificate file for HTTPS. |
ssl_keyfile_password | null | Password to use with the SSL certificate for HTTPS. |
ssl_verify | true | If false, skips certificate validation, allowing self-signed certificates. |
quiet | true | Suppress most print statements. |
show_api | true | Show the API docs in the footer of the app. |
file_directories | null | List of directories that Gradio is allowed to serve files from. |
_frontend | true | Frontend. |
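Most of the options above correspond directly to keyword arguments of Gradio's launch(). The one that needs translation is auth, which Gradio expects as a (username, password) tuple rather than the username:password string documented above. A minimal sketch of that translation (the helper itself is hypothetical, not part of the project):

```python
def launch_kwargs(config: dict) -> dict:
    """Turn config entries into Gradio launch() keyword arguments,
    converting the "username:password" auth string into the tuple
    form that Gradio accepts."""
    # Drop internal keys such as _frontend before passing anything on.
    kwargs = {k: v for k, v in config.items() if not k.startswith("_")}
    auth = kwargs.get("auth")
    if isinstance(auth, str) and ":" in auth:
        user, _, password = auth.partition(":")
        kwargs["auth"] = (user, password)
    return kwargs

example = {
    "server_name": "0.0.0.0",
    "server_port": None,
    "share": False,
    "auth": "admin:secret",
    "_frontend": True,
}
print(launch_kwargs(example)["auth"])  # → ('admin', 'secret')
```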