# XTTS-Webui

A web UI for using XTTS and for fine-tuning it.

XTTS-Webui is a web interface that lets you make the most of XTTS. The interface wraps additional neural networks around the model to improve your results, and you can also fine-tune the model to get a high-quality voice model.
You can also use this web UI through Google Colab.
## Installation

Please ensure you have Python 3.10.x or 3.11, CUDA 11.8 or 12.1, Microsoft Build Tools 2019 with the C++ package, and ffmpeg installed.
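As a hypothetical helper (not part of the repo), you can sanity-check two of the prerequisites above before installing; the function name and checks are illustrative assumptions:

```python
import shutil
import sys


def check_prerequisites():
    """Return a list of problems found; an empty list means ready to go."""
    problems = []
    # The README requires Python 3.10.x or 3.11.
    if sys.version_info[:2] not in ((3, 10), (3, 11)):
        major, minor = sys.version_info[:2]
        problems.append(f"Unsupported Python {major}.{minor}; use 3.10 or 3.11")
    # ffmpeg must be available on PATH for audio processing.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

CUDA and Build Tools availability are harder to probe portably, so they are left out of this sketch.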
To get started, follow these installation steps:
1. Ensure that CUDA is installed.

2. Clone the repository:

   ```bash
   git clone https://github.com/daswer123/xtts-webui
   ```

3. Navigate into the directory:

   ```bash
   cd xtts-webui
   ```

4. Create a virtual environment:

   ```bash
   python -m venv venv
   ```

5. Activate the virtual environment:

   On Windows:

   ```bash
   venv\scripts\activate
   ```

   On Linux:

   ```bash
   source venv/bin/activate
   ```

6. Install PyTorch and torchaudio with pip:

   ```bash
   pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
   ```

7. Install the remaining dependencies from requirements.txt:

   ```bash
   pip install -r requirements.txt
   ```
## Launching the interface

To launch the interface, follow these steps:

1. Activate your virtual environment:

   On Windows:

   ```bash
   venv\scripts\activate
   ```

   On Linux:

   ```bash
   source venv/bin/activate
   ```

2. Start the web UI for XTTS by running:

   ```bash
   python app.py
   ```
Here are some runtime arguments that can be used when starting the application:
| Argument | Default Value | Description |
|---|---|---|
| -hs, --host | 127.0.0.1 | The host to bind to |
| -p, --port | 8010 | The port number to listen on |
| -d, --device | cuda | Which device to use (cpu or cuda) |
| -sf, --speaker_folder | speakers/ | Directory containing TTS samples |
| -o, --output | output/ | Output directory |
| -l, --language | auto | Web UI language; available translations are listed in the i18n/locale folder |
| -ms, --model-source | local | Model source: 'api' for the latest version from the repository with API inference, or 'local' for local inference with model v2.0.2 |
| -v, --version | v2.0.2 | Which version of XTTS to use. You can also specify the name of a custom model: put its folder in models and pass the folder name here |
| --lowvram | | Enable low VRAM mode, which moves the model to RAM when it is not actively processing |
| --deepspeed | | Enable DeepSpeed acceleration. Works on Windows with Python 3.10 and 3.11 |
| --share | | Allow sharing of the interface outside the local computer |
| --rvc | | Enable RVC post-processing; all models should be located in the rvc folder |
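The flag table above could be declared with Python's standard argparse roughly as follows; this is an illustrative sketch, and the actual app.py may define its arguments differently:

```python
import argparse

# Illustrative mirror of the runtime-argument table; not the repo's actual code.
parser = argparse.ArgumentParser(description="xtts-webui launcher flags (sketch)")
parser.add_argument("-hs", "--host", default="127.0.0.1", help="The host to bind to")
parser.add_argument("-p", "--port", type=int, default=8010, help="The port number to listen on")
parser.add_argument("-d", "--device", default="cuda", choices=["cpu", "cuda"], help="Which device to use")
parser.add_argument("-sf", "--speaker_folder", default="speakers/", help="Directory containing TTS samples")
parser.add_argument("-o", "--output", default="output/", help="Output directory")
parser.add_argument("-l", "--language", default="auto", help="Web UI language (see i18n/locale)")
parser.add_argument("-ms", "--model-source", default="local", choices=["api", "local"], help="Model source")
parser.add_argument("-v", "--version", default="v2.0.2", help="XTTS version or custom model folder name")
parser.add_argument("--lowvram", action="store_true", help="Move the model to RAM when idle")
parser.add_argument("--deepspeed", action="store_true", help="Enable DeepSpeed acceleration")
parser.add_argument("--share", action="store_true", help="Share the interface outside the local machine")
parser.add_argument("--rvc", action="store_true", help="Enable RVC post-processing")

# Example: parse a command line equivalent to `python app.py -p 8080 --lowvram`.
args = parser.parse_args(["-p", "8080", "--lowvram"])
```

Boolean switches such as `--lowvram` take no value, which is why the Default Value column is empty for them.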
## RVC module

You can enable the RVC module to post-process the generated audio. To do this, add the --rvc flag when running from the console, or add it to your startup file.

For a model to appear in the RVC settings, first upload it to the voice2voice/rvc folder. Each model must be in its own folder, with the model file and its index file together; the index file is optional.
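A minimal sketch of discovering models laid out this way; the helper name and the exact file extensions (.pth for the model, .index for the index) are assumptions, not repo code:

```python
from pathlib import Path


def find_rvc_models(root="voice2voice/rvc"):
    """Map each model folder name to its model file and optional index file.

    Expects one model per subfolder, e.g.:
        voice2voice/rvc/my_voice/my_voice.pth
        voice2voice/rvc/my_voice/my_voice.index   (optional)
    """
    models = {}
    root = Path(root)
    if not root.is_dir():
        return models
    for folder in root.iterdir():
        if not folder.is_dir():
            continue
        pth = next(folder.glob("*.pth"), None)
        index = next(folder.glob("*.index"), None)  # index is optional
        if pth is not None:
            models[folder.name] = {"model": pth, "index": index}
    return models
```

Folders without a model file are skipped, which matches the rule that the model file is required while the index file is optional.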