Tools to create your own voice dataset for TTS training
This repo outlines the steps and scripts necessary to create your own text-to-speech dataset for training a voice model. The final output is in LJSpeech format.
100|this is an example sentence
Run scripts/wavdurations2csv.sh to chart out sentence length and verify that you have a good distribution of WAV file lengths.
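As a rough illustration of the duration check, the sketch below summarizes clip lengths using only Python's standard `wave` module. It does not reproduce scripts/wavdurations2csv.sh; the function names and the summary fields are assumptions made for this example.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_summary(wav_dir):
    """Summarize the durations of every .wav file in wav_dir (hypothetical helper)."""
    durations = sorted(wav_duration_seconds(p) for p in Path(wav_dir).glob("*.wav"))
    if not durations:
        return None  # no WAV files found
    return {
        "count": len(durations),
        "min": durations[0],
        "max": durations[-1],
        "mean": sum(durations) / len(durations),
    }
```

Calling `duration_summary("wavs_export")` would give a quick view of how short or long the clips are before charting them.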
Under Cloud API access scopes, select Allow full access to all Cloud APIs
Create Conda Environment on GCP Instance
conda create -n tts python=3.7
conda activate tts
pip install google-cloud-texttospeech==2.1.0 tqdm pandas
100|this is an example sentence
python text_to_wav.py tts_generate
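Before generating audio, it can help to validate the sentence file. The sketch below assumes one `id|text` entry per line, as in the example above. Whether text_to_wav.py reads exactly this layout is an assumption, and both function names are hypothetical.

```python
def parse_sentence_line(line):
    """Split an 'id|text' line into (id, text); only the first '|' separates fields."""
    utt_id, sep, text = line.rstrip("\n").partition("|")
    if not sep or not utt_id.strip() or not text.strip():
        raise ValueError(f"expected 'id|text', got: {line!r}")
    return utt_id.strip(), text.strip()


def read_sentences(path):
    """Read every non-blank line of a sentence file (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return [parse_sentence_line(line) for line in handle if line.strip()]
```

A malformed line raises `ValueError` immediately, which is cheaper than discovering it after a batch of API calls.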
Run scripts/wavdurations2csv.sh to chart out sentence length and verify that you have a good distribution of WAV file lengths.
Under Cloud API access scopes, select Allow full access to all Cloud APIs
Create Conda Environment on GCP Instance
conda create -n stt python=3.7
conda activate stt
pip install google-cloud-speech tqdm pandas
In Adobe Audition, open audio file:
Diagnostics -> Mark Audio
Select the Mark the Speech preset
Scan
Find Levels
Scan again
Mark All
Or, in Audacity, open audio file:
Analyze -> Sound Finder
In Audition:
Open the Markers tab
Export Selected Markers to CSV and save as Markers.csv
Preferences -> Media & Disk Cache and untick Save Peak Files
Export Audio of Selected Range Markers with the following options:
Use marker names in filenames
WAV PCM
22050 Hz Mono, 16-bit
Save to the wavs_export folder
Or, in Audacity:
Export multiple... to the wavs_export folder
Export labels to Label Track.txt
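Audacity writes a label track as tab-separated rows of start time, end time, and label text. The sketch below parses that layout into tuples. The function name is hypothetical and this is not the parsing code used by the scripts in this repo.

```python
def parse_audacity_labels(text):
    """Parse 'start<TAB>end<TAB>label' rows into (start, end, label) tuples."""
    labels = []
    for line in text.splitlines():
        # Skip blank lines and rows starting with a backslash,
        # which Audacity uses for spectral-selection frequency ranges.
        if not line.strip() or line.startswith("\\"):
            continue
        fields = line.split("\t")
        start, end = float(fields[0]), float(fields[1])
        label = fields[2] if len(fields) > 2 else ""
        labels.append((start, end, label))
    return labels
```

Each tuple gives the clip boundaries in seconds, so `end - start` can be used to spot clips that are too short or too long.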
For Audition, using the exported Markers.csv and wavs folder, run:
cd scripts
python wav_to_text.py audition
The script generates a new file, Markers_STT.csv.
For Audacity, using the exported Label Track.txt and wavs folder, run:
cd scripts
python wav_to_text.py audacity
The script generates a new file, Label Track STT.csv.
For Audition:
Import Markers from File and select the file with STT transcriptions: Markers_STT.csv
For Audacity:
Open Label Track STT.txt in a text editor
For Audition:
Export Selected Markers to CSV and save as Markers.csv
Export Audio of Selected Range Markers with the following options:
Use marker names in filenames
WAV PCM
22050 Hz Mono, 16-bit
Save to the wavs_export folder
For Audacity:
Export multiple... to the wavs_export folder
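After exporting, it may be worth confirming that the clips really are 22050 Hz, mono, 16-bit. The following sketch uses the standard `wave` module. The function name and its defaults are assumptions chosen to match the export options listed above.

```python
import wave


def check_wav_format(path, rate=22050, channels=1, sample_width_bytes=2):
    """Return a list of format mismatches; an empty list means the file matches."""
    with wave.open(str(path), "rb") as wav:
        actual = {
            "rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16-bit
        }
    expected = {
        "rate": rate,
        "channels": channels,
        "sample_width_bytes": sample_width_bytes,
    }
    return [
        f"{key}: expected {expected[key]}, found {actual[key]}"
        for key in expected
        if actual[key] != expected[key]
    ]
```

Running this over every file in wavs_export and printing any non-empty results would flag clips exported with the wrong settings.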
Using the exported Markers.csv (Audition) or Label Track STT.txt (Audacity) and the WAVs in wavs_export, scripts/markersfile_to_metadata.py creates a metadata.csv and a folder of WAVs for training your TTS model:
For Audition:
python markersfile_to_metadata.py audition
For Audacity:
python markersfile_to_metadata.py audacity
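To make the target format concrete, here is a minimal sketch of writing pipe-delimited metadata rows. It follows the two-field `id|text` example shown at the top of this document; the original LJSpeech corpus carries a third, normalized-text column, so adjust if your trainer expects it. This is not the implementation of markersfile_to_metadata.py, and both function names are hypothetical.

```python
def format_metadata_row(wav_id, text):
    """Build one 'id|text' row, collapsing whitespace and rejecting stray pipes."""
    clean = " ".join(text.split())  # collapses newlines, tabs, repeated spaces
    if "|" in wav_id or "|" in clean:
        raise ValueError("the '|' character is reserved as the field separator")
    return f"{wav_id}|{clean}"


def write_metadata(rows, path):
    """Write (wav_id, text) pairs to a metadata file, one row per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for wav_id, text in rows:
            handle.write(format_metadata_row(wav_id, text) + "\n")
```

Rejecting `|` inside the text avoids rows that a downstream loader would split into the wrong number of fields.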
Run scripts/wavdurations2csv.sh to chart out sentence length and verify that you have a good distribution of WAV file lengths.
We tested three methods to upsample WAV files from 16,000 to 22,050 Hz. After reviewing the spectrograms, we selected ffmpeg for upsampling because it retains about 2 kHz more high-frequency content than resampy. See scripts/resamplewav.sh.
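For reference, a single-file ffmpeg resample can be driven from Python as sketched below. This is not the content of scripts/resamplewav.sh; it assumes `ffmpeg` is on the PATH, and the function names are hypothetical. The `-ar` option sets the output sample rate and `-y` overwrites an existing output file.

```python
import subprocess


def build_resample_command(src, dst, rate=22050):
    """Build the ffmpeg argument list that resamples src to dst at the given rate."""
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(rate), str(dst)]


def resample(src, dst, rate=22050):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_resample_command(src, dst, rate), check=True)
```

Keeping the command builder separate from the call makes it easy to print or log the exact command before running it over a whole folder.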