A PyTorch repository for speeding up deep learning on sound-related tasks
Pytorch Sound is a modeling toolkit that lets engineers train custom models for sound-related tasks. It focuses on removing the repetitive patterns involved in building deep learning pipelines, in order to speed up related experiments.
import torch.nn as nn
from pytorch_sound.models import register_model, register_model_architecture


@register_model('my_model')
class Model(nn.Module):
    ...


@register_model_architecture('my_model', 'my_model_base')
def my_model_base():
    return {'hidden_dim': 256}
from pytorch_sound.models import build_model
# build model
model_name = 'my_model_base'
model = build_model(model_name)
LibriTTS, Maestro, VCTK and VoiceBank are currently supported.
Feel free to suggest a dataset. PRs are welcome!
import torch
from pytorch_sound.trainer import Trainer, LogType


class MyTrainer(Trainer):

    def forward(self, input: torch.Tensor, target: torch.Tensor, is_logging: bool):
        # forward model
        out = self.model(input)

        # calc your own loss (calc_loss stands for a user-defined function)
        loss = calc_loss(out, target)

        # build meta for logging: each entry is a (value, LogType) pair
        meta = {
            'loss': (loss.item(), LogType.SCALAR),
            'out': (out[0], LogType.PLOT)
        }
        return loss, meta
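The `meta` dictionary returned by `forward` pairs each value with a log type. As a rough illustration of how such a dictionary could be consumed, the sketch below routes entries by type. It is a guess at the pattern, not the Trainer's real logging code: `SketchLogType` and `route_meta` are hypothetical names, and plain Python values stand in for tensors.

```python
# Hypothetical sketch: splitting a meta dict by log type.
# SketchLogType and route_meta are illustrative, not pytorch_sound APIs.
from enum import Enum
from typing import Any, Dict, List, Tuple


class SketchLogType(Enum):
    SCALAR = 0
    PLOT = 1


def route_meta(
    meta: Dict[str, Tuple[Any, SketchLogType]]
) -> Tuple[Dict[str, float], List[str]]:
    """Return scalar values by key, and the keys whose values should be plotted."""
    scalars: Dict[str, float] = {}
    plots: List[str] = []
    for key, (value, log_type) in meta.items():
        if log_type is SketchLogType.SCALAR:
            scalars[key] = float(value)
        elif log_type is SketchLogType.PLOT:
            plots.append(key)
        else:
            raise ValueError(f'unsupported log type for {key!r}: {log_type}')
    return scalars, plots


meta = {
    'loss': (0.25, SketchLogType.SCALAR),
    'out': ([0.1, 0.2, 0.3], SketchLogType.PLOT),
}
scalars, plots = route_meta(meta)
```

Keeping the log type next to each value means `forward` does not need to know which logging backend is in use.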
English handler sources are taken from https://github.com/keithito/tacotron
General sound settings and sources
$ sudo add-apt-repository ppa:jonathonf/ffmpeg-4
$ sudo apt update
$ sudo apt install ffmpeg
$ ffmpeg -version
$ pip install -e .
$ python pytorch_sound/scripts/preprocess.py [libri_tts / vctk / voice_bank] in_dir out_dir