Sooftware Speech Transformer

Transformer implementation specialized in speech recognition tasks, built with PyTorch.

Project README

Speech-Transformer

PyTorch implementation of "The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition".

Speech Transformer is a transformer framework specialized in speech recognition tasks.
This repository contains only the model code, but you can still train a Speech Transformer with it.
I appreciate any kind of feedback or contribution.

Usage

  • Training
import torch
from speech_transformer import SpeechTransformer

BATCH_SIZE, SEQ_LENGTH, DIM, NUM_CLASSES = 3, 12345, 80, 4

cuda = torch.cuda.is_available()
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(BATCH_SIZE, SEQ_LENGTH, DIM).to(device)  # BxTxD acoustic features
input_lengths = torch.IntTensor([100, 50, 8])  # valid (unpadded) frame count of each input
targets = torch.LongTensor([[2, 3, 3, 3, 3, 3, 2, 2, 2, 1],
                            [2, 3, 3, 3, 3, 3, 2, 2, 1, 0],
                            [2, 3, 3, 3, 3, 3, 2, 1, 0, 0]]).to(device)  # 1 means <eos_token>, 0 means <pad_token>
target_lengths = torch.IntTensor([10, 9, 8])  # valid token count of each target, matching the padding above

model = SpeechTransformer(num_classes=NUM_CLASSES, d_model=512, num_heads=8, input_dim=DIM).to(device)
predictions, logits = model(inputs, input_lengths, targets, target_lengths)
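
The training example above only runs a forward pass. Below is a minimal sketch of a single optimization step; it assumes that logits comes back as raw scores of shape (batch, target_length, num_classes) aligned with targets, that token id 0 is padding, and that Adam is an acceptable optimizer choice. None of these assumptions are guaranteed by this repository; if the model returns log-probabilities instead of raw scores, F.nll_loss would be used in place of F.cross_entropy.

import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer and learning rate are assumptions

# Flatten (batch, target_length, num_classes) logits and (batch, target_length) targets
# so that cross entropy is computed per token; assumed padding positions (id 0) are ignored.
loss = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),
    targets.reshape(-1),
    ignore_index=0,
)

optimizer.zero_grad()
loss.backward()
optimizer.step()
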
  • Beam Search Decoding
import torch
from speech_transformer import SpeechTransformer

BATCH_SIZE, SEQ_LENGTH, DIM, NUM_CLASSES = 3, 12345, 80, 10

cuda = torch.cuda.is_available()
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(BATCH_SIZE, SEQ_LENGTH, DIM).to(device)  # BxTxD
input_lengths = torch.LongTensor([SEQ_LENGTH, SEQ_LENGTH - 10, SEQ_LENGTH - 20]).to(device)

model = SpeechTransformer(num_classes=NUM_CLASSES, d_model=512, num_heads=8, input_dim=DIM).to(device)
model.set_beam_decoder(batch_size=BATCH_SIZE, beam_size=3)  # enable beam search decoding with beam size 3
predictions, _ = model(inputs, input_lengths)
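
The predictions tensor holds the decoded token ids. A small sketch of turning them back into text, assuming predictions has shape (batch, length) and using a hypothetical id_to_char table; the padding and end-of-sequence ids below are assumptions, not defined by this repository.

id_to_char = {0: '<pad>', 1: '<eos>', 2: 'a', 3: 'b'}  # hypothetical vocabulary for illustration

for hypothesis in predictions.tolist():  # one decoded id sequence per utterance in the batch
    chars = []
    for token_id in hypothesis:
        if token_id in (0, 1):           # stop at assumed padding / end-of-sequence ids
            break
        chars.append(id_to_char.get(token_id, '<unk>'))
    print(''.join(chars))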

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub or
contact [email protected].

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues like bug fixes or documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues.

Code Style

I follow PEP-8 for code style. In particular, the docstring style is important, because the documentation is generated from the docstrings.
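
As an illustration only, here is a small, hypothetical helper documented in a Google-style docstring format; the exact docstring convention this project expects is an assumption, so match the existing modules when contributing.

import torch

def build_padding_mask(lengths: torch.Tensor, max_length: int) -> torch.Tensor:
    """
    Build a boolean padding mask from sequence lengths. (Hypothetical helper for illustration.)

    Args:
        lengths (torch.Tensor): valid length of each sequence in the batch, shape (batch,)
        max_length (int): padded length of the batch

    Returns:
        torch.Tensor: boolean mask of shape (batch, max_length), True at padded positions
    """
    positions = torch.arange(max_length, device=lengths.device)
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)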

Reference

Author

License

MIT
