PaddlePaddle DeepSpeech Versions

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

r1.4.1

1 year ago

Others

r1.4.0

1 year ago

S2T

T2S

Server

Engine

Audio

Demos

Docs

Others

New Contributors

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.3.0...r1.4.0

r1.3.0

1 year ago

Highlight

S2T

  • Support U2/U2++ Conformer dy2static, and U2/U2++ C++ High Performance Streaming ASR Deployment. @zh794390558
  • Add Wav2vec2ASR-en, wav2vec2.0 fine-tuning for ASR on LibriSpeech. @Zth9730
  • Add Whisper CLI and Demos, support multilingual recognition and translation. @zxcd
  • Add Wav2vec2 CLI and Demos, support ASR and Feature Extraction. @Zth9730
  • Add whisper. #2640 #2704 by @zxcd
  • Fix gpu training hang. #2478 by @Zth9730
  • Support u2++ based cli and server. #2489 #2510 by @Zth9730
  • Add wav2vec2-en. #2518 #2527 #2637 by @Zth9730
  • Add wav2vec2-zh cli. #2697 by @Zth9730

T2S

Audio

Demo

New Contributors

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.2.0...r1.3.0

r1.2.0

1 year ago

S2T

  • Fix conformer/transformer multi GPU training. #2327 #2334 #2336 #2372 by @Zth9730
  • Fix deepspeech2 decode_wav. #2351 by @Zth9730
  • Support BiTransformer decoder. #2415 by @Zth9730

T2S

  • Update VITS to support VITS and its voice cloning training on AISHELL-3. #2268 by @HighCWu
  • Add ERNIE-SAT synthesize_e2e. #2287 #2316 #2355 #2378 #2432 by @yt605155624
  • Specify the input data type of G2PW. #2288 by @kslz
  • Add TTS finetune example. #2297 #2385 #2418 #2430 by @lym0302
  • Fix Chinese English mixed TTS frontend. #2299 #2493 by @lym0302
  • Add words into polyphonic.yaml for g2pW. #2300 by @david-95
  • Update the quantifier unit in Text Normalization. #2308 by @pengzhendong
  • Fix Chinese frontend bugs. #2312 #2323 by @david-95
  • Add AISHELL-3 Voice Cloning with ECAPA-TDNN speaker encoder. #2359 #2429 by @yt605155624
  • Add pre-install doc for G2P and TN, update version of pypinyin. #2364 by @WongLaw
  • Add tools to compare two test results of G2P to show differences. #2367 by @david-95
  • Revise must_neural_tone_words. #2370 by @WongLaw
  • Add type-hint for g2pW. #2390 by @yt605155624
  • Replace fixed path with path variable in MFA. #2416 by @WongLaw
  • Solve "unknown format: 3" for wavfile.write(). #2422 by @zhoupc2015

Text

  • Create preprocess.py for Punctuation Restoration. #2295 by @THUzyt21

Demo

  • Add Voice Cloning, TTS finetune, and ERNIE-SAT in speech_web. #2412 #2451 by @iftaken

Server

  • Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21
  • Remove unused spk_id in speech_server and streaming_tts_server; support the Chinese-English mixed TTS server engine. #2380 by @WongLaw

Doc

  • Add Chinese doc and language switcher for metaverse, style_fs2 and story_talker. #2357 by @WongLaw
  • Update API docs. #2406 by @yt605155624
  • Add finetune demos in readthedocs. #2411 by @yt605155624

Test

  • Add barrier for distributed training using multiple machines. #2309 #2311 by @sneaxiy
  • Fix prepare.sh for PWGAN TIPC. #2376 by @yuehuayingxueluo

Other

  • Format paddlespeech with pre-commit. #2331 by @yt605155624

Acknowledgements

Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat

New Contributors

  • @HighCWu made their first contribution in #2268
  • @pengzhendong made their first contribution in #2308
  • @Zth9730 made their first contribution in #2327
  • @WongLaw made their first contribution in #2357
  • @yuehuayingxueluo made their first contribution in #2376
  • @zhoupc2015 made their first contribution in #2422

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.1.0...r1.2.0

r1.1.0

1 year ago

S2T

TTS

Speechx

Audio

Server

CLI

Demo

Doc

Others

Acknowledgements

Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624

New Contributors

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.0.0...r1.1.0

r1.0.0

1 year ago

Highlight

More

ASR

  • DeepSpeech2 streaming model aishell cer 6.66%
  • DeepSpeech2 streaming model wenetspeech cer: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)
  • Conformer aishell cer 4.64%
  • Conformer streaming model aishell cer 5.44%
  • Conformer streaming model wenetspeech cer: 11.0% (test_net), 18.79% (test_meeting)
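The CER (character error rate) figures above are conventionally the character-level edit distance between the reference and the recognized text, divided by the reference length. The following is a minimal sketch of that standard definition; the function names `edit_distance` and `cer` are illustrative and are not taken from the PaddleSpeech code base, whose scoring scripts may apply additional text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference symbols to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character out of four.
    print(cer("abcd", "abxd"))
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.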

Speechx

KWS

Audio

What's Changed

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.0.0a...r1.0.0

r1.0.0a

1 year ago

Highlight

  • Release Streaming ASR and Streaming TTS system for industrial application.
  • Support KWS model
  • DeepSpeech2 streaming model aishell cer 6.66%
  • Conformer aishell cer 4.64%
  • Conformer streaming model aishell cer 5.44%
  • SpeechX DeepSpeech2 streaming with WFST

What's Changed

New Contributors

Acknowledgements

Special thanks to @zh794390558 @Honei @Jackwaterveg @lym0302 @qingen @GT-ZhangAcer @yt605155624 @WilliamZhang06 @SmileGoat @ccrrong

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r0.2.0...r1.0.0a

r0.2.0

2 years ago

S2T

  • Replace kaldi_fbank with paddleaudio #1612
  • Support CTC decoder online #821 #1626
  • Improve accuracy of Conformer. Support using Kaiming uniform as the default initialization. #1577
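For readers unfamiliar with the initialization named in the last item, Kaiming (He) uniform initialization draws each weight from U(-bound, bound) with bound = gain * sqrt(3 / fan_in). The sketch below uses only the Python standard library; the function name, the default gain of sqrt(2), and the seed parameter are assumptions made for illustration, and the release note does not say which gain or fan mode #1577 actually uses.

```python
import math
import random


def kaiming_uniform(fan_in, fan_out, gain=math.sqrt(2.0), seed=None):
    """Return a fan_out x fan_in weight matrix sampled with Kaiming uniform.

    gain=sqrt(2) is the usual choice for ReLU layers (an assumption here).
    """
    if fan_in <= 0 or fan_out <= 0:
        raise ValueError("fan_in and fan_out must be positive")
    rng = random.Random(seed)
    # Uniform on [-bound, bound] has variance bound**2 / 3 = gain**2 / fan_in.
    bound = gain * math.sqrt(3.0 / fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


if __name__ == "__main__":
    weights = kaiming_uniform(fan_in=6, fan_out=4, seed=0)
    print(len(weights), len(weights[0]))
```

Scaling the bound by 1/sqrt(fan_in) keeps the variance of a layer's pre-activations roughly independent of its input width.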

TTS

Audio

  • Add paddleaudio.compliance modules that offer audio feature APIs aligned with Kaldi and Librosa. #1518
  • Unittest and benchmark for audio feature APIs. #1548
  • [Audio] - [audio] refactor audio arch #1494 by @zh794390558
  • [Audio] - [audio] dtw metric #1493 by @zh794390558
  • [Audio] - [audio] fix compliance bug #1597 by @zh794390558
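The DTW metric mentioned above refers to dynamic time warping, which measures the cost of the best monotonic alignment between two sequences of possibly different lengths. The following is a minimal sketch of the classic dynamic-programming formulation on one-dimensional sequences with absolute difference as the local cost; the function name `dtw_distance` is illustrative and this is not the implementation added in #1493.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence is the first with one sample stretched in time.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

For multi-dimensional audio features, the absolute difference would typically be replaced by a vector distance such as Euclidean distance between frames.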

Deployment

  • [Deployment] - [speechx] high performance inference of speech task #1496 by @SmileGoat @zh794390558
  • [Deployment] - [Speechx] fix normalizer bug #1600 #1621 #1619 #1633 #1635 by @SmileGoat
  • [Deployment] - [speechx] refactor speechx #1631 #1616 #1576 #1572 #1541 by @zh794390558
  • [Deployment] - [speechx] simplify cmake compiler #1538 #1536 #1535 by @zh794390558

server

  • [server] - [websocket] added online asr engine #1627 by @WilliamZhang06
  • [server] - [server] added engine type and asr inference #1475 by @WilliamZhang06
  • [server] - [Server] added asr engine #1413 by @WilliamZhang06
  • [server] - [Server] added engine factory and config #1399 by @WilliamZhang06
  • [server] - [server] added engine framework #1383 by @WilliamZhang06
  • [server] - [server] update readme #1604 by @lym0302
  • [server] - [server] add server cls #1554 by @lym0302
  • [server] - [server] add paddlespeech_server stats #1510 by @lym0302
  • [server] - [server] add cli #1466 by @lym0302
  • [server] - [server] add tts postprocess #1411 by @lym0302
  • [server] - [server] tts server #1386 by @lym0302

vector

  • [vector] - [vector] ecapa-tdnn on voxceleb #1523 by @Honei

CLI

  • Batch input supported. #1460
  • TTS: Add WaveRNN for CSMSC dataset.
  • TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
  • Vector: add speaker verification demo and doc #1605 by @Honei

Demo

  • [Demo] - [vec][search] update client image url #1628 by @qingen
  • [Demo] - [server] add server demo #1480 by @lym0302
  • [Demo] - [vec][search] add audio similarity search #1609 by @qingen

Acknowledgements

Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @Honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen

r0.1.2

2 years ago

Bug Fix:

  1. Pinned librosa to version 0.8.1 (librosa==0.8.1) to resolve the compatibility issue caused by the librosa upgrade. #1426

r0.1.1

2 years ago

New Features

CLI

  • Add cli stats. #1274
  • Add unit test. #1321
  • ASR: Support English: Add transformer_librispeech model. #1297
  • ASR: Support 4 decoding methods: ctc_greedy_search, ctc_beam_search, attention, attention_rescoring. #1297
  • ASR & ST: Use the unified config. #1305 / #1312
  • ASR: Refactor the code. #1260 by @AdamBear
  • TTS: Support long input text by default. #1241
  • TTS: Add Style MelGAN and HiFiGAN. #1241
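Of the four ASR decoding methods listed above, ctc_greedy_search is the simplest: take the most likely token in each frame, merge consecutive repeats, then drop the blank symbol. The sketch below illustrates that general technique on plain Python lists; the function signature and the choice of index 0 as the blank token are assumptions for illustration and do not reflect the PaddleSpeech CLI's internal API.

```python
def ctc_greedy_search(frame_scores, blank_id=0):
    """Greedy CTC decoding.

    frame_scores: list of per-frame score lists (probabilities or log-probs),
                  one score per vocabulary entry.
    blank_id:     index of the CTC blank token (assumed to be 0 here).
    """
    # Step 1: best token per frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    # Step 2: collapse consecutive repeats, then remove blanks.
    decoded = []
    previous = None
    for token in best_path:
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


if __name__ == "__main__":
    frames = [
        [0.1, 0.8, 0.1],    # token 1
        [0.1, 0.8, 0.1],    # token 1 (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # token 1 (new, separated by blank)
        [0.1, 0.1, 0.8],    # token 2
    ]
    print(ctc_greedy_search(frames))
```

The blank between the second and fourth frames is what allows the same token to be emitted twice in a row; without it the two occurrences would be merged.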

ASR

  • Refactor configs in examples. #1225

TTS

  • Fix some frontend bugs. #1262 by @JiehangXie / #1310
  • Add speaker embedding and speaker id for style fastspeech2 inference. #1197 by @jerryuhoo
  • Add support for finetuning speedyspeech. #1302 by @jerryuhoo / #1322 / #1337
  • Update VCTK Parallel WaveGAN. #1294
  • Update Multi Band MelGAN. #1272

ST

  • Refactor configs in examples. #1225

Text

  • Refactor Punctuation Restoration example. #1215

Docs

  • Add topic note for releasing Python packages
  • Add TTS papers. #1330
  • Add Frontend G2P topic. #1254

Others

  • Update released models and results. #1306

Acknowledgements

@zh794390558 @yt605155624 @Jackwaterveg @KPatr1ck @Mingxue-Xu @JiehangXie @grasswolfs @jerryuhoo @AdamBear @LittleChenCc @JamesLim-sy