PaddlePaddle DeepSpeech Versions

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

r1.4.1

1 year ago

Others

Fix typeguard version. https://github.com/PaddlePaddle/PaddleSpeech/pull/3056 by @yt605155624

r1.4.0

1 year ago

S2T

Add wav2vec2-zh finetune pipeline. https://github.com/PaddlePaddle/PaddleSpeech/pull/3012 https://github.com/PaddlePaddle/PaddleSpeech/pull/2916 by @zxcd
Fix some bugs in Whisper. https://github.com/PaddlePaddle/PaddleSpeech/pull/2900 https://github.com/PaddlePaddle/PaddleSpeech/pull/2828 https://github.com/PaddlePaddle/PaddleSpeech/pull/2825 by @zxcd
Add code-switch asr tal_cs recipe. https://github.com/PaddlePaddle/PaddleSpeech/pull/2816 https://github.com/PaddlePaddle/PaddleSpeech/pull/2796 by @zxcd

T2S

Add dygraph to static, PaddleInference, Paddle2ONNX and ONNXRuntime Infer for Cantonese TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2990 by @JiehangXie
Add Cantonese test examples. https://github.com/PaddlePaddle/PaddleSpeech/pull/2937 by @JiehangXie
Add VITS inference pipeline. https://github.com/PaddlePaddle/PaddleSpeech/pull/3002 https://github.com/PaddlePaddle/PaddleSpeech/pull/2972 https://github.com/PaddlePaddle/PaddleSpeech/pull/2883 by @yt605155624
Rearrange encoder_infer param's order. https://github.com/PaddlePaddle/PaddleSpeech/pull/2983 by @443127316
Add male speaker and Chinese-English mix ONNXRuntime infer in CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2945 by @lym0302
Add Cantonese TTS example. https://github.com/PaddlePaddle/PaddleSpeech/pull/2950 https://github.com/PaddlePaddle/PaddleSpeech/pull/2927 https://github.com/PaddlePaddle/PaddleSpeech/pull/2924 https://github.com/PaddlePaddle/PaddleSpeech/pull/2907 https://github.com/PaddlePaddle/PaddleSpeech/pull/2899 by @WongLaw
Fix PWGAN TIPC. https://github.com/PaddlePaddle/PaddleSpeech/pull/2882 by @yt605155624
Add a case in not_erhua. https://github.com/PaddlePaddle/PaddleSpeech/pull/2863 by @QuanZ9
Fix data prepare for PaddleSlim PTQ of TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2862 by @yt605155624
Avoid using variable "attn_loss" before assignment. https://github.com/PaddlePaddle/PaddleSpeech/pull/2860 by @hopingZ
Add soft link for shell in example, and add skip_copy_wave in norm stage of GANVocoders to save disk. https://github.com/PaddlePaddle/PaddleSpeech/pull/2851 by @yt605155624
Optimize the training of VITS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2843 https://github.com/PaddlePaddle/PaddleSpeech/pull/2809 https://github.com/PaddlePaddle/PaddleSpeech/pull/2791 https://github.com/PaddlePaddle/PaddleSpeech/pull/2770 by @WongLaw
Add StarGANv2-VC model scripts and synthesize scripts. https://github.com/PaddlePaddle/PaddleSpeech/pull/2842 by @yt605155624
Add diffusion module for training diffsinger. https://github.com/PaddlePaddle/PaddleSpeech/pull/2868 https://github.com/PaddlePaddle/PaddleSpeech/pull/2832 by @HighCWu
Fix some Text Frontend bugs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2831 by @yt605155624
For mixed Chinese and English speech synthesis, add SSML support for Chinese. https://github.com/PaddlePaddle/PaddleSpeech/pull/2830 by @jindongyi011039
Add mkldnn and trt config for TTS Inference. https://github.com/PaddlePaddle/PaddleSpeech/pull/2748 by @yt605155624
Fix dygraph to static for tacotron2. https://github.com/PaddlePaddle/PaddleSpeech/pull/2426 by @yt605155624

Server

Add static infer for multi-spk tts. https://github.com/PaddlePaddle/PaddleSpeech/pull/2779 by @lym0302

Engine

Add wfst decoder. https://github.com/PaddlePaddle/PaddleSpeech/pull/2886 by @SmileGoat
Add batch recognizer decode. https://github.com/PaddlePaddle/PaddleSpeech/pull/2866 by @SmileGoat
Add nnet prob cache && make 2 thread decode work. https://github.com/PaddlePaddle/PaddleSpeech/pull/2769 by @SmileGoat
Engine directory refactor. https://github.com/PaddlePaddle/PaddleSpeech/pull/2746 by @SmileGoat
Fix openfst download error. https://github.com/PaddlePaddle/PaddleSpeech/pull/2742 by @SmileGoat

Audio

Replace kaldi fbank with kaldi-native-fbank in paddleaudio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2799 by @SmileGoat
Fix load paddleaudio fail. https://github.com/PaddlePaddle/PaddleSpeech/pull/2815 by @SmileGoat
Update paddleaudio readme. https://github.com/PaddlePaddle/PaddleSpeech/pull/2801 by @SmileGoat

Demos

Add TTS ARM Linux C++ Demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2991 by @SwimmingTiger
Add Cantonese TTS in CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2977 by @WongLaw
Add ONNXRuntime infer for Cantonese TTS in CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2990 by @JiehangXie

Docs

Add u2pp_wenetspeech_static_quant to released_model.md. https://github.com/PaddlePaddle/PaddleSpeech/pull/2973 by @zxcd
Remove redundant dependencies and Fix some bugs in setup.py. https://github.com/PaddlePaddle/PaddleSpeech/pull/2970 https://github.com/PaddlePaddle/PaddleSpeech/pull/2871 https://github.com/PaddlePaddle/PaddleSpeech/pull/2867 https://github.com/PaddlePaddle/PaddleSpeech/pull/2853 https://github.com/PaddlePaddle/PaddleSpeech/pull/2771 https://github.com/PaddlePaddle/PaddleSpeech/pull/2767 https://github.com/PaddlePaddle/PaddleSpeech/pull/2764 by @yt605155624

Others

Remove fluid API in ASR. https://github.com/PaddlePaddle/PaddleSpeech/pull/2944 https://github.com/PaddlePaddle/PaddleSpeech/pull/2859 https://github.com/PaddlePaddle/PaddleSpeech/pull/2852 by @zxcd
Add python simple adadelta optimizer. https://github.com/PaddlePaddle/PaddleSpeech/pull/2925 by @zxcd
Add encoding=utf-8 for text. https://github.com/PaddlePaddle/PaddleSpeech/pull/2896 by @zxcd https://github.com/PaddlePaddle/PaddleSpeech/pull/2865 by @yt605155624
Fix Tensor.numpy()[0] to float(Tensor) to adapt 0D. https://github.com/PaddlePaddle/PaddleSpeech/pull/2884 by @zhouwei25
Fix libsndfile.so not found in ubuntu18-cpu/Dockerfile. https://github.com/PaddlePaddle/PaddleSpeech/pull/2763 by @linkec
Fix AttributeError "module 'distutils' has no attribute 'ccompiler'" in setup.py in ctc_decoders. https://github.com/PaddlePaddle/PaddleSpeech/pull/2745 by @GreatV
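The "Fix Tensor.numpy()[0] to float(Tensor) to adapt 0D" entry above concerns scalar results that become 0-D (shape `()`) instead of 1-D (shape `(1,)`). A minimal sketch of why indexing with `[0]` breaks in that case is given below. It uses plain NumPy arrays as a stand-in for the output of `Tensor.numpy()`, which is an assumption made for illustration; the helper name `to_float` is hypothetical and not part of PaddleSpeech.

```python
import numpy as np

# Stand-ins for a scalar loss value (assumption: NumPy arrays are used here
# in place of the result of Tensor.numpy()).
loss_1d = np.array([0.25])  # shape (1,): one-element 1-D array
loss_0d = np.array(0.25)    # shape ():   0-D array


def to_float(value: np.ndarray) -> float:
    """Hypothetical helper: convert any one-element array to a Python float."""
    # .item() works for both 0-D and one-element 1-D arrays.
    return value.item()


# Indexing with [0] works on the 1-D array but raises IndexError on the 0-D one.
try:
    loss_0d[0]
    indexing_failed = False
except IndexError:
    indexing_failed = True

print(indexing_failed)
print(to_float(loss_1d), to_float(loss_0d), float(loss_0d))
```

Converting with `float(...)` or `.item()` avoids depending on whether the scalar arrives as 0-D or 1-D, which appears to be the motivation for the change in that entry.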

New Contributors

@GreatV made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2745
@linkec made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2763
@cxumol made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2828
@jindongyi011039 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2830
@QuanZ9 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2863
@hopingZ made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2860
@zhouwei25 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2884
@EscaticZheng made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2915
@chinobing made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2922
@lance6716 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2924
@443127316 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2983

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.3.0...r1.4.0

r1.3.0

1 year ago

Highlight

S2T

Support U2/U2++ Conformer dy2static, and U2/U2++ C++ High Performance Streaming ASR Deployment. @zh794390558
Add Wav2vec2ASR-en, wav2vec2.0 fine-tuning for ASR on LibriSpeech. @Zth9730
Add Whisper CLI and Demos, support multi language recognition and translation. @zxcd
Add Wav2vec2 CLI and Demos, support ASR and Feature Extraction. @Zth9730
Add whisper. #2640 #2704 by @zxcd
Fix gpu training hang. #2478 by @Zth9730
Support u2++ based cli and server. #2489 #2510 by @Zth9730
Add wav2vec2-en. #2518 #2527 #2637 by @Zth9730
Add wav2vec2-zh cli. #2697 by @Zth9730

New Contributors

@ZapBird made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2484
@HexToString made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2528
@dahu1 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2554
@kFoodie made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2664
@zxcd made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2640
@michael-skynorth made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2666
@heyudage made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2688

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.2.0...r1.3.0

r1.2.0

1 year ago

S2T

Fix conformer/transformer multi GPU training. #2327 #2334 #2336 #2372 by @Zth9730
Fix deepspeech2 decode_wav. #2351 by @Zth9730
Support BiTransformer decoder. #2415 by @Zth9730

T2S

Update VITS to support VITS and its voice cloning training on AISHELL-3. #2268 by @HighCWu
Add ERNIE-SAT synthesize_e2e. #2287 #2316 #2355 #2378 #2432 by @yt605155624
Specify the input data type of G2PW. #2288 by @kslz
Add TTS finetune example. #2297 #2385 #2418 #2430 by @lym0302
Fix Chinese English mixed TTS frontend. #2299 #2493 by @lym0302
Add words into polyphonic.yaml for g2pW. #2300 by @david-95
Update the quantifier unit in Text Normalization. #2308 by @pengzhendong
Fix Chinese frontend bugs. #2312 #2323 by @david-95
Add AISHELL-3 Voice Cloning with ECAPA-TDNN speaker encoder. #2359 #2429 by @yt605155624
Add pre-install doc for G2P and TN, update version of pypinyin. #2364 by @WongLaw
Add tools to compare two test results of G2P to show differences. #2367 by @david-95
Revise must_neural_tone_words. #2370 by @WongLaw
Add type-hint for g2pW. #2390 by @yt605155624
Replaced fixed path with path variable in MFA. #2416 by @WongLaw
Solve "unknown format: 3" for wavfile.write(). #2422 by @zhoupc2015

Text

Create preprocess.py for Punctuation Restoration. #2295 by @THUzyt21

Demo

Add Voice Cloning, TTS finetune, and ERNIE-SAT in speech_web. #2412 #2451 by @iftaken

Server

Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21
Removed useless spk_id in speech_server and streaming_tts_server, support Chinese English mixed TTS server engine. #2380 by @WongLaw

Doc

Add Chinese doc and language switcher for metaverse, style_fs2 and story_talker. #2357 by @WongLaw
Update API docs. #2406 by @yt605155624
Add finetune demos in readthedocs. #2411 by @yt605155624

Test

Add barrier for distributed training using multiple machines. #2309 #2311 by @sneaxiy
Fix prepare.sh for PWGAN TIPC. #2376 by @yuehuayingxueluo

Other

Format paddlespeech with pre-commit. #2331 by @yt605155624

Acknowledgements

Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat

New Contributors

@HighCWu made their first contribution in #2268
@pengzhendong made their first contribution in #2308
@Zth9730 made their first contribution in #2327
@WongLaw made their first contribution in #2357
@yuehuayingxueluo made their first contribution in #2376
@zhoupc2015 made their first contribution in #2422

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.1.0...r1.2.0

r1.1.0

1 year ago

S2T

Add wer tools. https://github.com/PaddlePaddle/PaddleSpeech/pull/1709
Optimize the cache used for attention; 0-dim tensor for model export. https://github.com/PaddlePaddle/PaddleSpeech/pull/2124
Fix cnn cache dy2st shape. https://github.com/PaddlePaddle/PaddleSpeech/pull/2168

TTS

Fix random speaker embedding bug in voice clone. https://github.com/PaddlePaddle/PaddleSpeech/pull/1828 by @jerryuhoo
Add VITS model. https://github.com/PaddlePaddle/PaddleSpeech/pull/1855 https://github.com/PaddlePaddle/PaddleSpeech/pull/1957 https://github.com/PaddlePaddle/PaddleSpeech/pull/2040
Add kunlun support for speedyspeech. https://github.com/PaddlePaddle/PaddleSpeech/pull/1879 by @QingshuChen
Normalize wav max value to 1 in preprocess. https://github.com/PaddlePaddle/PaddleSpeech/pull/1887 by @jerryuhoo
Remove fluid dependence in TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/1940
Add onnx models for aishell3/ljspeech/vctk's tts3/voc1/voc5. https://github.com/PaddlePaddle/PaddleSpeech/pull/2068
Add TTS static/onnx models in pretrained_models.py. https://github.com/PaddlePaddle/PaddleSpeech/pull/2074
Add Ernie SAT model. https://github.com/PaddlePaddle/PaddleSpeech/pull/2052 https://github.com/PaddlePaddle/PaddleSpeech/pull/2117
Add Chinese English mixed TTS frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2143
Add Chinese English mixed TTS example. https://github.com/PaddlePaddle/PaddleSpeech/pull/2234
Fix English text frontend bug. https://github.com/PaddlePaddle/PaddleSpeech/pull/2235 by @david-95
Add g2pW to Chinese frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2230 by @BarryKCL
Fix text frontend bugs. https://github.com/PaddlePaddle/PaddleSpeech/pull/1912 https://github.com/PaddlePaddle/PaddleSpeech/pull/2250 https://github.com/PaddlePaddle/PaddleSpeech/pull/2254 https://github.com/PaddlePaddle/PaddleSpeech/pull/2255 https://github.com/PaddlePaddle/PaddleSpeech/pull/2272

Speechx

Add custom asr script. https://github.com/PaddlePaddle/PaddleSpeech/pull/1946
Refactor frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2003
Deepspeech2 to onnx. https://github.com/PaddlePaddle/PaddleSpeech/pull/2034
Refactor audio/data/feature cache. https://github.com/PaddlePaddle/PaddleSpeech/pull/1638
Frontend refactor. https://github.com/PaddlePaddle/PaddleSpeech/pull/1640
Fix nnet itf header. https://github.com/PaddlePaddle/PaddleSpeech/pull/1641
Refactor speech egs. https://github.com/PaddlePaddle/PaddleSpeech/pull/1707
Refactor egs and more egs for TLG wfst graph build. https://github.com/PaddlePaddle/PaddleSpeech/pull/1715
Speed up ngram building. https://github.com/PaddlePaddle/PaddleSpeech/pull/1729
Update speechx install doc. https://github.com/PaddlePaddle/PaddleSpeech/pull/1736
Fix nnet input and output name. https://github.com/PaddlePaddle/PaddleSpeech/pull/1740
Update wfst graph. https://github.com/PaddlePaddle/PaddleSpeech/pull/1742
Fix model params path name. https://github.com/PaddlePaddle/PaddleSpeech/pull/1750
Remove fluid tools for onnx export. https://github.com/PaddlePaddle/PaddleSpeech/pull/2116

Audio

Refactor paddleaudio to paddlespeech.audio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2007
Add webdataset in paddlespeech.audio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2062

Server

Remove extra logs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2111 https://github.com/PaddlePaddle/PaddleSpeech/pull/2113
Change streaming tts servers' fs from 24k to models' fs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2121
Fix bug in engine_warmup. https://github.com/PaddlePaddle/PaddleSpeech/pull/2171 by @Betterman-qs
Replace default vocoder in server with mb_melgan. https://github.com/PaddlePaddle/PaddleSpeech/pull/2214
Fix bug in streaming_asr_server with punctuation restoration. https://github.com/PaddlePaddle/PaddleSpeech/pull/2244
Rename time_s and time_ns to time_b and time_nb. https://github.com/PaddlePaddle/PaddleSpeech/pull/2133
More accuracy decoding something. https://github.com/PaddlePaddle/PaddleSpeech/pull/2128

CLI

Add paddlespeech.resource module. https://github.com/PaddlePaddle/PaddleSpeech/pull/1917
Dynamic cli commands registration. https://github.com/PaddlePaddle/PaddleSpeech/pull/1959
Fix unnecessary download. https://github.com/PaddlePaddle/PaddleSpeech/pull/2103
Remove extra logs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2084 https://github.com/PaddlePaddle/PaddleSpeech/pull/2085 https://github.com/PaddlePaddle/PaddleSpeech/pull/2107
Add Chinese English mixed TTS CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2249
Add onnxruntime infer for CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2222

Demo

Add speech web demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2039 https://github.com/PaddlePaddle/PaddleSpeech/pull/2080
Add kws cli and demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2063
Use paddle web for streaming asr. https://github.com/PaddlePaddle/PaddleSpeech/pull/2105
Add custom asr script. https://github.com/PaddlePaddle/PaddleSpeech/pull/1946
More cli for speech demos. https://github.com/PaddlePaddle/PaddleSpeech/pull/2138

Doc

Add API doc. https://github.com/PaddlePaddle/PaddleSpeech/pull/2075
Format tts doc string for read the docs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2115

Others

Fix CPU Dockerfile. https://github.com/PaddlePaddle/PaddleSpeech/pull/2172 by @BrightXiaoHan
Add PaddleSpeech Dockerfile for hard mode of installation. https://github.com/PaddlePaddle/PaddleSpeech/pull/2127 by @buchongyu2

Acknowledgements

Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624

New Contributors

@QingshuChen made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1879
@Zhangjingyu06 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1951
@ryanrussell made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1976
@freeliuzc made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2044
@vpegasus made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2043
@dependabot made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2061
@raycool made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2109
@YDX-2147483647 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2125
@chenkui164 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2130
@0x45f made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2162
@Doubledongli made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2167
@Betterman-qs made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2171
@BrightXiaoHan made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2172
@THUzyt21 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2202
@david-95 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2235
@BarryKCL made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2230

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.0.0...r1.1.0

r1.0.0

1 year ago

Highlight

Release PP-ASR: Streaming ASR with timestamp and punctuation restoration, uses WenetSpeech Streaming Conformer and DeepSpeech2 ASR model.
Release PP-TTS: Streaming TTS system for industrial application.
Release PP-VPR: Industrial Voiceprint Recognition system and ECAPA-TDNN model.
Custom ASR: demo for applying for transportation reimbursement.
Support MDTC KWS model

ASR

DeepSpeech2 streaming model aishell cer 6.66%
DeepSpeech2 streaming model wenetspeech cer: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)
Conformer aishell cer 4.64%
Conformer streaming model aishell cer 5.44%
Conformer streaming model wenetspeech cer: 11.0% (test_net), 18.79% (test_meeting)
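The CER (character error rate) figures above are conventionally computed as the character-level edit distance between the reference transcript and the recognized text, divided by the reference length. The sketch below illustrates that standard definition only; it is not taken from the PaddleSpeech scoring tools, which may normalize text (spaces, punctuation) differently, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character in a four-character reference.
print(cer("abcd", "abxd"))
```

Under this definition, a reported CER of 4.64% corresponds to roughly 4.64 character edits per 100 reference characters.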

Speechx

[SpeechX] DeepSpeech2 streaming with WFST in streaming asr example
[SpeechX] Add websocket example
[SpeechX] Custom ASR demo: applying for transportation reimbursement

KWS

[KWS] Add kws example on HeySnips dataset. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1558
[KWS] Update KWS example. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1783

Audio

[Audio] rename paddleaudio to audio, since it conflicts with the pkg name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1758
[Audio] Fix mcd issue. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1658
[Audio] Remove mcd. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1659
[Audio] Add VoxCeleb dataset for speaker recognition.
[Audio] Add HeySnips dataset for keyword spotting.

What's Changed

[R1.0][asr][server]add vector server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1845
[R1.0][asr][server]join streaming asr and punc server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1846
[R1.0]asr streaming server add time stamp by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1850
[R1.0][tts][server] update readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1852
[R1.0] update cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1854
[r1.0] update version to r1.0.0 by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1857
[R1.0] Add doc for wenetspeech model (ds2 online, conformer online) by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1862
[R1.0][server] improve server code by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1866
[R1.0][asr][server]update the streaming asr readme by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1871
[R1.0] Update released model info (Wenetspeech ds2 online, conformer online) by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1869
[R1.0]fix server doc and decode_method by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1889
[speechx] add custom_streaming_asr @SmileGoat #1891
[speechx] speedup ngram building @zh794390558 #1729
[speechx] refactor egs and more egs for TLG wfst graph build @zh794390558 #1715
[speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn @SmileGoat #1676
[speechx] Add websocket & make it work @SmileGoat #1720
[speechx] Frontend refactor @SmileGoat #1640
[Speechx] add tlg decoder @SmileGoat #1599

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.0.0a...r1.0.0

r1.0.0a

1 year ago

Highlight

Release Streaming ASR and Streaming TTS system for industrial application.
Support KWS model
Deepspeech2 streaming model aishell cer 6.66%
Conformer aishell cer 4.64%
Conformer streaming model aishell cer 5.44%
SpeechX Deepspeech2 streaming with WFST

What's Changed

[speechx] refactor audio/data/feature cache by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1638
[speechx] Frontend refactor by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1640
[speechx] fix nnet itf header by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1641
[TTS]add license and reference for some models by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1642
[Doc] supplement note by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1643
[vec][search] update search demo README by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1644
[speechx]refactor linear feature:unify vector & remove redundant function & add remained_wav cache shift wav by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1649
[Audio] Fix mcd issue. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1658
[Audio] Remove mcd. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1659
[vec]update the speaker verification model by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1663
[ASR] update ds2 online model by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1668
[TTS]fix preprocess bug, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1660
update README, test=doc by @iftaken in https://github.com/PaddlePaddle/PaddleSpeech/pull/1672
[Punc] Update RESULTS.md. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1675
[CLI] update ds2 online model in cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1674
[CLI] ASR: Add duration limitation for asr by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1666
[vec]add speaker verification score method by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1646
[TTS]add onnx inference for fastspeech2 + hifigan/mb_melgan by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1665
[doc]update readme by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1680
[WebSocket] fixed online model md5 error , test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1682
[speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1676
[server] add stream tts server by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1652
[speechx]remove mutable in audio_cache by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1687
[Doc] update readme for aishell/asr0 by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1677
[vec] add speaker diarization pipeline by @ccrrong in https://github.com/PaddlePaddle/PaddleSpeech/pull/1651
[vec]voxceleb convert dataset format to paddlespeech by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1630
[Speechx] add tlg decoder by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1599
[vec]add vector necessary note, test=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1690
Revert "[WebSocket] fixed online model md5 error , test=doc" by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1691
[WebSocket] added online web client, test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1692
Fix misspelling of the word "speech" in the example/aishell directory by @buchongyu2 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1694
Fix misspelling of the word "hack" by @buchongyu2 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1697
[TTS]change NLC to NCL in speedyspeech, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1693
[doc]fix typo, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1698
[doc]add pwgan onnx model, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1700
[WebSocket] added online asr doc and online asr command line, test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1701
[vec][server] vpr demo support by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1696
[speechx] refactor speech egs by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1707
[asr]add wer tools by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1709
[asr][websocket]fix the ws send bug, cache buffer, text=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1710
[TTS]add fastspeech2 cnndecoder onnx model by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1712
[speechx] refactor egs and more egs for TLG wfst graph build by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1715
[vec][score] add plda model by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1681
[CLI]update cli, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1716
[server] add streaming am infer by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1713
[speechx] Add websocket & make it work by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1720
[asr][websocket] add asr conformer websocket server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1704
[vec][loss] add NCE Loss from RNNLM by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1719
[vec][loss] add FocalLoss to deal with class imbalances by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1722
[TTS]restructure syn_utils.py, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1723
[TTS]add paddle device set for ort and inference by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1727
[vec] add GRL to domain adaptation by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1725
[speechx] speedup ngram building by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1729
[asr] Add new cer tools by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1673
[speechx]add websocket lib by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1732
[speechx]update speechx install doc by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1736
[Doc] perfect the packing scripts by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1735
[Doc]renew the released mode by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1739
[asr][websocket]add streaming asr demo by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1737
[speechx] fix nnet input and output name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1740
[ASR] remove redundant log by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1741
[speechx] update wfst graph by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1742
[speechx] Add recognizer_test_main script by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1743
[vec][doc]update the voxceleb readme.md, test=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1744
[ASR] fix CER tools by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1747
[Doc] Fix release_model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1746
[Doc] Update released model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1748
Update released model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1749
[speechx] fix model params path name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1750
[speechx] fix linear-spectrogram-wo-db-norm-ol read feature issue by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1751
[TTS]fix wavernn white noise bug for paddle develop(2.3) by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1752
[server] add onnx tts engine by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1733
[TTS]Update paddle2onnx by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1754
[Setup] to r1.0.0a by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1759
[audio] rename paddleaudio to audio, since it conflicts with the pkg name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1758
[speechx] to_float32, fix shell script by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1757
[vec] bug fix to adapt VUE by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1760
[asr][websocket]fix the streaming asr server bug, server client by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1761
[speechx] fbank and mfcc by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1765
format code by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1764
[CLI] Add conformer_aishell, conformer_online_aishell by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1767
[speechx]make cmvn global in run.sh by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1768
[ASR] ds2: add log_interval and fix lr problem when resume training by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1766
[speechx] set nnet param by flags by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1769
[server] add streaming tts demos by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1771
[server] fix tts streaming server by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1774
[KWS]Add kws example on HeySnips dataset. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1558
[text][server]add text punc server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1772
[ASR] fix asr cli infer by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1770
[vec] add GE2E to support unlabeled data training by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1731
[ASR] fix time restriction in test_cli.sh by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1777
[ASR] Replace fbank by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1776
[CLI] add color for test_cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1778
[speechx] add success log in run.sh by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1779
[KWS]Update KWS example. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1783
[server] update readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1782
[Doc] Update ds2online model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1781
[CLI] renew ds2 online model by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1786
[speechx] fix speechx ws server to return dummy partial result by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1787
[asr][server]asr client add punctuation server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1784
[asr] patch func to var by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1788
[asr][server]fix client parse the asr result bug by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1789
[Bug fix] fix test_cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1794
[vec] update readme by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1796
[R1.0]update the streaming output and punc default ip, port by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1800
Renew ds2 online model [cer 6.66%] by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1802
[R1.0] update the streaming asr server readme by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1810
[R1.0] Renew ds2 online doc info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1809
[server] update streaming demos readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1806
[R1.0]update the paddlespeech_client asr_online cli by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1818
[r1.0][doc] fix readme by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1825
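
The "cer 6.66%" quoted for the renewed ds2 online model above is a character error rate. As a rough illustration of what that metric measures, the sketch below computes CER as the Levenshtein distance between a reference and a hypothesis transcript, divided by the reference length. The function names `edit_distance` and `cer` are chosen here for illustration and are not PaddleSpeech APIs; the project's own scoring scripts may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference symbols into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of a hypothesis against a non-empty reference."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abxd")` counts one substitution over four reference characters.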

New Contributors

@iftaken made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1672
@ccrrong made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1651
@buchongyu2 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1694

Acknowledgements

Special thanks to @zh794390558 @Honei @Jackwaterveg @lym0302 @qingen @GT-ZhangAcer @yt605155624 @WilliamZhang06 @SmileGoat @ccrrong

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r0.2.0...r1.0.0a

r0.2.0

2 years ago

S2T

Replace kaldi_fbank with paddleaudio #1612
Support CTC decoder online #821 #1626
Improve accuracy of Conformer. Support using Kaiming uniform as the default initialization. #1577
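
For readers unfamiliar with Kaiming uniform initialization, here is a minimal sketch of the general technique, not the code from #1577. Weights are drawn from U(-bound, bound) with bound = gain * sqrt(3 / fan_in) and gain = sqrt(2 / (1 + a^2)). The helper names and the parameter `a` (the assumed negative slope of the following leaky ReLU) are illustrative assumptions; the release note does not say which slope or fan mode PaddleSpeech uses.

```python
import math
import random


def kaiming_uniform_bound(fan_in, a=0.0):
    """Half-width of the uniform range for a layer with `fan_in` inputs."""
    gain = math.sqrt(2.0 / (1.0 + a * a))  # gain for (leaky) ReLU with slope a
    return gain * math.sqrt(3.0 / fan_in)


def kaiming_uniform(fan_out, fan_in, a=0.0, seed=None):
    """Return a fan_out x fan_in weight matrix as nested lists."""
    rng = random.Random(seed)
    bound = kaiming_uniform_bound(fan_in, a)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]
```

With `a = sqrt(5)` the bound simplifies to `1 / sqrt(fan_in)`, which is the range several frameworks use for linear layers by default.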

TTS

Add SpeedySpeech multi-speaker support for synthesize_e2e.py. https://github.com/PaddlePaddle/PaddleSpeech/pull/1370 by @jerryuhoo
Add WaveRNN for CSMSC dataset. https://github.com/PaddlePaddle/PaddleSpeech/pull/1379
Add Tacotron2 for CSMSC / LJSpeech datasets. https://github.com/PaddlePaddle/PaddleSpeech/pull/1314 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1416
Add GE2E Tacotron2 Voice Cloning for AISHELL3 dataset. https://github.com/PaddlePaddle/PaddleSpeech/pull/1419
Update text frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/1506
Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. https://github.com/PaddlePaddle/PaddleSpeech/pull/1549 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1581 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1587
Add NPU support for TransformerTTS. #1593 by @windstamp
Add CNN Decoder for Streaming Fastspeech2. https://github.com/PaddlePaddle/PaddleSpeech/pull/1634

Audio

Add paddleaudio.compliance modules that offer audio feature APIs aligned with Kaldi and Librosa. #1518
Unittest and benchmark for audio feature APIs. #1548
[Audio] - [audio] refactor audio arch #1494 by @zh794390558
[Audio] - [audio] dtw metric #1493 by @zh794390558
[Audio] - [audio] fix compliance bug #1597 by @zh794390558

Deployment

[Deployment] - [speechx] high performance inference of speech task #1496 by @SmileGoat @zh794390558
[Deployment] - [Speechx]fix normalizer bug #1600 #1621 #1619 #1633 #1635 by @SmileGoat
[Deployment] - [speechx] refactor speechx #1631 #1616 #1576 #1572 #1541 by @zh794390558
[Deployment] - [speechx] simplify cmake compiler #1538 #1536 #1535 by @zh794390558

server

[server] - [websocket] added online asr engine #1627 by @WilliamZhang06
[server] - [server] added engine type and asr inference #1475 by @WilliamZhang06
[server] - [Server] added asr engine #1413 by @WilliamZhang06
[server] - [Server] added engine factory and config #1399 by @WilliamZhang06
[server] - [server] added engine framework #1383 by @WilliamZhang06
[server] - [server] update readme #1604 by @lym0302
[server] - [server] add server cls #1554 by @lym0302
[server] - [server] add paddlespeech_server stats #1510 by @lym0302
[server] - [server] add cli #1466 by @lym0302
[server] - [server] add tts postprocess #1411 by @lym0302
[server] - [server] tts server #1386 by @lym0302

vector

[vector] - [vector] ecapa-tdnn on voxceleb #1523 by @Honei

CLI

Batch input supported. #1460
TTS: Add WaveRNN for CSMSC dataset.
TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
Vector: add speaker verification demo and doc #1605 by @Honei

Demo

[Demo] - [vec][search] update client image url #1628 by @qingen
[Demo] - [server] add server demo #1480 by @lym0302
[Demo] - [vec][search] add audio similarity search #1609 by @qingen

Acknowledgements

Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @Honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen

r0.1.2

2 years ago

Bug Fix:

Pinned librosa to version 0.8.1 (librosa==0.8.1) to resolve the compatibility issue caused by the librosa upgrade. #1426

r0.1.1

2 years ago

New Features

CLI :

Add cli stats. #1274
Add unit test. #1321
ASR: Support English: Add transformer_librispeech model. #1297
ASR: Support 4 decoding methods: ctc_greedy_search, ctc_beam_search, attention, attention_rescoring. #1297
ASR & ST: Use the unified config. #1305 / #1312
ASR: Refactor the code. #1260 by @AdamBear
TTS: Support long input text by default. #1241
TTS: Add Style MelGAN and HiFiGAN. #1241
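
Of the four decoding methods listed above, ctc_greedy_search is the simplest to illustrate. The sketch below shows the generic algorithm only: take the highest-scoring token per frame, collapse consecutive repeats, then drop blanks. It is not PaddleSpeech's implementation; the function signature, the plain-list input format and the choice of `blank_id=0` are assumptions made for this example.

```python
def ctc_greedy_search(frame_scores, blank_id=0):
    """Greedy CTC decoding.

    frame_scores: one list of per-token scores for each time frame.
    Returns the decoded token ids with repeats merged and blanks removed.
    """
    # Best token per frame (argmax over the vocabulary axis).
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    decoded = []
    prev = None
    for token in best_path:
        # Keep a token only when it starts a new run and is not the blank.
        if token != prev and token != blank_id:
            decoded.append(token)
        prev = token
    return decoded
```

A blank between two identical tokens separates them, so a best path of `1 1 0 1 2 2 0` decodes to `1 1 2` rather than `1 2`.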

ASR

Refactor configs in examples. #1225

TTS

Fix some frontend bugs. #1262 by @JiehangXie / #1310
Add speaker embedding and speaker id for style fastspeech2 inference. #1197 by @jerryuhoo
Add support for finetuning speedyspeech. #1302 by @jerryuhoo / #1322 / #1337
Update VCTK Parallel WaveGAN. #1294
Update Multi Band MelGAN. #1272

ST

Refactor configs in examples. #1225

Text

Refactor Punctuation Restoration example. #1215

Docs

Add topic note for releasing python packages
Add TTS papers. #1330
Add Frontend G2P topic. #1254

Others

Update released models and results. #1306

Acknowledgements

@zh794390558 @yt605155624 @Jackwaterveg @KPatr1ck @Mingxue-Xu @JiehangXie @grasswolfs @jerryuhoo @AdamBear @LittleChenCc @JamesLim-sy
