FunASR Versions

A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models. | A speech recognition toolkit offering a rich collection of high-performance open-source pretrained models, supporting speech recognition, voice activity detection, text post-processing, and more, with service deployment capabilities.

v0.3.0


What's new:

2023.3.17, funasr-0.3.0, modelscope-1.4.1

  • New Features:
    • Added support for the GPU runtime solution nv-triton, which allows easy export of Paraformer models from ModelScope and deployment as services. Benchmarked on a single V100 GPU, it achieved an RTF of 0.0032, roughly a 300x speedup over real time.
    • Added support for the CPU runtime quantization solution, which exports quantized ONNX and LibTorch models from ModelScope. Benchmarked on a CPU-8369B, RTF improved by about 50% (0.00438 -> 0.00226), roughly doubling the speedup (228 -> 442).
    • Added support for a C++ gRPC service deployment solution. The C++ ONNXRuntime and quantization solution is roughly twice as efficient as the Python runtime; see the demo.
    • Added streaming inference pipelines for the 16k and 8k VAD models, with support for streaming audio input (chunks >= 10 ms); see the demo.
    • Improved the punctuation prediction model, increasing accuracy (F-score from 55.6 to 56.5).
    • Added a real-time subtitle example based on the gRPC service, using a 2-pass recognition setup: the streaming Paraformer model outputs text in real time, while the offline Paraformer-large model corrects the recognition results; see the demo.
  • New Models:
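The RTF and speedup figures quoted above are two views of the same measurement: RTF is processing time divided by audio duration, and speedup over real time is its reciprocal. A minimal sketch in plain Python (the function names are illustrative, not part of FunASR):

```python
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    return processing_seconds / audio_seconds

def speedup(rtf_value: float) -> float:
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf_value

# The quantization benchmark above: halving RTF doubles the speedup.
print(round(speedup(0.00438)))  # -> 228
print(round(speedup(0.00226)))  # -> 442
```

This also shows why an RTF of 0.0032 corresponds to roughly a 300x speedup (1 / 0.0032 ≈ 312).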

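Streaming input of the kind the VAD pipelines above accept (audio chunks of 10 ms or more) can be emulated by slicing PCM audio into fixed-duration chunks. A minimal sketch in plain Python; the chunk size, sample rate, and 16-bit mono format are illustrative assumptions, not the pipeline's actual API:

```python
def pcm_chunks(pcm: bytes, sample_rate: int = 16000,
               chunk_ms: int = 10, bytes_per_sample: int = 2):
    """Yield successive chunks of mono PCM audio, chunk_ms each."""
    step = sample_rate * chunk_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]

# One second of 16 kHz, 16-bit silence -> 100 chunks of 10 ms each.
audio = bytes(16000 * 2)
print(len(list(pcm_chunks(audio))))  # -> 100
```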


Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.2.0...v0.3.0

v0.2.0


What's new:

2023.2.17, funasr-0.2.0, modelscope-1.3.0

  • We added support for exporting Paraformer models from ModelScope to ONNX and TorchScript; locally finetuned models are also supported.
  • We added ONNXRuntime support, so the runtime can be deployed without ModelScope or FunASR; for the Paraformer-large model, ONNXRuntime gives a 3x CPU speedup (RTF 0.110 -> 0.038); see details.
  • We added gRPC support: you can build an ASR service with gRPC by deploying either the ModelScope pipeline or the ONNXRuntime runtime.
  • We released a new model, Paraformer-large-contextual, which supports hotword customization based on incentive enhancement and improves hotword recall and precision.
  • We optimized timestamp alignment in Paraformer-large-long; timestamp prediction accuracy is much improved, achieving an accumulated average shift (AAS) of 74.7 ms; see details.
  • We released a new 8k VAD model, which predicts the duration of non-silence speech. It can be freely combined with any ASR model in ModelScope.
  • We released a new model, MFCCA, a multi-channel multi-speaker model that is independent of the number and geometry of microphones and supports Mandarin meeting transcription.
  • We released several new UniASR models: Southern Fujian dialect, French, German, Vietnamese, and Persian.
  • We released a new model, paraformer-data2vec, unsupervised-pretrained on AISHELL-2, used to initialize a Paraformer model that is then finetuned on AISHELL-1.
  • We released a new feature: the VAD, ASR, and punctuation models can be freely combined, whether they come from ModelScope or are locally finetuned models; see the demo.
  • We optimized the common punctuation model, improving recall and precision and fixing bad cases of missing punctuation marks.
  • The ModelScope inference pipeline now supports various new audio input types, including mp3, flac, ogg, and opus.
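The accumulated average shift (AAS) quoted for the timestamp work above measures how far predicted timestamps drift from reference ones. A minimal sketch of one plausible reading of that metric, mean absolute shift in milliseconds; this is an illustration, not FunASR's exact definition:

```python
def accumulated_average_shift(pred_ms, ref_ms):
    """Mean absolute shift (ms) between predicted and reference
    timestamps; an illustrative reading of AAS, not FunASR's
    exact implementation."""
    assert len(pred_ms) == len(ref_ms) and pred_ms, "need paired timestamps"
    return sum(abs(p - r) for p, r in zip(pred_ms, ref_ms)) / len(pred_ms)

# Four word boundaries, each off by 10-20 ms from the reference.
pred = [100, 520, 1010, 1490]
ref  = [80, 500, 1000, 1500]
print(accumulated_average_shift(pred, ref))  # -> 15.0
```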



Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.1.6...v0.2.0

v0.1.6


Release Notes:

2023.1.16, funasr-0.1.6

  • We released a new model, Paraformer-large-long, which integrates the VAD, ASR, punctuation, and timestamp models. It can take inputs several hours long.
  • We released a new model type, VAD, which predicts the duration of non-silence speech. It can be freely combined with any ASR model in the Model Zoo.
  • We released a new model type, Punctuation, which predicts punctuation for ASR model outputs. It can be freely combined with any ASR model in the Model Zoo.
  • We released a new model, Data2vec, an unsupervised pretraining model that can be finetuned on ASR and other downstream tasks.
  • We released a new model, Paraformer-Tiny, a lightweight Paraformer model supporting Mandarin command-word recognition.
  • We released a new model type, SV, which extracts speaker embeddings and performs speaker verification on paired utterances. Speaker diarization will be supported in a future version.
  • We improved the ModelScope pipeline to speed up inference by integrating model building into pipeline building.
  • The ModelScope inference pipeline now supports various new audio input types, including wav.scp, wav files, audio bytes, and waveform samples.
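A pipeline that accepts this many input flavors typically normalizes them up front before decoding. A minimal sketch of such a dispatcher in plain Python; the function name, type checks, and `(kind, payload)` return format are illustrative assumptions, not ModelScope's implementation:

```python
from pathlib import Path

def normalize_audio_input(audio):
    """Map the supported input flavors onto a common (kind, payload) pair."""
    if isinstance(audio, (list, tuple)):
        return ("samples", list(audio))      # raw waveform samples
    if isinstance(audio, (bytes, bytearray)):
        return ("bytes", bytes(audio))       # encoded audio bytes
    if isinstance(audio, (str, Path)):
        path = str(audio)
        if path.endswith(".scp"):
            return ("wav.scp", path)         # Kaldi-style script file
        return ("file", path)                # e.g. a .wav file path
    raise TypeError(f"unsupported audio input: {type(audio)!r}")

print(normalize_audio_input("utts.scp")[0])  # -> wav.scp
print(normalize_audio_input(b"\x00\x01")[0])  # -> bytes
```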



Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.1.4...v0.1.6

v0.1.4


This is the first release version.

  1. The Paraformer model supports decoding with batch size > 1.
  2. The UniASR model and recipes are newly added.
  3. Transformer and Conformer models are also included.
  4. Inference and finetuning of models in ModelScope are more convenient.