Natural Language Processing Best Practices & Examples
In this release, we support both abstractive and extractive text summarization: extractive approaches select the most salient sentences from the source document, while abstractive approaches generate new sentences that paraphrase it.
UniLM is a state-of-the-art model developed by Microsoft Research Asia (MSRA). The model is pre-trained on a large unlabeled natural language corpus (English Wikipedia and BookCorpus) and can be fine-tuned on different types of labeled data for various NLP tasks such as text classification and abstractive summarization. The following pre-trained checkpoints are supported:
unilm-large-cased
unilm-base-cased
For more info about UniLM, please refer to the following:
Paper: Unified Language Model Pre-training for Natural Language Understanding and Generation (https://arxiv.org/abs/1905.03197)
GitHub: https://github.com/microsoft/unilm
Thanks to the UniLM team (Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon) for their great work and support for the integration.
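Fine-tuning UniLM checkpoints is done through the UniLM codebase itself. As a rough sketch of the fine-tuning pattern described above, here is a minimal example using the Hugging Face transformers library, with bert-base-uncased standing in for a UniLM checkpoint on a text classification task; the toy texts, labels, and hyperparameters are illustrative assumptions, not part of this repo.

```python
# Minimal fine-tuning sketch for a BERT-style checkpoint on text
# classification. The dataset and hyperparameters are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy labeled data: (text, label) pairs.
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels)
loader = DataLoader(dataset, batch_size=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for input_ids, attention_mask, y in loader:
    optimizer.zero_grad()
    # Passing labels makes the model compute cross-entropy loss internally.
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
    out.loss.backward()
    optimizer.step()
```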
BERTSum is a BERT-based encoder architecture designed for text summarization. It can be paired with a classification layer for extractive summarization or with a Transformer decoder for abstractive summarization. The following pre-trained checkpoints are supported:
bert-base-uncased (extractive and abstractive)
distilbert-base-uncased (extractive)
Papers:
Fine-tune BERT for Extractive Summarization (https://arxiv.org/abs/1903.10318)
Text Summarization with Pretrained Encoders (https://arxiv.org/abs/1908.08345)
GitHub: https://github.com/nlpyang/PreSumm
Thanks to the original authors Yang Liu and Mirella Lapata for their great contribution.
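As a rough illustration of the extractive idea (not this repo's actual BERTSum implementation, which inserts [CLS]/[SEP] tokens per sentence into a single sequence and adds inter-sentence Transformer layers), here is a minimal sketch that encodes each sentence separately and scores its [CLS] vector with an untrained linear head; the sentences and the top-k cutoff are placeholders.

```python
# Sketch of BERTSum-style extractive scoring: encode each sentence with
# a BERT encoder and score its [CLS] vector with a linear head. A real
# BERTSum model trains this head and adds inter-sentence layers; this
# simplified per-sentence version only shows the overall shape.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)  # untrained

sentences = [
    "The quarterly report showed record revenue.",
    "The meeting was held on a Tuesday.",
    "Profits grew 40 percent year over year.",
]

encoder.eval()
with torch.no_grad():
    enc = tokenizer(sentences, padding=True, truncation=True,
                    return_tensors="pt")
    cls_vecs = encoder(**enc).last_hidden_state[:, 0]  # [CLS] per sentence
    scores = score_head(cls_vecs).squeeze(-1)

# Pick the top-2 sentences, in document order, as the extractive summary.
top = torch.topk(scores, k=2).indices.sort().values
print([sentences[int(i)] for i in top])
```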
All model implementations support distributed training and multi-GPU inference. For abstractive summarization, we also support mixed-precision training and inference.
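The repo wires these features up through its own utilities; as a plain-PyTorch illustration of what multi-GPU and mixed-precision inference involve, a minimal sketch might look like the following. The checkpoint name and input text are placeholders.

```python
# Hedged sketch: fp16 autocast inference plus simple data-parallel
# multi-GPU execution in plain PyTorch.
import torch
from transformers import AutoTokenizer, AutoModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased").to(device)

if torch.cuda.device_count() > 1:
    # Replicates the model across GPUs and splits each batch among them.
    model = torch.nn.DataParallel(model)

model.eval()
batch = tokenizer(["a long document to summarize"],
                  return_tensors="pt").to(device)
# autocast runs eligible ops in float16 on GPU; it is a no-op on CPU here.
with torch.no_grad(), torch.cuda.amp.autocast(enabled=device.type == "cuda"):
    # return_dict=False keeps the output a plain tuple, which DataParallel
    # can gather across devices; index 0 is the last hidden state.
    last_hidden = model(**batch, return_dict=False)[0]
print(last_hidden.shape, last_hidden.dtype)
```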
This release integrates the Hugging Face Transformers library.