awesome-huggingface
This is a list of some wonderful open-source projects & applications integrated with Hugging Face libraries.
How to contribute
🤗 Official Libraries
First-party cool stuff made with ❤️ by 🤗 Hugging Face.
- transformers - State-of-the-art natural language processing for Jax, PyTorch and TensorFlow.
- datasets - The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools.
- tokenizers - Fast state-of-the-art tokenizers optimized for research and production.
- knockknock - Get notified when your training ends with only two additional lines of code.
- accelerate - A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support.
- autonlp - Train state-of-the-art natural language processing models and deploy them in a scalable environment automatically.
- nn_pruning - Prune a model while finetuning or training.
- huggingface_hub - Client library to download and publish models and other files on the huggingface.co hub.
- tune - A benchmark for comparing Transformer-based models.
👩‍🏫 Tutorials
Learn how to use Hugging Face toolkits, step-by-step.
- Official Course (from Hugging Face) - The official course series provided by 🤗 Hugging Face.
- transformers-tutorials (by @nielsrogge) - Tutorials for applying multiple models on real-world datasets.
🧰 NLP Toolkits
NLP toolkits built upon Transformers. Swiss Army!
- AllenNLP (from AI2) - An open-source NLP research library.
- Graph4NLP - Enabling easy use of Graph Neural Networks for NLP.
- Lightning Transformers - Transformers with PyTorch Lightning interface.
- Adapter Transformers - Extension to the Transformers library, integrating adapters into state-of-the-art language models.
- Obsei - A low-code AI workflow automation tool that performs various NLP tasks in the workflow pipeline.
- Trapper (from OBSS) - State-of-the-art NLP through transformer models in a modular design and consistent APIs.
- Flair - A very simple framework for state-of-the-art NLP.
🥡 Text Representation
Converting a sentence to a vector.
- Sentence Transformers (from UKPLab) - Widely used encoders computing dense vector representations for sentences, paragraphs, and images.
- WhiteningBERT (from Microsoft) - An easy unsupervised sentence embedding approach with whitening.
- SimCSE (from Princeton) - State-of-the-art sentence embedding with contrastive learning.
- DensePhrases (from Princeton) - Learning dense representations of phrases at scale.
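Once sentences are vectors, comparing them is plain arithmetic. The sketch below is only an illustration of that idea: the 4-dimensional vectors are made-up numbers, not the output of any library listed above, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical "sentence embeddings" (hand-written toy values).
embeddings = {
    "A cat sits on the mat.": [0.9, 0.1, 0.0, 0.3],
    "A kitten rests on the rug.": [0.8, 0.2, 0.1, 0.4],
    "Stock prices fell sharply.": [0.0, 0.9, 0.7, 0.1],
}

def most_similar(query, corpus):
    """Return the other sentence in `corpus` whose vector is closest to the query's."""
    candidates = [s for s in corpus if s != query]
    return max(candidates, key=lambda s: cosine_similarity(corpus[query], corpus[s]))

print(most_similar("A cat sits on the mat.", embeddings))
```

In practice the vectors would come from an encoder such as the projects above; the similarity step stays the same.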
⚙️ Inference Engines
Highly optimized inference engines implementing Transformers-compatible APIs.
- TurboTransformers (from Tencent) - An inference engine for transformers with a fast C++ API.
- FasterTransformer (from Nvidia) - A script and recipe to run the highly optimized transformer-based encoder and decoder components on NVIDIA GPUs.
- lightseq (from ByteDance) - A high-performance inference library for sequence processing and generation implemented in CUDA.
- FastSeq (from Microsoft) - Efficient implementation of popular sequence models (e.g., Bart, ProphetNet) for text generation, summarization, and translation tasks.
🌀 Model Scalability
Parallelizing models across multiple GPUs.
- Parallelformers (from TUNiB) - A library for model parallel deployment.
- OSLO (from TUNiB) - A library that supports various features to help you train large-scale models.
- Deepspeed (from Microsoft) - Deepspeed-ZeRO scales any model size with little to no changes to the model. Integrated with HF Trainer.
- fairscale (from Facebook) - Also implements the ZeRO protocol. Integrated with HF Trainer.
- ColossalAI (from Hpcaitech) - A Unified Deep Learning System for Large-Scale Parallel Training (1D, 2D, 2.5D, 3D and sequence parallelism, and ZeRO protocol).
🏎️ Model Compression/Acceleration
Compressing or accelerating models for improved inference speed.
- torchdistill - PyTorch-based modular, configuration-driven framework for knowledge distillation.
- TextBrewer (from HFL) - State-of-the-art distillation methods to compress language models.
- BERT-of-Theseus (from Microsoft) - Compressing BERT by progressively replacing the components of the original BERT.
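For readers new to knowledge distillation, here is a rough sketch of the classic soft-target loss: the student is trained to match the teacher's temperature-softened output distribution. This is a generic textbook formulation in plain Python, not the API or the specific method of any project listed above (BERT-of-Theseus, for instance, replaces modules instead), and the function names and logits are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Toy logits (assumed values): the loss is zero when the student matches the teacher.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of the less likely classes.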
🏹️ Adversarial Attack
Conducting adversarial attacks to test model robustness.
- TextAttack (from UVa) - A Python framework for adversarial attacks, data augmentation, and model training in NLP.
- TextFlint (from Fudan) - A unified multilingual robustness evaluation toolkit for NLP.
- OpenAttack (from THU) - An open-source textual adversarial attack toolkit.
🔁 Style Transfer
Transfer the style of text! Now you know why it's called a transformer?
- Styleformer - A neural language style transfer framework to transfer text smoothly between styles.
- ConSERT - A contrastive framework for self-supervised sentence representation transfer.
😢 Sentiment Analysis
Analyzing the sentiment and emotions of human beings.
- conv-emotion - Implementation of different architectures for emotion recognition in conversations.
🙅 Grammatical Error Correction
You made a typo! Let me correct it.
- Gramformer - A framework for detecting, highlighting and correcting grammatical errors in natural language text.
🗺 Translation
Translating between different languages.
- dl-translate - A deep learning-based translation library built on HF Transformers.
- EasyNMT (from UKPLab) - Easy-to-use, state-of-the-art translation library and Docker images based on HF Transformers.
📖 Knowledge and Entity
Learning knowledge, mining entities, connecting the world.
- PURE (from Princeton) - Entity and relation extraction from text.
🎙 Speech
Speech processing powered by HF libraries. Need for speech!
- s3prl - A self-supervised speech pre-training and representation learning toolkit.
- speechbrain - A PyTorch-based speech toolkit.
🤯 Multi-modality
Understanding the world from different modalities.
- ViLT (from Kakao) - A vision-and-language transformer without convolution or region supervision.
🤖 Reinforcement Learning
Combining RL magic with NLP!
- trl - Fine-tune transformers using Proximal Policy Optimization (PPO) to align with human preferences.
❓ Question Answering
Searching for answers? Transformers to the rescue!
- Haystack (from deepset) - End-to-end framework for developing and deploying question-answering systems in the wild.
💁 Recommender Systems
I think this is just right for you!
- Transformers4Rec (from Nvidia) - A flexible and efficient library powered by Transformers for sequential and session-based recommendations.
⚖️ Evaluation
Evaluating model outputs and data quality powered by HF datasets!
- Jury (from OBSS) - An easy-to-use tool for evaluating NLP model outputs, specifically for NLG (Natural Language Generation), offering various automated text-to-text metrics.
- Spotlight - Interactively explore your HF dataset with one line of code. Use model results (e.g. embeddings, predictions) to understand critical data segments and model failure modes.
🔍 Neural Search
Search, but with the power of neural networks!
- Jina Integration - Jina integration of Hugging Face Accelerated API.
- Weaviate Integration (text2vec) (QA) - Weaviate integration of Hugging Face Transformers.
- ColBERT (from Stanford) - A fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
☁ Cloud
Cloud makes your life easy!
- Amazon SageMaker - Making it easier than ever to train Hugging Face Transformer models in Amazon SageMaker.
📱 Hardware
The infrastructure enabling the magic to happen.
- Qualcomm - Collaboration on enabling Transformers in Snapdragon.
- Intel - Collaboration with Intel for configuration options.