A curated list of awesome practical Metric Learning and its applications
At Qdrant, we have one goal: make metric learning more practical. This listing is in line with that purpose, and we aim to provide a concise yet useful list of awesomeness around metric learning. It is intended to inspire productivity rather than serve as a full bibliography.
If you find it useful or like it in some other way, you may want to join our Discord server, where we are running a paper reading club on metric learning.
If you want to contribute to this project, but don't know how, you may want to check out the contributing guide. It's easy!
It has step-by-step guides for supervised, weakly supervised, and unsupervised metric learning algorithms in the `metric_learn` package.
Factors such as sampling strategies, distance metrics, and network structures are systematically analyzed by comparing the quantitative results of the methods.
It discusses the need for metric learning, old and state-of-the-art approaches, and some real-world use cases.
NLP
CV
CLIP offers state-of-the-art zero-shot image classification and image retrieval with a natural language query. See demo.
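Below is a minimal zero-shot classification sketch using the HuggingFace port of CLIP; the checkpoint name, image path, and label prompts are illustrative, not prescribed by CLIP itself.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; other CLIP variants work the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # any local image
labels = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # image-text similarity
print(dict(zip(labels, probs[0].tolist())))
```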
Audio
This work achieves zero-shot classification and cross-modal audio retrieval from natural language queries.
CV
It is an open-class object detector that can detect any label encoded by CLIP without finetuning. See demo.
NLP
TensorFlow Hub offers a collection of pretrained models from the paper Large Dual Encoders Are Generalizable Retrievers. GTR models are first initialized from a pre-trained T5 checkpoint. They are then further pre-trained with a set of community question-answer pairs. Finally, they are fine-tuned on the MS Marco dataset. The two encoders are shared so the GTR model functions as a single text encoder. The input is variable-length English text and the output is a 768-dimensional vector.
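If you work in PyTorch rather than TensorFlow, the same GTR checkpoints are also distributed through sentence-transformers; a minimal sketch, assuming the `sentence-transformers/gtr-t5-base` model name:

```python
from sentence_transformers import SentenceTransformer

# gtr-t5-base mirrors the TF Hub model described above: variable-length
# English text in, a 768-dimensional vector out.
model = SentenceTransformer("sentence-transformers/gtr-t5-base")
embeddings = model.encode(["How do dual encoders work?"])
print(embeddings.shape)  # (1, 768)
```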
NLP
The method and pretrained models found in Flair go beyond zero-shot sequence classification and offer zero-shot span tagging abilities for tasks such as named entity recognition and part-of-speech tagging.
NLP
It leverages HuggingFace Transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics while keeping important words in the topic descriptions. It supports guided, (semi-)supervised, and dynamic topic modeling with beautiful visualizations.
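A minimal usage sketch; the 20 Newsgroups sample is only a stand-in for your own corpus:

```python
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))["data"][:1000]

topic_model = BERTopic()  # embeds documents with a sentence-transformers model by default
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info().head())
```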
Identification of substances based on spectral analysis plays a vital role in forensic science. Similarly, the material identification process is of paramount importance for malfunction reasoning in manufacturing sectors and materials research. This model enables identifying materials with deep metric learning applied to X-Ray Diffraction (XRD) spectra. Read this post for more background.
NLP
Different from typical information retrieval tasks, code search requires bridging the semantic gap between the programming language and natural language, for better describing intrinsic concepts and semantics. The repository provides the pretrained models and source code for Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus, where they apply several tricks to achieve this.
CV
NLP
RecSys
State-of-the-art methods are incapable of leveraging attributes from different types of items and thus suffer from data sparsity problems because it is quite challenging to represent items with different feature spaces jointly. To tackle this problem, they propose a kernel-based neural network, namely deep unified representation (DURation) for heterogeneous recommendation, to jointly model unified representations of heterogeneous items while preserving their original feature space topology structures. See paper.
RecSys
It provides the implementation of Item2Vec: Neural Item Embedding for Collaborative Filtering, wrapped as a `sklearn` estimator compatible with `GridSearchCV` and `BayesSearchCV` for hyperparameter tuning.
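This is not the repository's own API, but the underlying idea fits in a few lines: Item2Vec is essentially word2vec applied to interaction sequences, sketched here with gensim and toy sessions:

```python
from gensim.models import Word2Vec

# Treat each user's interaction history as a "sentence" of item ids.
sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9"],
    ["item_2", "item_1", "item_7"],
]
model = Word2Vec(sentences=sessions, vector_size=32, window=3, sg=1, min_count=1)
print(model.wv.most_similar("item_7"))  # items co-consumed with item_7
```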
You can search for the overall closest fit, or choose to focus matching genre, mood, or instrumentation.
NLP
It searches phrase-level answers to your questions in real-time or retrieves passages for downstream tasks. Check out demo, or see paper.
NLP
Instead of leveraging NLI/XNLI, they make use of the text encoder of the CLIP model, concluding from casual experiments that this sometimes gives better accuracy than NLI-based models.
Application of the SimCLR method to musical data with out-of-domain generalization in million-scale music classification. See demo or paper.
Quaterion is a framework for fine-tuning similarity learning models. The framework closes the "last mile" problem in training models for semantic search, recommendations, anomaly detection, extreme classification, matching engines, etc. It is designed to combine the performance of pre-trained models with specialization for the custom task while avoiding slow and costly training.
NLP
Developed on top of the well-known Transformers library, it provides an easy way to finetune Transformer-based models to obtain sequence-level embeddings.
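A minimal fine-tuning sketch with the classic training loop; the base model and toy pairs are illustrative:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative base model
train_examples = [
    InputExample(texts=["How do I reset my password?", "password reset steps"]),
    InputExample(texts=["best pizza in town", "where to eat pizza"]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
embeddings = model.encode(["How can I change my password?"])
```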
CV
NLP
The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification.
It provides support for self-supervised contrastive learning and state-of-the-art methods such as SimCLR, SimSiam, and Barlow Twins.
NLP
A PyTorch library for training and inference with contextually-keyed word vectors augmented with part-of-speech tags to achieve multi-word queries.
CV
A PyTorch library to efficiently train self-supervised computer vision models with state-of-the-art techniques such as SimCLR, SimSiam, Barlow Twins, and BYOL, among others.
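For example, the SimCLR loss is available off the shelf; `z0` and `z1` below stand in for projections of two augmented views of the same batch:

```python
import torch
from lightly.loss import NTXentLoss

criterion = NTXentLoss(temperature=0.5)
z0 = torch.randn(8, 128)  # projections of view 1 (placeholder for real model output)
z1 = torch.randn(8, 128)  # projections of view 2
loss = criterion(z0, z1)
```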
NLP
A library that helps you benchmark pretrained and custom embedding models on tens of datasets and tasks with ease.
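A minimal sketch of a benchmark run; the task name and output folder are illustrative:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any model with an `encode` method
evaluation = MTEB(tasks=["Banking77Classification"])  # pick any subset of tasks
results = evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```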
RecSys
It supports incorporating user and item features to the traditional matrix factorization. It represents users and items as a sum of the latent representations of their features, thus achieving a better generalization.
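A minimal sketch following the library's quickstart; `item_features` and `user_features` matrices can be passed to `fit` in the same way:

```python
import numpy as np
from lightfm import LightFM
from lightfm.datasets import fetch_movielens

data = fetch_movielens(min_rating=4.0)
model = LightFM(loss="warp")  # WARP ranking loss
model.fit(data["train"], epochs=10)  # optionally: item_features=..., user_features=...

n_items = data["train"].shape[1]
scores = model.predict(0, np.arange(n_items))  # rank all items for user 0
top_items = np.argsort(-scores)[:5]
```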
It provides efficient multicore and memory-independent implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec.
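For instance, training an LDA topic model takes a few lines; the toy corpus is illustrative:

```python
from gensim import corpora
from gensim.models import LdaModel

docs = [
    ["human", "interface", "computer"],
    ["graph", "trees", "minors"],
    ["graph", "minors", "survey"],
]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]  # bag-of-words representation

lda = LdaModel(corpus, num_topics=2, id2word=dictionary)
print(lda.print_topics())
```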
RecSys
It provides implementations of algorithms such as KNN, LFM, SLIM, NeuMF, FM, DeepFM, VAE and so on, in order to ensure fair comparison of recommender system benchmarks.
It supports UMAP, t-SNE, PCA, or custom techniques to analyze embeddings of encoders.
It allows you to visualize the embedding space by explicitly selecting the axes through algebraic formulas on the embeddings (like king-man+woman) and to highlight specific items in the embedding space. It also supports implicit axes via PCA and t-SNE. See paper.
NLP
It provides benchmarking of 20+ ANN algorithms on nine standard datasets with support for bringing your own dataset. (Medium Post)
It is not the fastest ANN algorithm but achieves memory efficiency thanks to various quantization and indexing methods such as IVF, PQ, and IVF-PQ. (Tutorial)
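A minimal IVF-PQ sketch with random vectors; `nlist`, `m`, and `nprobe` are the knobs that trade accuracy for memory and speed:

```python
import numpy as np
import faiss

d, nlist, m = 128, 100, 16  # dim, IVF cells, PQ sub-quantizers
xb = np.random.random((10000, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)  # coarse quantizer for the IVF structure
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)  # 8 bits per sub-code
index.train(xb)
index.add(xb)

index.nprobe = 8  # how many IVF cells to visit per query
D, I = index.search(xb[:5], 10)  # distances and ids of the 10 nearest neighbors
```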
It is still one of the fastest ANN algorithms out there, although it requires relatively high memory usage. (Paper: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs)
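A minimal sketch with hnswlib, one of the reference implementations of the paper; the data is a random placeholder:

```python
import numpy as np
import hnswlib

dim = 128
data = np.random.random((10000, dim)).astype("float32")

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=10000, ef_construction=200, M=16)
index.add_items(data)

index.set_ef(50)  # query-time recall/speed trade-off
labels, distances = index.knn_query(data[:5], k=10)
```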
Paper: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
Published by Yann Le Cun et al. (2005), the paper's main focus was dimensionality reduction. However, the proposed method has excellent properties for metric learning, such as preserving neighbourhood relationships and generalizing to unseen data, and it has inspired a great number of variations and applications ever since. It is advised that you read this great post to better understand its importance for metric learning.
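In a common formulation, similar pairs are pulled together quadratically while dissimilar pairs are pushed apart up to a margin; a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, label, margin=1.0):
    """label == 1 for similar pairs, 0 for dissimilar ones."""
    d = F.pairwise_distance(z1, z2)
    return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()
```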
The paper introduces Triplet Loss, which can be seen as the "ImageNet moment" for deep metric learning. It is still one of the state-of-the-art methods and has a great number of applications in almost any data modality.
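PyTorch ships the loss out of the box; the batch below is random placeholder data:

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.2)
anchor, positive, negative = (torch.randn(8, 128) for _ in range(3))
loss = triplet(anchor, positive, negative)  # pulls anchor-positive closer than anchor-negative
```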
It provides scale invariance, robustness against feature variance, and better convergence than Contrastive and Triplet Loss.
Although it is originally designed for the face recognition task, this loss function achieves state-of-the-art results in many other metric learning problems with a simpler and faster data feeding. It is also robust against unclean and unbalanced data when modified with sub-centers and a dynamic margin.
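The core of the loss is an additive angular margin on the target class before softmax; a minimal sketch (production implementations add easy-margin handling and more numerical safeguards):

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weight, labels, margin=0.5, scale=64.0):
    """Additive angular margin on the target-class logit; feed the result to cross-entropy."""
    cos = F.linear(F.normalize(embeddings), F.normalize(weight))  # cosine similarities
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, num_classes=weight.size(0)).bool()
    cos = torch.where(target, torch.cos(theta + margin), cos)
    return scale * cos
```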
The paper introduces a method that explicitly avoids the collapse problem in high dimensions with a simple regularization term on the variance of the embeddings along each dimension individually. This new term can be incorporated into other methods to stabilize training and improve performance.
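The variance term itself is tiny; a sketch following the paper's description:

```python
import torch
import torch.nn.functional as F

def variance_regularizer(z, gamma=1.0, eps=1e-4):
    """Hinge loss on the per-dimension standard deviation of a batch of embeddings."""
    std = torch.sqrt(z.var(dim=0) + eps)
    return F.relu(gamma - std).mean()  # penalize dimensions whose std falls below gamma
```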
The paper proposes using the mean centroid representation during training and retrieval for robustness against outliers and more stable features. It further reduces retrieval time and storage requirements, making it suitable for production deployments.
CV
It demonstrates, among other things, that
- the composition of data augmentations plays a critical role: Random Crop + Random Color distortion provides the best downstream classifier accuracy,
- introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations (sketched below),
- and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
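The second finding refers to the projection head: a small MLP between the encoder output and the loss. A sketch with illustrative sizes; the contrastive loss is computed on the 128-d projections, while the 2048-d representation before the head is what you keep for downstream tasks:

```python
import torch.nn as nn

projection_head = nn.Sequential(
    nn.Linear(2048, 2048),  # encoder output size, e.g. ResNet-50 pooled features
    nn.ReLU(),
    nn.Linear(2048, 128),   # the contrastive loss operates on this output
)
```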
NLP
They also incorporate annotated pairs from natural language inference datasets into their contrastive learning framework in a supervised setting, showing that the contrastive learning objective regularizes pre-trained embeddings' anisotropic space to be more uniform, and it better aligns positive pairs when supervised signals are available.
NLP
CV
NLP
CV
Mining informative negative instances is of central importance to deep metric learning (DML); however, this task is intrinsically limited by mini-batch training, where only a mini-batch of instances is accessible at each iteration. The paper identifies a "slow drift" phenomenon: embedding features drift exceptionally slowly even as the model parameters are updated throughout training. This suggests that the features of instances computed at preceding iterations can considerably approximate the features extracted by the current model.
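A sketch of the resulting cross-batch memory mechanism, assuming a simple FIFO queue of past embeddings that serves as an extra pool of negatives (sizes are illustrative):

```python
import torch

class FeatureMemory:
    """FIFO queue of embeddings from past iterations, reusable as negatives
    because of the "slow drift" of features during training."""

    def __init__(self, size=4096, dim=128):
        self.feats = torch.zeros(size, dim)
        self.labels = torch.zeros(size, dtype=torch.long)
        self.ptr, self.filled, self.size = 0, 0, size

    @torch.no_grad()
    def enqueue(self, feats, labels):
        n = feats.size(0)
        idx = torch.arange(self.ptr, self.ptr + n) % self.size
        self.feats[idx] = feats.detach()
        self.labels[idx] = labels
        self.ptr = (self.ptr + n) % self.size
        self.filled = min(self.filled + n, self.size)

    def get(self):
        return self.feats[: self.filled], self.labels[: self.filled]
```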
Practitioners can use any labeled or unlabeled data for metric learning with an appropriately chosen method. However, some datasets are particularly important in the literature for benchmarking or other purposes, and we list them in this section.
NLP
The dataset contains pairs of sentences labeled as `contradiction`, `entailment`, and `neutral` regarding semantic relationships. Useful for training semantic search models in metric learning.
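The corpus is one `load_dataset` call away via HuggingFace Datasets:

```python
from datasets import load_dataset

snli = load_dataset("snli")
print(snli["train"][0])
# {'premise': ..., 'hypothesis': ..., 'label': 0}  # 0=entailment, 1=neutral, 2=contradiction
```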
NLP
Modeled on the SNLI corpus, the dataset contains sentence pairs from various genres of spoken and written text, and it also offers a distinctive cross-genre generalization evaluation.
CV
Shared as a part of a Kaggle competition by Google, this dataset is more diverse and thus more interesting than the first version.
CV
The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
CV
The dataset is published along with "Deep Metric Learning via Lifted Structured Feature Embedding" paper.
CV
The dataset is published along with "The 2021 Image Similarity Dataset and Challenge" paper.