Concatenate the support-set and query embeddings, then use a neural network to compute their similarity.
The same architecture can also be used for ZSL: simply replace the support set with a class semantic vector
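A minimal sketch of this relation-style scoring; the dimensions are illustrative and the random weights stand in for a trained relation module:

```python
import numpy as np

def relation_score(support_emb, query_emb, W1, b1, W2, b2):
    """Concatenate a support embedding with a query embedding, then score
    their similarity with a small two-layer MLP (the relation module)."""
    x = np.concatenate([support_emb, query_emb])   # joint representation
    h = np.maximum(0.0, W1 @ x + b1)               # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))    # sigmoid similarity in (0, 1)

# Toy example: 4-dim embeddings, 8 hidden units, random (untrained) weights.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 8)), np.zeros(8)
W2, b2 = rng.standard_normal(8), 0.0
score = relation_score(rng.standard_normal(4), rng.standard_normal(4),
                       W1, b1, W2, b2)
print(0.0 < score < 1.0)  # True: the score is bounded like a probability
```

For ZSL, the support embedding would simply be replaced by the class semantic vector.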
Few-shot Learning with Graph Neural Networks. ICLR 2018
metric-based [by CDFS&CloserLook]
GNN as metric function
Meta-learning with differentiable closed-form solvers. ICLR 2019
ridge regression
R2-D2
metric-based [by CloserLook]
Proposes a new dataset: CIFAR-FS
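A sketch of the closed-form ridge-regression base learner; shapes and the regularizer λ are illustrative, and the scale/bias terms that R2-D2 additionally meta-learns are omitted:

```python
import numpy as np

def ridge_solver(X, Y, lam=1.0):
    """Closed-form ridge regression over a support set.
    X: (n, d) support embeddings, Y: (n, c) one-hot labels.
    The Woodbury form inverts an n x n matrix, cheap when n (shots) is small."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)

# Toy 2-way 3-shot episode with 5-dim embeddings.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 5))
Y = np.repeat(np.eye(2), 3, axis=0)        # one-hot episode labels
W = ridge_solver(X, Y)                     # per-episode classifier weights
logits = X @ W                             # predictions on the support set
print(W.shape)  # (5, 2)
```

Because the solver is differentiable, gradients can flow through it to the embedding network during episodic training.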
Variational few-shot learning. ICCV 2019
metric-based [by DAPNA]
metric learning via variational inference
Dense classification and implanting for few-shot learning. CVPR 2019
metric-based [by CDFS]
Subspace Networks for Few-shot Classification. arXiv'1905.13613
Follows the setup of "A Closer Look at Few-shot Classification"
Classifies an example by the distance from the embedded query point to each class subspace
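A toy sketch of subspace classification, assuming each class subspace is the affine span of that class's support embeddings (the basis construction via SVD is one common choice, not necessarily the paper's exact procedure):

```python
import numpy as np

def subspace_distance(query, class_support):
    """Distance from a query embedding to the affine subspace spanned by
    one class's support embeddings (mean-centred basis via SVD)."""
    mu = class_support.mean(axis=0)
    U, s, _ = np.linalg.svd((class_support - mu).T, full_matrices=False)
    U = U[:, s > 1e-10]                 # keep only directions actually spanned
    diff = query - mu
    residual = diff - U @ (U.T @ diff)  # component orthogonal to the subspace
    return np.linalg.norm(residual)

def classify(query, supports_per_class):
    """Assign the query to the class with the nearest subspace."""
    return int(np.argmin([subspace_distance(query, S)
                          for S in supports_per_class]))

# Toy episode: two well-separated classes, 3 shots each, 4-dim embeddings.
rng = np.random.default_rng(2)
c0 = rng.standard_normal((3, 4))
c1 = rng.standard_normal((3, 4)) + 5.0
pred = classify(c0[0] + 0.01 * rng.standard_normal(4), [c0, c1])
print(pred)  # → 0
```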
RepMet: Representative-based metric learning for classification and few-shot object detection. CVPR 2019
metric-based [by 2020survey]
Infinite Mixture Prototypes for Few-shot Learning. ICML 2019 Oral
Our infinite mixture prototypes represent each class by a set of clusters, unlike existing prototypical methods that represent each class by a single cluster.
semi-supervised and unsupervised setting
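A toy sketch of the multi-cluster idea (the positions are made up): a bi-modal class keeps several prototypes instead of one mean, and the query goes to the class of its nearest prototype:

```python
import numpy as np

def multi_prototype_predict(query, prototypes_per_class):
    """Assign the query to the class whose nearest cluster prototype is
    closest; multi-modal classes keep several prototypes, not one mean."""
    dists = [min(np.linalg.norm(query - p) for p in protos)
             for protos in prototypes_per_class]
    return int(np.argmin(dists))

# Class 0 is bi-modal: its single mean would sit at (5, 5), right on top of
# class 1's prototype, but its two cluster prototypes keep the modes apart.
class0 = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
class1 = [np.array([5.0, 5.0])]
pred = multi_prototype_predict(np.array([9.5, 10.2]), [class0, class1])
print(pred)  # → 0
```

The paper's actual method infers the number of clusters per class nonparametrically; here the clusters are just given.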
Optimization(Initialization)-based Methods
learning to fine-tune?
aims to learn a good initial parameterization of the base feature extractor, then fine-tunes the network on unseen classes using only a few examples within a few gradient steps. [by DAPNA]
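A first-order (Reptile-style) toy sketch of this learn-a-good-initialisation idea on a made-up quadratic task family; this is not MAML's exact second-order update:

```python
import numpy as np

def inner_adapt(theta, grad_fn, task, lr=0.1, steps=3):
    """Fine-tune from the shared initialisation with a few gradient steps."""
    w = theta.copy()
    for _ in range(steps):
        w -= lr * grad_fn(w, task)
    return w

# Toy task family: loss_t(w) = ||w - t||^2, so grad = 2 * (w - t).
grad_fn = lambda w, t: 2.0 * (w - t)
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]

# Outer loop: nudge the initialisation toward each task's adapted solution.
theta = np.zeros(2)
for _ in range(100):
    for t in tasks:
        adapted = inner_adapt(theta, grad_fn, t)
        theta += 0.1 * (adapted - theta)   # first-order meta-update

print(np.round(theta, 1))  # near the task-family centroid, [0.7 0.7]
```

The learned θ sits where a few gradient steps reach any task quickly, which is the point of the initialization-based family.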
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017
Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models.
When new data is encountered, the conventional models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference.
We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.
Optimization as a model for few-shot learning. ICLR 2017
LSTM-based meta learning [by authors]
Uses an LSTM-based meta-learner to replace the stochastic gradient descent optimizer. [by CloserLook]
A Simple Neural Attentive Meta-Learner. ICLR 2018
episodic training
RNN-based approach [by DAPNA(Few-Shot Learning as Domain Adaptation)]
approximates probability distributions by remembering the most surprising observations
the algorithm can perform as well as state-of-the-art baselines
main contributions:
A surprise-based signal to write items to memory, with no need to learn what to write; this makes training easier and faster and minimizes how much data is stored
(don't understand???) An integrated external and working memory architecture which can take advantage of the best of both worlds: scalability and sparse access provided by the working memory; and all-to-all attention and reasoning provided by a relational reasoning module.
A training setup which steers the system towards learning an algorithm which approximates the posterior without backpropagating through the whole sequence of data in an episode.
Conclusion
We introduced a self-contained system which can learn to approximate a probability distribution from as little data, and as quickly, as possible. This is achieved by:
putting together the training setup which encourages adaptation
an external memory which allows the system to recall past events
a writing system to adapt the memory to uncertain situations
a working memory architecture which can efficiently compare items retrieved from memory to produce new predictions
We showed that the model can
Reach state of the art accuracy with a smaller memory footprint than other meta-learning models by efficiently choosing which data points to remember.
Scale to very large problem sizes thanks to the use of an external memory module with sparse access.
(don't understand???) Perform less-than-1-shot generalization thanks to relational reasoning across neighbors.
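The surprise-based writing rule can be sketched as follows; the threshold and the probability values are illustrative, not the paper's:

```python
import numpy as np

def surprise(p_true):
    """Surprise = negative log-probability assigned to the true label."""
    return -np.log(max(p_true, 1e-12))

def maybe_write(memory, embedding, label, p_true, threshold=0.5):
    """Write an (embedding, label) pair only when the model was surprised,
    i.e. gave the true label low probability; confident hits are skipped."""
    if surprise(p_true) > threshold:
        memory.append((embedding, label))

memory = []
stream = [(np.array([1.0, 0.0]), 0, 0.9),   # confident and correct -> skipped
          (np.array([0.0, 1.0]), 1, 0.1)]   # surprising -> stored
for emb, label, p_true in stream:
    maybe_write(memory, emb, label, p_true)
print(len(memory))  # → 1
```

Storing only surprising observations is what gives the smaller memory footprint claimed in the conclusion.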
Hallucination(Data Augmentation) -based Approach
learning to augment
Low-shot visual recognition by shrinking and hallucinating features. ICCV 2017
Low-Shot Learning from Imaginary Data. CVPR 2018
directly integrate the generator into a meta-learning algorithm for improving the classification accuracy. [by CloserLook]
Delta-encoder: an effective sample synthesis method for few-shot object recognition. NIPS 2018
Data method: learned transformation
Our approach is based on a modified auto-encoder, denoted delta-encoder, that learns to synthesize new samples for an unseen category just by seeing few examples from it. The synthesized samples are then used to train a classifier.
The proposed approach learns both to extract transferable intra-class deformations, or "deltas", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class.
the delta-encoder
The simple key idea of this work is to change the meaning of $E(X)$ from representing the "essence" of $X$, to representing the delta, or "additional information" needed to reconstruct $X$ from $Y$ (an observed example from the same category).
$E$ for encoder, $D$ for decoder
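A sketch of the delta-encoder data flow; the linear maps are untrained stand-ins for the paper's networks, and the dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(3)

def encoder(x, y, W_e):
    """E(x, y): encode the low-dimensional delta needed to get x from y."""
    return W_e @ np.concatenate([x, y])

def decoder(z, y, W_d):
    """D(z, y): apply an encoded delta to an anchor y to synthesize a sample."""
    return W_d @ np.concatenate([z, y])

d, dz = 6, 2                                # feature dim, delta dim
W_e = rng.standard_normal((dz, 2 * d))      # untrained stand-in weights
W_d = rng.standard_normal((d, dz + d))

# Training time: (x, y) is a same-class pair from a seen class.
x_seen, y_seen = rng.standard_normal(d), rng.standard_normal(d)
z = encoder(x_seen, y_seen, W_e)            # the extracted "delta"

# Test time: re-apply that delta to the one example of a novel class.
y_novel = rng.standard_normal(d)
synthesized = decoder(z, y_novel, W_d)      # new sample to train a classifier
print(z.shape, synthesized.shape)  # (2,) (6,)
```

The key point from the note above is visible in the signatures: E sees both x and the anchor y, so z only has to carry the deformation, not the class identity.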
LaSO: Label-Set Operations networks for multi-label few-shot learning. CVPR 2019
Few-Shot Learning via Saliency-guided Hallucination of Samples. CVPR 2019
Spot and Learn: A Maximum-Entropy Image Patch Sampler for Few-Shot Classification. CVPR 2019
No code available yet (2020/4/15)
Image Deformation Meta-Networks for One-Shot Learning. CVPR 2019
Cross Attention Network for Few-shot Classification. NeurIPS 2019
No code available yet (2020/4/15)
Learns an attention mask so the model focuses on the informative parts of the images
Fine-grained visual categorization using meta-learning optimization with sample selection of auxiliary data. ECCV 2018
Done by sharing the first several layers of two networks to learn generic information, while learning a separate last layer to handle the different outputs of each task.
Metric-based few-shot classification methods often fail to generalize to unseen domains due to the large discrepancy of feature distributions across domains.
The core idea is to use feature-wise transformation layers to augment image features with affine transforms, simulating various feature distributions under different domains during training.
Further applies a learning-to-learn approach to search for the hyper-parameters of the feature-wise transformation layers.
Optimizes the feature-wise transformation layers so that the model works well on unseen domains after being trained on the seen domains.
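A sketch of such a feature-wise transformation layer; the softplus parameterization and the per-channel sampling follow the idea described above, but the exact values and shapes here are illustrative:

```python
import numpy as np

def feature_wise_transform(features, theta_gamma, theta_beta, rng):
    """Augment features with a sampled channel-wise affine transform:
    gamma ~ N(1, softplus(theta_gamma)), beta ~ N(0, softplus(theta_beta)).
    theta_gamma / theta_beta are the hyper-parameters the learning-to-learn
    loop searches over."""
    std_g = np.log1p(np.exp(theta_gamma))   # softplus keeps std-devs positive
    std_b = np.log1p(np.exp(theta_beta))
    gamma = 1.0 + std_g * rng.standard_normal(features.shape[-1])
    beta = std_b * rng.standard_normal(features.shape[-1])
    return gamma * features + beta          # broadcast over the batch

rng = np.random.default_rng(4)
feats = rng.standard_normal((5, 16))        # a batch of 16-channel features
theta_g = np.full(16, -2.0)                 # illustrative hyper-parameter values
theta_b = np.full(16, -2.0)
augmented = feature_wise_transform(feats, theta_g, theta_b, rng)
print(augmented.shape)  # (5, 16)
```

At test time the layer is removed; it only perturbs the feature distribution during training to simulate domain shift.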
semi-hard mining (what's the difference from FaceNet???)
Label Efficient Learning of Transferable Representations across Domains and Tasks. NIPS 2017 (Li Fei-Fei)
Our model is simultaneously optimized on labeled source data and unlabeled or sparsely labeled data in the target domain.
Our method shows compelling results on novel classes within a new domain even when only a few labeled examples per class are available, outperforming the prevalent fine-tuning approach.
Initializes the CNN for the target tasks in the target domain with a CNN pre-trained on source tasks in the source domain. During training, they use an adversarial loss computed from representations in multiple layers of the CNN to force the two CNNs to project samples into a task-invariant space.
One Shot Domain Adaptation for Person Re-Identification. 2018
Meta-Learning with Domain Adaptation for Few-Shot Learning under Domain Shift, ICLR 2019 rejected
The proposed approach consists of combining a known few shot learning model, prototypical nets, together with image to image translation via CycleGAN for domain adaptation. Thus the algorithmic novelty is minor and amounts to combining two techniques to address a different problem statement.
Though meta-learning could be a solution for learning with few examples, the solution used in this work is not meta-learning, and so it should not appear in the title, to avoid confusion.
Learning Embedding Adaptation for Few-Shot Learning. arXiv'1812
Given only one example of each new class, can we transfer knowledge learned by one-shot learning from one domain to another?
Proposes a domain adaptation framework based on adversarial networks.
This framework is generalized for situations where the source and target domain have different labels.
Uses a policy network, inspired by human learning behaviors, to effectively select samples from the source domain during training. This sampling strategy can further improve the domain adaptation performance.
A Closer Look at Few-shot Classification. ICLR 2019
Proposes two plain baselines and finds that in many settings they are competitive with SOTA few-shot learning methods
SOTA methods compared: MatchingNet, ProtoNet, RelationNet, MAML
When the domain gap is small (e.g. CUB), the differences between SOTA methods shrink as the baseNN gets stronger
When the domain gap is large (e.g. miniImageNet), the differences between SOTA methods grow as the baseNN gets stronger
When domain shift occurs, the SOTA methods do not even beat the baselines
Particularly emphasizes that SOTA methods handle domain adaptation poorly
Reviewers' Comment
The conclusion from the network depth experiments is that “gaps among different methods diminish as the backbone gets deeper”. However, in a 5-shot mini-ImageNet case, this is not what the plot shows. Quite the opposite: the gap increased. Did I misunderstand something? Could you please comment on that?
Same question I wanted to ask
Authors' Answer: Sorry for the confusion. As addressed in 4.3, gaps among different methods diminish as the backbone gets deeper in the CUB dataset. In the mini-ImageNet dataset, the results are more complicated due to the domain difference. We further discuss this phenomenon in Section 4.4 and 4.5. We have clarified related texts in the revised paper.
Few-shot Learning with Meta Metric Learners. NIPS 2017 workshop on Meta-Learning, arXiv'1901.09890
Microsoft AI & Research, IBM Research AI, JD AI Research
Existing meta-learning or metric-learning based few-shot learning approaches are limited in handling diverse domains with varying numbers of labels.
We propose a meta metric learner for few-shot learning, which combines an LSTM meta-learner with a base metric classifier.
The proposed method has several advantages, such as the ability to handle unbalanced classes and to generate task-specific metrics.
We test our approach in the ‘k-shot N-way’ few-shot learning setting used in previous work and new realistic few-shot setting with diverse multi-domain tasks and flexible label numbers.
contributions
improves on existing few-shot learning work to handle varying numbers of class labels (not only k-shot N-way)
enable the model to learn task specific metrics via training a meta learner
we are the first to investigate few-shot deep learning methods in the text domain.
Understanding few-shot learning
A Meta Understanding of Meta-Learning. ICML 2019 Workshop (under review)
Also known as "Revisiting Meta-Learning as Supervised Learning"
Understands meta-learning through the lens of supervised learning
Human-level concept learning through probabilistic program induction.
(No Deep Learning, but worth reading)
Negative Margin Matters: Understanding Margin in Few-shot Classification. arXiv'2003
Semantic Regularization: Improve Few-shot Image Classification by Reducing Meta Shift. arXiv'1912
A Theoretical Analysis of the Number of Shots in Few-Shot Learning. ICLR 2020
Rethinking Meta-learning
A Baseline for Few-Shot Image Classification. ICLR 2020
Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? arXiv'2003
Complex meta-learning architectures are actually not that impressive
A New Meta-Baseline for Few-Shot Learning. arXiv'2003
All you need is a good representation: A multi-level and classifier-centric representation for few-shot learning. arXiv'1911
Semi-supervised
Low-shot learning with large-scale diffusion. CVPR 2018
Data method: transform other dataset???
semi-supervised setting to support label propagation
In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders.
this paper is not really doing few-shot learning, because according to section 3.2. and the experiments, the authors use the test labels in order to know which word embeddings to assign to each sample: "[...] containing label embeddings of all categories in D_train ∪ D_test". In other words, the authors use the labels (which are the goal of the classification task) to find the match between the two input modalities (to know what Glove vector to assign to each image).
the experiments compare the results only between this multimodal approach and visual approaches. I believe using the Glove embeddings alone (no visual input) could give very good results on their own, and it is thus crucial for the authors to compare with this scenario too.
the explanation for why you chose this form for lambda_c is unclear: "A very structured semantic space is a good choice for conditioning."
Semantic Feature Augmentation in Few-shot Learning. ECCV 2018
Few-Shot Learning with Global Class Representations. ICCV 2019
utilize semantic information [by DAPNA]
others
TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning. 2019
metric-based [by 2020survey]
Incremental Few-Shot Learning with Attention Attractor Networks. NeurIPS 2019
problem
This paper addresses this problem, incremental few-shot learning, where a regular classification network has already been trained to recognize a set of base classes, and several novel classes are then considered, each with only a few labeled examples. After learning the novel classes, the model is evaluated on overall classification performance over both base and novel classes.
Multi-attention Network for One Shot Learning. CVPR 2017
We propose a novel visual attribute encoding method that encodes each image as a low-dimensional probability vector composed of prototypical part-type probabilities.
At test-time we freeze the encoder and only learn/adapt the classifier component to limited annotated labels in FSL; new semantic attributes in ZSL.
Meta-Learning Probabilistic Inference for Prediction. ICLR 2019
Fast Context Adaptation via Meta-Learning. ICML 2019
We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable.
CAVIA partitions the model parameters into two parts: context parameters that serve as additional input to the model and are adapted on individual tasks, and shared parameters that are meta-trained and shared across tasks. At test time, only the context parameters are updated, leading to a low-dimensional task representation.
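A toy sketch of this two-part parameterization; the scalar model, dimensions, and learning rate are made up, and only the context receives gradient updates:

```python
import numpy as np

def model(x, shared_w, context):
    """Scalar prediction: the task-specific context is fed as extra input;
    shared_w is the meta-trained part and stays frozen at test time."""
    return float(shared_w @ np.concatenate([x, context]))

def adapt_context(x, y_true, shared_w, context, lr=0.05, steps=200):
    """Test-time adaptation: gradient steps on the context only
    (squared-error loss); the shared parameters are never touched."""
    w_ctx = shared_w[x.size:]               # weights acting on the context
    for _ in range(steps):
        err = model(x, shared_w, context) - y_true
        context = context - lr * err * w_ctx  # grad of 0.5 * err**2 w.r.t. context
    return context

rng = np.random.default_rng(5)
shared_w = rng.standard_normal(4 + 2)       # 4 input dims + 2 context dims
x, y_true = rng.standard_normal(4), 1.0
ctx0 = np.zeros(2)                          # context is reset for each new task
ctx = adapt_context(x, y_true, shared_w, ctx0)
improved = abs(model(x, shared_w, ctx) - y_true) < abs(model(x, shared_w, ctx0) - y_true)
print(improved)  # → True
```

Because only the low-dimensional context changes per task, the adapted context itself serves as a compact task representation.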
They learn separate embeddings for source and target tasks in different domains to map them into a task-invariant space, then learn a shared classifier to classify samples from all tasks.
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. ICML 2019
Sever: A Robust Meta-Algorithm for Stochastic Optimization. ICML 2019
Provable Guarantees for Gradient-Based Meta-Learning. ICML 2019
Meta-Learning Neural Bloom Filters. ICML 2019
We propose a novel memory architecture, the Neural Bloom Filter, which is able to achieve significant compression gains over classical Bloom Filters and existing memory-augmented neural networks