:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
DPR - improve loading of datasets #733 @voidful DPR - enable saving and loading of other model types, e.g., RoBERTa models #765 @Timoeller @julian-risch DPR - fix conversion of BiAdaptiveModel #753 @bogdankostic
Bump transformers version to 4.6.1 #787 @Timoeller @julian-risch Bump torch version to 1.8.1 #767 @Timoeller @julian-risch
Implement Multi-task Learning and added example #778 @johann-petrak
Allow list of metrics and add tests and pythondoc #777 @johann-petrak
Reduce number of logging messages by Processor about returning problematic ids #772 @johann-petrak Add farm.__version__ tag #761 @johann-petrak Add value of doc_stride, max_seq_len, max_query_length in error message #784 @ftesser Convert QACandidates with empty or whitespace answers to no_answers on doc level #756 @julian-risch
String comparison: Should replace "is" with "==": #774 @johann-petrak Fix reference before assignment in DataSilo #738 @bogdankostic Changing QA_input format in tutorial #735 @julian-risch Fix TextPairClassificationProcessor example by adding metric #780 @julian-risch
A patch release focusing on bug fixes for Dense Passage Retrieval DPR Fix saving and loading of DPR models and Processors in #746 Fix DPR tokenization statisticss in #738 Fix cosine similarity in DPR training #741
Misc Fix tuple input for TextPairClassification inference #723
In response to several requests from the community, we now provide more meaningful confidence scores for the predictions of extractive QA models. #690 #705 @julian-risch @timoeller @lalitpagaria To this end, predicted answers got a new attribute called confidence, which is in the range [0,1] and can be calibrated with the probability that a prediction is an exact match. The intuition behind the scores is the following: After calibration, if the average confidence of 100 predictions is 70%, then on average 70 of the predictions will be correct. The implementation of the calibration uses a technique called temperature scaling. The calibration can be executed on a dev set by running the eval() method in the Evaluator class and setting the parameter calibrate_conf_scores to true. This parameter is false by default as it is still an experimental feature and we continue working on it. The score attribute of predicted answers and their ranking remain unchanged so that the default behavior is unchanged. An example shows how to calibrate and use the confidence scores.
Refactor Text pair handling, that also add Text pair regression #713 @timoeller Refactor Textsimilarity processor #711 @timoeller Refactor Regression and inference processors #702 @timoeller Fix NER probabilities #700 @brandenchan Calculate squad evaluation metrics overall and separately for text answers and no answers #698 @julian-risch Re-enable test_dpr_modules also for windows #697 @ftesser Use Path instead of String in ONNXAdaptiveModel #694 @skiran252 Big thanks to all contributors!
This is just a small patch to change the return types of offsets in our QAInferencer, see #693
It is needed to fix RestAPI related issues where int64 cannot decoded within JSONs.
This is just a quick patch release to bugfix some input validation for Question Answering [closed] Fix/missing truncation bug #679
Still, another interesting feature slipped in: We can now filter QA predictions to not contain duplicate answers. [closed] Added filter_range parameter that allows to filter answers with similar start/end indices #680
[part: tokenizer][task: QA] Add integration test for QA processing #683
[closed] Remove "qas" inference input wherever possible #681 [closed] Added parameter names to convert_from_transformers call in question_answering_crossvalidation.py #672
We wanted to make preprocessing for all our tasks (e.g. QA, DPR, NER, classification) more understandable for FARM users, so that it is easier to adjust to specific use cases or extend the functionality to new tasks.
To achieve this we followed two design choices:
We especially focussed on making QA processing more sequential and divided the code into meaningful snippets #649
The code snippets are (see related method):
use_fast=True
parameter in the Tokenizer.load() method. Support for slow, python-based Tokenizers will be implemented for all tasks in the next release.problematic_sample_ids
that keeps track of which input sample caused problems during preprocessing:dataset, tensor_names, problematic_sample_ids = processor.dataset_from_dicts(dicts=dicts)
Transformers comes with many new features, including model versioning, that we do not want to miss out on. #665 Model versions can now be specified like:
model = Inferencer.load(
model_name_or_path="deepset/roberta-base-squad2",
revision="v2.0",
task_type="question_answering",
)
Big thanks to all contributors! @ftesser @pashok3d @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor
Happy to introduce a completely new task type to FARM: Text similarity with two separate transformer encoders
Why? We observe a big shift in Information Retrieval from sparse methods (BM25 etc.) towards dense methods that encode queries and docs as vectors and use vector similarity to retrieve the most similar docs for a certain query. This is not only helpful for document search but also for open-domain Question Answering. Dense methods outperform sparse methods already in many domains and are especially powerful if the matching between query and passage cannot happen via "keywords" but rather relies on semantics / synonyms / context.
What? One of the most promising methods at the moment is "Dense Passage Retrieval" from Karphukin et al. (https://arxiv.org/abs/2004.04906). In a nutshell, DPR uses one transformer to encode the query and a second transformer to encode the passage. The two encoders project the different texts into the same vector space and are trained jointly on a similarity measure using in-batch-negatives.
How?
We introduce a new class BiAdaptiveModel
that has two language models plus a prediction head.
In the case of DPR, this will be one question encoder model and one passage encoder model.
See the new example script dpr_encoder.py for training / fine-tuning a DPR model.
We also have a tight integration in Haystack, where you can use it as a Retriever for open-domain Question Answering.
We simplified conversion between FARM <-> Transformers. You can now run:
# Transformers -> FARM
model = Converter.convert_from_transformers("deepset/roberta-base-squad2", device="cpu")
# FARM -> Transformers
transformer_models = Converter.convert_to_transformers(your_adaptive_model)
Note: In case your FARM AdaptiveModel has multiple prediction heads (e.g. 1x NER, 1x Text Classification), the conversion will return a list with two transformer models (both with one head respectively).
Transformers 3.3.1 comes with a few new interesting features, incl. support for Retrieval-Augmented Generation (RAG) which can be used to generate answers rather than extracting answers. In contrast to GPT-3, the generation is conditioned on a set of retrieved documents, and is, therefore, more suitable for most QA applications in the industry that rely on a domain corpus.
Thanks to @lalitpagaria, we'll support RAG also in Haystack soon (see https://github.com/deepset-ai/haystack/pull/484)
Big thanks to all contributors! @ftesser @lalitpagaria @himanshurawlani @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor
Installing FARM in environments where torch's GPU version was already installed via pip (e.g. torch 1.6.0+cu101), caused version trouble. This is especially annoying in Google Colab environments. Change: Allow all torch 1.6.x versions incl 1.6.0+cu101 etc
Further changes:
While preprocessing is usually not the bottleneck in our pipelines, there's still significant time spent on it (~ 20 % for QA inference). We saw substantial speed-ups with HuggingFace's "FastTokenizers" that are based on rust. We are therefore introducing a basic "experimental" implementation with this release. We are planning to stabilizing it and having a smoother fit into the FARM processor.
Usage:
tokenizer = Tokenizer.load(pretrained_model_name_or_path=""bert-base-german-cased"",
do_lower_case=False,
use_fast=True)
The latest transformers release has quite interesting new features - one of them being basic support of a DPR model class (Dense Passage Retriever). This will simplify our dense passage retriever integration in Haystack and the upcoming DPR training which we plan to have in FARM.
Big thanks to all contributors! @PhilipMay @lambdaofgod @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor
Interesting model from Microsoft that is up to 2.7x faster than BERT, while showing similar or better performance on many tasks (Paper). We found it particularly interesting for QA and also published a fine-tuned model on SQuAD 2.0: deepset/minilm-uncased-squad2
Measuring the speed of individual components in the pipeline while respecting CUDA's async behaviour. We were especially interested in analyzing how much time we spend for QA in preprocessing, language model, and prediction head. Turns out it's on average about 20% : 50% : 30%. Interestingly, there's a high variance in the prediction head depending on the relevance of the question. We will use that information to further optimize performance in the prediction head. We'll share more detailed benchmarks soon.
We now support 1.6 and 1.5.1
Big thanks to all contributors! @PhilipMay @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @rohanag @lingsond @ftesser