FARM Versions

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

v0.8.0

2 years ago

DPR Improvements

DPR - improve loading of datasets #733 @voidful
DPR - enable saving and loading of other model types, e.g., RoBERTa models #765 @Timoeller @julian-risch
DPR - fix conversion of BiAdaptiveModel #753 @bogdankostic

torch 1.8.1 and transformers 4.6.1

Bump transformers version to 4.6.1 #787 @Timoeller @julian-risch
Bump torch version to 1.8.1 #767 @Timoeller @julian-risch

Multi-task Learning

Implement Multi-task Learning and added example #778 @johann-petrak

List of Evaluation Metrics

Allow list of metrics and add tests and pythondoc #777 @johann-petrak

Misc

Reduce number of logging messages by Processor about returning problematic ids #772 @johann-petrak
Add farm.__version__ tag #761 @johann-petrak
Add value of doc_stride, max_seq_len, max_query_length in error message #784 @ftesser
Convert QACandidates with empty or whitespace answers to no_answers on doc level #756 @julian-risch

String comparison: Should replace "is" with "==": #774 @johann-petrak
Fix reference before assignment in DataSilo #738 @bogdankostic
Changing QA_input format in tutorial #735 @julian-risch
Fix TextPairClassificationProcessor example by adding metric #780 @julian-risch

v0.7.1

3 years ago

A patch release focusing on bug fixes for Dense Passage Retrieval (DPR).

DPR

Fix saving and loading of DPR models and Processors #746
Fix DPR tokenization statistics #738
Fix cosine similarity in DPR training #741

Misc

Fix tuple input for TextPairClassification inference #723

v0.7.0

3 years ago

QA Confidence Scores

In response to several requests from the community, we now provide more meaningful confidence scores for the predictions of extractive QA models. #690 #705 @julian-risch @timoeller @lalitpagaria

To this end, predicted answers have a new attribute called confidence, which lies in the range [0, 1] and can be calibrated so that it reflects the probability that a prediction is an exact match. The intuition behind the scores is the following: after calibration, if the average confidence of 100 predictions is 70%, then on average 70 of those predictions will be correct.

The calibration uses a technique called temperature scaling. It can be executed on a dev set by calling the eval() method of the Evaluator class with the parameter calibrate_conf_scores set to True. This parameter defaults to False, as the feature is still experimental and we continue working on it. The score attribute of predicted answers and the ranking of answers remain unchanged, so the default behavior is unaffected. An example shows how to calibrate and use the confidence scores.
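Temperature scaling itself is simple: a single scalar T divides the logits before the softmax and is fitted on a dev set by minimizing the negative log-likelihood. The sketch below illustrates the general technique in plain Python on a toy two-class problem. It is not FARM's implementation; the helper names (softmax, nll, fit_temperature) and the grid search over T are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits divided by a temperature (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(all_logits: List[List[float]], labels: List[int], temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, label in zip(all_logits, labels):
        loss -= math.log(softmax(logits, temperature)[label])
    return loss / len(labels)


def fit_temperature(all_logits: List[List[float]], labels: List[int],
                    grid: Optional[List[float]] = None) -> float:
    """Pick the temperature with the lowest dev-set NLL (simple grid search, an assumption)."""
    if grid is None:
        grid = [0.05 * i for i in range(1, 201)]  # candidate T values 0.05 .. 10.0
    return min(grid, key=lambda t: nll(all_logits, labels, t))


# Overconfident toy dev set: large logit gaps, but the last prediction is wrong.
dev_logits = [[8.0, 0.0], [7.0, 0.0], [9.0, 0.0], [6.0, 0.0]]
dev_labels = [0, 0, 0, 1]

T = fit_temperature(dev_logits, dev_labels)
confidence = max(softmax([8.0, 0.0], T))  # calibrated confidence of a new prediction
```

Because dividing by any T > 0 never changes which logit is largest, the predicted class and the ranking stay the same; on this overconfident toy set the fitted T comes out above 1, which only softens the confidence values.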

Misc

Refactor text pair handling, which also adds text pair regression #713 @timoeller
Refactor Textsimilarity processor #711 @timoeller
Refactor Regression and inference processors #702 @timoeller
Fix NER probabilities #700 @brandenchan
Calculate squad evaluation metrics overall and separately for text answers and no answers #698 @julian-risch
Re-enable test_dpr_modules also for windows #697 @ftesser
Use Path instead of String in ONNXAdaptiveModel #694 @skiran252

Big thanks to all contributors!

v0.6.2

3 years ago

This is just a small patch that changes the return types of offsets in our QAInferencer (see #693).

It is needed to fix REST API related issues, where int64 values cannot be serialized to JSON.
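The underlying problem is easy to reproduce: Python's json module serializes built-in int values but rejects numpy.int64. A minimal sketch, assuming the offsets are produced as numpy integers; the key names are hypothetical and not necessarily the fields QAInferencer returns.

```python
import json

import numpy as np

# Hypothetical offsets as they might come out of numpy-based postprocessing.
offsets = {"offset_answer_start": np.int64(42), "offset_answer_end": np.int64(57)}

try:
    json.dumps(offsets)
    serializable = True
except TypeError as err:
    serializable = False
    print(err)  # the error message names int64 as not JSON serializable

# Casting to built-in int makes the payload JSON-safe.
safe = {key: int(value) for key, value in offsets.items()}
payload = json.dumps(safe)
print(payload)  # {"offset_answer_start": 42, "offset_answer_end": 57}
```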

v0.6.1

3 years ago

Patch release

This is just a quick patch release to fix a bug in the input validation for Question Answering:

Fix/missing truncation bug #679

Additional feature for QA

Still, another interesting feature slipped in: we can now filter QA predictions so that they do not contain duplicate answers.

Added filter_range parameter that allows to filter answers with similar start/end indices #680
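One way to filter near-duplicate answers is greedy: go through the candidates by descending score and drop any candidate whose start and end both lie within filter_range tokens of an answer that was already kept. This is a hypothetical sketch of that idea (the function name, the tuple layout, and the top_k cut-off are assumptions), not the code from #680.

```python
from typing import List, Tuple

Span = Tuple[int, int, float]  # (start token index, end token index, score)


def filter_similar_spans(candidates: List[Span], filter_range: int, top_k: int) -> List[Span]:
    """Keep the best-scoring spans, skipping spans too close to an already kept one."""
    kept: List[Span] = []
    for start, end, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        is_duplicate = any(
            abs(start - k_start) <= filter_range and abs(end - k_end) <= filter_range
            for k_start, k_end, _ in kept
        )
        if not is_duplicate:
            kept.append((start, end, score))
        if len(kept) == top_k:
            break
    return kept


candidates = [(10, 15, 0.9), (11, 15, 0.8), (40, 45, 0.7), (10, 16, 0.6)]
result = filter_similar_spans(candidates, filter_range=1, top_k=3)
# result == [(10, 15, 0.9), (40, 45, 0.7)]: the two spans shifted by one token are dropped
```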

Additional test

Add integration test for QA processing #683

Misc

Remove "qas" inference input wherever possible #681
Added parameter names to convert_from_transformers call in question_answering_crossvalidation.py #672

v0.6.0

3 years ago

Simplification of Preprocessing

We wanted to make preprocessing for all our tasks (e.g. QA, DPR, NER, classification) more understandable for FARM users, so that it is easier to adjust to specific use cases or extend the functionality to new tasks.

To achieve this we followed two design choices:

Avoid deeply nested calls
Keep all high-level descriptions in a single place

Question Answering Preprocessing

We especially focussed on making QA processing more sequential and divided the code into meaningful snippets #649

The code is divided into the following steps, each implemented in its own method:

convert the input into FARM specific QA format
tokenize the questions and texts
split texts into passages to fit the sequence length constraint of Language Models
[optionally] convert labels (disabled during inference)
convert question, text, labels and additional information to PyTorch tensors
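The passage-splitting step can be sketched as a sliding window over the document tokens: each window holds as many tokens as fit next to the question within max_seq_len, and consecutive windows start doc_stride tokens apart, so they overlap. The function below is a simplified, hypothetical illustration that works on token counts only; the assumed budget of three special tokens and the validation rules are not taken from FARM's code.

```python
from typing import List, Tuple


def split_into_passages(
    num_doc_tokens: int,
    num_question_tokens: int,
    max_seq_len: int,
    doc_stride: int,
    num_special_tokens: int = 3,  # assumed layout: [CLS] question [SEP] passage [SEP]
) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets of overlapping passages; end is exclusive."""
    passage_len = max_seq_len - num_question_tokens - num_special_tokens
    if passage_len <= 0:
        raise ValueError("max_seq_len leaves no room for the passage")
    if doc_stride <= 0 or doc_stride > passage_len:
        # A non-positive stride never advances; a stride longer than the window skips tokens.
        raise ValueError("doc_stride must be between 1 and the passage length")
    windows: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + passage_len, num_doc_tokens)
        windows.append((start, end))
        if end >= num_doc_tokens:
            break
        start += doc_stride
    return windows


# 10 document tokens, 2 question tokens, max_seq_len 9 -> room for 4 passage tokens per window.
windows = split_into_passages(10, 2, max_seq_len=9, doc_stride=2)
```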

Breaking changes

Switching to FastTokenizers (based on the Huggingface tokenizers project written in Rust) as the default Tokenizer: the use_fast parameter of the Tokenizer.load() method now defaults to True. Support for slow, Python-based Tokenizers will be implemented for all tasks in the next release.
The Processor.dataset_from_dicts method by default returns an additional value, problematic_sample_ids, which keeps track of the input samples that caused problems during preprocessing:

    dataset, tensor_names, problematic_sample_ids = processor.dataset_from_dicts(dicts=dicts)

Update to transformers version 4.1.1 and torch version 1.7.0

Transformers comes with many new features, including model versioning, that we do not want to miss out on. #665 Model versions can now be specified like:

    model = Inferencer.load(
        model_name_or_path="deepset/roberta-base-squad2",
        revision="v2.0",
        task_type="question_answering",
    )

DPR enhancements

MultiGPU support #619
Added tests #643
Bugfixes and smaller enhancements #629 #655 #663

Misc

Cleaner logging and error handling #639
Benchmark automation via CML #646
Disable DPR tests on Windows, since they do not work with PyTorch 1.6.1 #637
Option to disable MLflow logger #650
Fix to Earlystopping and custom head #617
Add parameter for the probability of masking a token in the LM task #630
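For reference, the widely used BERT-style masking scheme selects each token with a given probability and then replaces it with [MASK] 80% of the time, with a random vocabulary token 10% of the time, and leaves it unchanged otherwise. The sketch below assumes that scheme; masking_prob plays the role of the new parameter, but the function name and signature are hypothetical and not FARM's API.

```python
import random
from typing import List, Optional, Tuple


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    masking_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return masked tokens and per-position labels (None where no prediction is needed)."""
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= masking_prob:
            continue  # token not selected for prediction
        labels[i] = token  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = "[MASK]"
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # random replacement
        # else: keep the original token unchanged
    return masked, labels


masked, labels = mask_tokens(["the", "cat", "sat"], vocab=["dog", "ran"],
                             masking_prob=0.5, rng=random.Random(42))
```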

Big thanks to all contributors! @ftesser @pashok3d @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor

v0.5.0

3 years ago

Add Dense Passage Retriever (DPR) incl. Training & Inference (#513, #601, #606)

Happy to introduce a completely new task type to FARM: Text similarity with two separate transformer encoders

Why? We observe a big shift in Information Retrieval from sparse methods (BM25 etc.) towards dense methods that encode queries and docs as vectors and use vector similarity to retrieve the most similar docs for a given query. This is not only helpful for document search but also for open-domain Question Answering. Dense methods already outperform sparse methods in many domains and are especially powerful if the matching between query and passage cannot happen via "keywords" but instead relies on semantics, synonyms, or context.

What? One of the most promising methods at the moment is "Dense Passage Retrieval" from Karpukhin et al. (https://arxiv.org/abs/2004.04906). In a nutshell, DPR uses one transformer to encode the query and a second transformer to encode the passage. The two encoders project the different texts into the same vector space and are trained jointly on a similarity measure using in-batch negatives.

How? We introduce a new class BiAdaptiveModel that has two language models plus a prediction head. In the case of DPR, this will be one question encoder model and one passage encoder model.
See the new example script dpr_encoder.py for training / fine-tuning a DPR model. We also have a tight integration in Haystack, where you can use it as a Retriever for open-domain Question Answering.
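The in-batch negatives objective can be illustrated in a few lines: for each question, the passages belonging to all other questions in the batch serve as negatives, and the loss is the negative log-softmax of the positive passage's score. This plain-Python sketch assumes dot-product similarity (as in the DPR paper) and precomputed embedding vectors standing in for the two encoders; it is not the BiAdaptiveModel code.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def in_batch_negative_loss(questions: List[Vector], passages: List[Vector]) -> float:
    """Mean NLL of the positive passage; passages[i] is the positive for questions[i]."""
    assert len(questions) == len(passages)
    total = 0.0
    for i, q in enumerate(questions):
        scores = [dot(q, p) for p in passages]  # similarity to every passage in the batch
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))  # log-sum-exp
        total += log_norm - scores[i]  # equals -log softmax(scores)[i]
    return total / len(questions)


q_vecs = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[2.0, 0.0], [0.0, 2.0]]  # each question is closest to its own passage
swapped = [[0.0, 2.0], [2.0, 0.0]]  # each question is closest to the other passage
loss_aligned = in_batch_negative_loss(q_vecs, aligned)
loss_swapped = in_batch_negative_loss(q_vecs, swapped)
```

Reusing the other passages of the batch as negatives means no extra encoder passes are needed to obtain negative examples.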

Refactor conversion from / to Transformers #576

We simplified conversion between FARM <-> Transformers. You can now run:

    # Transformers -> FARM
    model = Converter.convert_from_transformers("deepset/roberta-base-squad2", device="cpu")

    # FARM -> Transformers
    transformer_models = Converter.convert_to_transformers(your_adaptive_model)

Note: If your FARM AdaptiveModel has multiple prediction heads (e.g. 1x NER, 1x Text Classification), the conversion returns a list of two Transformers models, each with one head.

Upgrade to Transformers 3.3.1 #579

Transformers 3.3.1 comes with a few new interesting features, incl. support for Retrieval-Augmented Generation (RAG) which can be used to generate answers rather than extracting answers. In contrast to GPT-3, the generation is conditioned on a set of retrieved documents, and is, therefore, more suitable for most QA applications in the industry that rely on a domain corpus.
Thanks to @lalitpagaria, we'll support RAG also in Haystack soon (see https://github.com/deepset-ai/haystack/pull/484)

Details

Question Answering

Improve Speed: Vectorize Question Answering Prediction Head #603
Fix removal of yes no answers #540
Fix QA bug that rejected spans at beginning of passage #564
Added warning about Natural Questions inference #565
Remove loss index from QA PH #589
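To illustrate what vectorizing a QA prediction head can mean, the sketch below scores every (start, end) pair at once with numpy broadcasting instead of looping over candidate spans, then masks out spans that end before they start or exceed max_answer_length. It is a generic example under these assumptions, not the code from #603.

```python
import numpy as np


def best_span(start_logits: np.ndarray, end_logits: np.ndarray, max_answer_length: int):
    """Return (start, end, score) of the best valid span, computed without Python loops."""
    n = start_logits.shape[0]
    # scores[i, j] = start_logits[i] + end_logits[j] for all pairs via broadcasting
    scores = start_logits[:, None] + end_logits[None, :]
    starts = np.arange(n)[:, None]
    ends = np.arange(n)[None, :]
    valid = (ends >= starts) & (ends - starts < max_answer_length)
    scores = np.where(valid, scores, -np.inf)  # invalid spans can never win
    start, end = np.unravel_index(int(np.argmax(scores)), scores.shape)
    return int(start), int(end), float(scores[start, end])


start_logits = np.array([0.1, 3.0, 0.2, 0.5])
end_logits = np.array([2.5, 0.1, 1.0, 4.0])
```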

Other

Catch empty datasets in Inferencer #605
Add option to set evaluation batch size #607
Infer model type from config #600
Fix random behavior when loading ELECTRA models #599
Fix import for Python3.6 #581
Fixed conversion of BertForMaskedLM to transformers #555
Load correct config for DistilBert model #562
Add passages per second calculation to benchmarks #560
Fix batching in ONNX forward pass #559
Add ONNX conversion & Inference #557

Big thanks to all contributors! @ftesser @lalitpagaria @himanshurawlani @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor

v0.4.9

3 years ago

Minor patch: Relax PyTorch version requirements

Installing FARM in environments where the GPU version of torch was already installed via pip (e.g. torch 1.6.0+cu101) caused version conflicts. This is especially annoying in Google Colab environments.

Change: Allow all torch 1.6.x versions, incl. 1.6.0+cu101 etc.

Further changes:

Nested cross validation by @PhilipMay #508

0.4.8

3 years ago

Minor release

Experimental Support for fast Rust Tokenizers (#482)

While preprocessing is usually not the bottleneck in our pipelines, it still takes a significant share of the time (~20% for QA inference). We saw substantial speed-ups with HuggingFace's "FastTokenizers", which are based on Rust. We are therefore introducing a basic, "experimental" implementation with this release. We plan to stabilize it and integrate it more smoothly into the FARM processor.

Usage:

    from farm.modeling.tokenization import Tokenizer

    tokenizer = Tokenizer.load(pretrained_model_name_or_path="bert-base-german-cased",
                               do_lower_case=False,
                               use_fast=True)

Upgrade to transformers 3.1.0 (#464)

The latest transformers release has quite interesting new features - one of them being basic support of a DPR model class (Dense Passage Retriever). This will simplify our dense passage retriever integration in Haystack and the upcoming DPR training which we plan to have in FARM.

Details

Question Answering

Add asserts on doc_stride and max_seq_len to prevent issues with sliding window #538
fix Natural Question inference processing #521

Other

Fix logging of error msg for FastTokenizer + QA #541
Fix truncation warnings in tokenizer #528
Evaluate model on best model when doing early stopping #524
Bump transformers version to 3.1.0 #515
Add warmup run to component benchmark #504
Add optional s3 auth via params #511
Add option to use fast HF tokenizer. #482
CodeBERT support for embeddings #488
Store test eval result in variable #506
Fix typo f1 micro vs. macro #505

Big thanks to all contributors! @PhilipMay @lambdaofgod @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor

0.4.7

3 years ago

Main changes

Support for MiniLM Model (#464)

Interesting model from Microsoft that is up to 2.7x faster than BERT, while showing similar or better performance on many tasks (Paper). We found it particularly interesting for QA and also published a fine-tuned model on SQuAD 2.0: deepset/minilm-uncased-squad2

Benchmarks per component (#491)

Measuring the speed of individual components in the pipeline while respecting CUDA's async behaviour. We were especially interested in analyzing how much time QA spends in preprocessing, the language model, and the prediction head. It turns out to be, on average, about 20% : 50% : 30%. Interestingly, there's a high variance in the prediction head depending on the relevance of the question. We will use that information to further optimize performance in the prediction head. We'll share more detailed benchmarks soon.
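Because CUDA kernels run asynchronously, a wall-clock timer can stop before the GPU has finished a component's work. A minimal sketch of a per-component timer that synchronizes before reading the clock is shown below; the sync callback is an assumption of this sketch (with PyTorch you would pass torch.cuda.synchronize), and the component names and workloads are only placeholders.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Optional


@contextmanager
def timed(name: str, totals: Dict[str, float],
          sync: Optional[Callable[[], None]] = None) -> Iterator[None]:
    """Accumulate the wall-clock time of a block under `name`, syncing the device first."""
    if sync is not None:
        sync()  # wait for GPU work queued by earlier components
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()  # wait until this component's kernels have actually finished
        totals[name] = totals.get(name, 0.0) + time.perf_counter() - start


totals: Dict[str, float] = {}
sync_calls = []
with timed("preprocessing", totals, sync=lambda: sync_calls.append(1)):
    sum(range(100_000))  # placeholder workload
with timed("prediction_head", totals):
    sorted(range(10_000), reverse=True)  # placeholder workload

shares = {name: t / sum(totals.values()) for name, t in totals.items()}
```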

Support for PyTorch 1.6 (#502)

We now support PyTorch 1.6 and 1.5.1.

Details

Question Answering

Pass max_answers param to processor #503
Deprecate QA input dicts with [context, qas] as keys #472
Squad processor verbose feature #470
Propagate QA ground truth in Inferencer #469
Ensure QAInferencer always has task_type "question_answering" #460

Other

Download models from (private) S3 #500
fix _initialize_data_loaders in data_silo #476
Remove torch version wildcard in requirements #489
Make num processes parameter consistent across inferencer and data silo #480
Remove rest_api_schema argument in inference_from_dicts() #474
farm.data_handler.utils: Add encoding to open write in split_file method #466
Fix and document Inferencer usage and pool handling #429
Remove assertions or replace with logging error #468
Remove baskets without features in _create_dataset. #471
fix bugs with regression label standardization #456
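Regression label standardization typically means subtracting the training-set mean and dividing by the standard deviation before training, then inverting that transform on predictions. The following sketch shows the round trip in plain Python; the helper names are hypothetical, and using the population standard deviation (as scikit-learn's StandardScaler does) is an assumption about the setup, not a description of #456.

```python
import statistics
from typing import List, Tuple


def fit_standardizer(labels: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the training labels."""
    mean = statistics.mean(labels)
    std = statistics.pstdev(labels)
    return mean, (std if std > 0 else 1.0)  # guard against constant labels


def standardize(values: List[float], mean: float, std: float) -> List[float]:
    return [(v - mean) / std for v in values]


def unstandardize(values: List[float], mean: float, std: float) -> List[float]:
    """Map model outputs back to the original label scale."""
    return [v * std + mean for v in values]


train_labels = [2.0, 4.0, 6.0]
mean, std = fit_standardizer(train_labels)
scaled = standardize(train_labels, mean, std)
restored = unstandardize(scaled, mean, std)
```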

Big thanks to all contributors! @PhilipMay @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @rohanag @lingsond @ftesser