Fairseq Versions

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

v0.12.2

1 year ago

v0.12.1

1 year ago

v0.12.0

1 year ago

v0.10.2

3 years ago

Bug fixes:

  • fix register_model_architecture for Transformer language model (#3097)
  • fix logging to use stdout instead of stderr (#3052)

v0.10.1

3 years ago

This minor release includes fixes for torch.distributed.launch, --user-dir and a few smaller bugs. We also include prebuilt wheels for common platforms.

v0.10.0

3 years ago

It's been a long time since our last release (0.9.0) nearly a year ago! There have been numerous changes and new features added since then, which we've tried to summarize below. Although this release carries the same major version (0.x.x) as 0.9.0, code that relies on 0.9.0 will likely need some adaptation before updating to 0.10.0.

Looking forward, this will also be the last significant release with the 0.x.x numbering. The next release will be 1.0.0 and will include a major migration to the Hydra configuration system, with an eye towards modularizing fairseq to be more usable as a library.

Changelog:

New papers:

Major new features:

  • TorchScript support for Transformer and SequenceGenerator (PyTorch 1.6+ only; see the scripting sketch after this list)
  • Model parallel training support (see Megatron-11b)
  • TPU support via --tpu and --bf16 options (775122950d145382146e9120308432a9faf9a9b8)
  • Added VizSeq (a visual analysis toolkit for evaluating fairseq models)
  • Migrated to Python logging (fb76dac1c4e314db75f9d7a03cb4871c532000cb)
  • Added “SlowMo” distributed training backend (0dac0ff3b1d18db4b6bb01eb0ea2822118c9dd13)
  • Added Optimizer State Sharding (ZeRO) (5d7ed6ab4f92d20ad10f8f792b8703e260a938ac)
  • Added several features to improve speech recognition support in fairseq: CTC criterion, external ASR decoder support (currently only wav2letter decoder) with KenLM and fairseq language model fusion
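
To illustrate the new TorchScript support: a Transformer checkpoint can be scripted with torch.jit and saved for deployment. A minimal sketch, assuming PyTorch 1.6+ and a hub checkpoint whose architecture is covered by the scriptable Transformer implementation (the checkpoint, tokenizer, and BPE names below are just examples):

    import torch

    # Load a released translation checkpoint through torch.hub
    # (any scriptable Transformer checkpoint works the same way).
    en2de = torch.hub.load(
        'pytorch/fairseq',
        'transformer.wmt19.en-de.single_model',
        tokenizer='moses',
        bpe='fastbpe',
    )
    en2de.eval()

    # Script the underlying TransformerModel and save it for later loading
    # with torch.jit.load().
    scripted = torch.jit.script(en2de.models[0])
    scripted.save('en2de_scripted.pt')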

Minor features:

  • Added --patience for early stopping
  • Added --shorten-method=[none|truncate|random_crop] to language modeling (and other) tasks
  • Added --eval-bleu for computing BLEU scores during training (60fbf64f302a825eee77637a0b7de54fde38fb2c)
  • Added support for training huggingface models (e.g. hf_gpt2) (2728f9b06d9a3808cc7ebc2afa1401eddef35e35)
  • Added FusedLAMB optimizer (--optimizer=lamb) (f75411af2690a54a5155871f3cf7ca1f6fa15391)
  • Added LSTM-based language model (lstm_lm) (9f4256edf60554afbcaadfa114525978c141f2bd)
  • Added dummy tasks and models for benchmarking (91f05347906e80e6705c141d4c9eb7398969a709; a541b19d853cf4a5209d3b8f77d5d1261554a1d9)
  • Added tutorial and pretrained models for paraphrasing (630701eaa750efda4f7aeb1a6d693eb5e690cab1)
  • Support quantization for Transformer (6379573c9e56620b6b4ddeb114b030a0568ce7fe)
  • Support multi-GPU validation in fairseq-validate (2f7e3f33235b787de2e34123d25f659e34a21558)
  • Support batched inference in hub interface (3b53962cd7a42d08bcc7c07f4f858b55bf9bbdad; see the example after this list)
  • Support for language model fusion in standard beam search (5379461e613263911050a860b79accdf4d75fd37)
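
For the batched hub inference item above: the hub interface's translate() accepts a list of sentences as well as a single string. A minimal sketch (the checkpoint name is one of the released models; substitute as needed):

    import torch

    en2de = torch.hub.load(
        'pytorch/fairseq',
        'transformer.wmt19.en-de.single_model',
        tokenizer='moses',
        bpe='fastbpe',
    )
    en2de.eval()

    # Passing a list runs batched inference in a single call.
    outputs = en2de.translate(
        ['Hello world!', 'The weather is nice today.'],
        beam=5,
    )
    print(outputs)  # list of detokenized German translations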

Breaking changes:

  • Updated requirements to Python 3.6+ and PyTorch 1.5+
  • --max-sentences renamed to --batch-size
  • Main entry point scripts (eval_lm.py, generate.py, etc.) moved from the repository root into fairseq_cli
  • Changed format for generation output; H- now corresponds to tokenized system outputs and newly added D- lines correspond to detokenized outputs (f353913420b6ef8a31ecc55d2ec0c988178698e0)
  • We now log the stats from the log-interval (displayed as train_inner) instead of a rolling average over each epoch.
  • SequenceGenerator/Scorer no longer prints alignments by default; re-enable with --print-alignment
  • Print base 2 scores in generation scripts (660d69fd2bdc4c3468df7eb26b3bbd293c793f94)
  • Incremental decoding interface changed to use FairseqIncrementalState (4e48c4ae5da48a5f70c969c16793e55e12db3c81; 88185fcc3f32bd24f65875bd841166daa66ed301)
  • Refactor namespaces in Criterions to support library usage (introduce LegacyFairseqCriterion for BC) (46b773a393c423f653887c382e4d55e69627454d)
  • Deprecate the FairseqCriterion::aggregate_logging_outputs interface; use FairseqCriterion::reduce_metrics instead (86793391e38bf88c119699bfb1993cb0a7a33968; see the sketch after this list)
  • Moved fairseq.meters to fairseq.logging.meters and added new metrics aggregation module (fairseq.logging.metrics) (1e324a5bbe4b1f68f9dadf3592dab58a54a800a8; f8b795f427a39c19a6b7245be240680617156948)
  • Reset mid-epoch stats every log-interval steps (244835d811c2c66b1de2c5e86532bac41b154c1a)
  • Ignore duplicate entries in dictionary files (dict.txt) and support manual overwrite with #fairseq:overwrite option (dd1298e15fdbfc0c3639906eee9934968d63fc29; 937535dba036dc3759a5334ab5b8110febbe8e6e)
  • Use 1-based indexing for epochs everywhere (aa79bb9c37b27e3f84e7a4e182175d3b50a79041)
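
For criterion authors affected by the aggregate_logging_outputs deprecation above, the replacement hook aggregates per-batch logging outputs and reports them through fairseq.logging.metrics. A minimal sketch for a hypothetical criterion (the registered name and the logged keys are illustrative; they mirror what the criterion's forward() would put into its logging output):

    import math

    from fairseq.criterions import register_criterion
    from fairseq.criterions.cross_entropy import CrossEntropyCriterion
    from fairseq.logging import metrics

    @register_criterion('my_cross_entropy')  # hypothetical name
    class MyCrossEntropyCriterion(CrossEntropyCriterion):

        @staticmethod
        def reduce_metrics(logging_outputs) -> None:
            # Replaces the deprecated aggregate_logging_outputs(): sum the
            # per-batch stats and report them via the metrics module.
            loss_sum = sum(log.get('loss', 0) for log in logging_outputs)
            ntokens = sum(log.get('ntokens', 0) for log in logging_outputs)
            metrics.log_scalar(
                'loss', loss_sum / max(ntokens, 1) / math.log(2), ntokens, round=3
            )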

Minor interface changes:

  • Added FairseqTask::begin_epoch hook (122fc1db49534a5ca295fcae1b362bbd6308c32f; see the sketch after this list)
  • FairseqTask::build_generator interface changed (cd2555a429b5f17bc47260ac1aa61068d9a43db8)
  • Change RobertaModel base class to FairseqEncoder (307df5604131dc2b93cc0a08f7c98adbfae9d268)
  • Expose FairseqOptimizer.param_groups property (8340b2d78f2b40bc365862b24477a0190ad2e2c2)
  • Deprecate --fast-stat-sync and replace with FairseqCriterion::logging_outputs_can_be_summed interface (fe6c2edad0c1f9130847b9a19fbbef169529b500)
  • --raw-text and --lazy-load are fully deprecated; use --dataset-impl instead
  • Mixture of expert tasks moved to examples/ (8845dcf5ff43ca4d3e733ade62ceca52f1f1d634)
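
As an example of the new FairseqTask::begin_epoch hook listed above, a custom task can override it to run per-epoch setup. A minimal sketch (the task name and subclass are hypothetical):

    from fairseq.tasks import register_task
    from fairseq.tasks.translation import TranslationTask

    @register_task('translation_with_epoch_hook')  # hypothetical task name
    class TranslationWithEpochHook(TranslationTask):

        def begin_epoch(self, epoch, model):
            # Called once at the start of every epoch (epochs are 1-based
            # as of 0.10.0); typical uses are data resampling or curricula.
            super().begin_epoch(epoch, model)
            print(f'beginning epoch {epoch}')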

Performance improvements:

  • Use cross entropy from apex for improved memory efficiency (5065077dfc1ec4da5246a6103858641bfe3c39eb)
  • Added buffered dataloading (--data-buffer-size) (411531734df8c7294e82c68e9d42177382f362ef)

v0.9.0

4 years ago

Possibly breaking changes:

  • Set global numpy seed (4a7cd58)
  • Split in_proj_weight into separate k, v, q projections in MultiheadAttention (fdf4c3e)
  • TransformerEncoder returns namedtuples instead of dict (27568a7)

New features:

  • Add --fast-stat-sync option (e1ba32a)
  • Add --empty-cache-freq option (315c463)
  • Support criterions with parameters (ba5f829)
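
Since criterions may now carry learnable parameters, a criterion can register parameters of its own, and they are optimized alongside the model's. A minimal sketch, assuming the 0.9.0-era FairseqCriterion(args, task) constructor (the registered name and the parameter are purely illustrative):

    import torch
    from fairseq.criterions import FairseqCriterion, register_criterion

    @register_criterion('scaled_cross_entropy')  # hypothetical name
    class ScaledCrossEntropyCriterion(FairseqCriterion):

        def __init__(self, args, task):
            super().__init__(args, task)
            # A learnable scale owned by the criterion; it is trained
            # together with the model's parameters.
            self.log_scale = torch.nn.Parameter(torch.zeros(1))

        # forward() omitted in this sketch; it would apply self.log_scale
        # when computing the loss.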

New papers:

  • Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)
  • Levenshtein Transformer (86857a5, ...)
  • Cross+Self-Attention for Transformer Models (4ac2c5f)
  • Jointly Learning to Align and Translate with Transformer Models (1c66792)
  • Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)
  • Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)
  • CamemBERT: a French BERT (b31849a)

Speed improvements:

  • Add CUDA kernels for LightConv and DynamicConv (f840564)
  • Cythonization of various dataloading components (4fc3953, ...)
  • Don't project mask tokens for MLM training (718677e)

v0.8.0

4 years ago

Changelog:

  • Relicensed under MIT license
  • Add RoBERTa (see the hub example after this list)
  • Add wav2vec
  • Add WMT'19 models
  • Add initial ASR code
  • Changed torch.hub interface (generate renamed to translate)
  • Add --tokenizer and --bpe
  • f812e52: Renamed data.transforms -> data.encoders
  • 654affc: New Dataset API (optional)
  • 47fd985: Deprecate old Masked LM components
  • 5f78106: Set mmap as default dataset format and infer format automatically
  • Misc fixes for sampling
  • Misc fixes to support PyTorch 1.2
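
For the RoBERTa addition and the revised torch.hub interface, a quick example of loading a released checkpoint and extracting sentence features (the checkpoint name is one of the published ones; adjust as needed):

    import torch

    roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
    roberta.eval()

    # BPE-encode a sentence to a tensor of token ids, then extract the
    # final-layer features (shape: 1 x seq_len x hidden_dim).
    tokens = roberta.encode('Hello world!')
    features = roberta.extract_features(tokens)
    print(features.shape)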

v0.7.2

4 years ago

No major API changes since the last release. We're cutting a new release now because we'll soon be merging significant (possibly breaking) changes to logging, data loading, and the masked LM implementation.

v0.7.1

4 years ago

Changelog:

  • 9462a81: Enhanced MMapIndexedDataset: less memory, higher speed
  • 392fce8: Add code for wav2vec paper