Fairseq Versions

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

v0.7.0

4 years ago

Notable (possibly breaking) changes:

  • d45db80: Move checkpoint utility functions out of utils.py into checkpoint_utils.py
  • f2563c2: Move LM definitions into separate files
  • dffb167: Updates to the model API (see the sketch at the end of this section):
    • FairseqModel -> FairseqEncoderDecoderModel
    • add FairseqDecoder.extract_features and FairseqDecoder.output_layer
    • encoder_out_dict -> encoder_out
    • remove unused remove_head functions
  • 34726d5: Move distributed_init into DistributedFairseqModel
  • cf17068: Simplify distributed launch by automatically spawning one process per visible GPU on each node, so a single job can be launched per node instead of one per GPU
  • d45db80: Change default LR scheduler from reduce_lr_on_plateau to fixed
  • 96ac28d: Rename --sampling-temperature -> --temperature
  • fc1a19a: Deprecate dummy batches
  • a1c997b: Add memory mapped datasets
  • 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"

Plus many additional features and bug fixes.
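
A minimal sketch of the split decoder API, assuming a toy FairseqDecoder subclass (the class, its layers, and the dimensions are illustrative, not part of the release itself):

```python
import torch.nn as nn

from fairseq.models import FairseqDecoder  # base class gained extract_features() and output_layer()


class ToyDecoder(FairseqDecoder):
    """Hypothetical decoder showing the two-step forward pass."""

    def __init__(self, dictionary, embed_dim=32):
        super().__init__(dictionary)
        self.embed = nn.Embedding(len(dictionary), embed_dim)
        self.proj = nn.Linear(embed_dim, len(dictionary))

    def extract_features(self, prev_output_tokens, encoder_out=None, **kwargs):
        # Return hidden features plus an extras dict, without projecting to the vocabulary.
        return self.embed(prev_output_tokens), {}

    def output_layer(self, features, **kwargs):
        # Project hidden features to vocabulary logits.
        return self.proj(features)

    def forward(self, prev_output_tokens, encoder_out=None, **kwargs):
        # forward() is now just extract_features() followed by output_layer(),
        # so callers can reuse the features without the output projection.
        x, extra = self.extract_features(prev_output_tokens, encoder_out=encoder_out, **kwargs)
        return self.output_layer(x), extra
```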

v0.6.2

5 years ago

Changelog:

  • 998ba4f: Add language models from Baevski & Auli (2018)
  • 4294c4f: Add mixture of experts code from Shen et al. (2019)
  • 0049349: Add example for multilingual training
  • 48d9afb: Speed improvements, including fused operators from apex
  • 44d27e6: Add Tensorboard support
  • d17fa85: Add Adadelta optimizer
  • 9e1c880: Add FairseqEncoderModel
  • b65c579: Add FairseqTask.inference_step to modularize generate.py (see the sketch below)
  • 2ad1178: Add back --curriculum
  • Misc bug fixes and other features
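
As a rough illustration of what inference_step enables, the per-batch decoding loop reduces to the call below; `task`, `generator`, `models`, and `batch_iterator` are assumed to be set up as in generate.py and their construction is elided:

```python
# Hedged sketch: only the call pattern is shown; building the task, models, and
# generator works as in generate.py and varies between fairseq versions.
for sample in batch_iterator:
    # generate.py delegates decoding to the task, so a custom FairseqTask can
    # override inference_step() without forking the script.
    hypos = task.inference_step(generator, models, sample)
    for i, sample_id in enumerate(sample['id'].tolist()):
        best = hypos[i][0]  # hypotheses for each sentence are returned best-first
        print(sample_id, best['score'], best['tokens'])
```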

v0.6.1

5 years ago

Bumping version number for PyPI release.

v0.6.0

5 years ago

Changelog:

  • 4908863: Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
    • no more FP16Trainer; there is now an FP16Optimizer wrapper instead
    • most of the distributed code has moved into a new wrapper class, DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
    • Trainer now requires an extra dummy_batch argument at initialization; when workers have an uneven number of batches, we run forward/backward on the dummy batch and hide its gradients by multiplying the loss by 0 (see the sketch after this list)
    • Trainer.train_step now takes a list of samples, which allows cleaner --update-freq handling
  • 1c56b58: Parallelize preprocessing
  • Misc bug fixes and features
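
The dummy-batch handling comes down to a simple trick, sketched here in plain PyTorch (an illustration of the idea, not the Trainer code; the function and the sample layout are hypothetical):

```python
def train_step(model, criterion, sample, is_dummy_batch=False):
    """Zero-loss trick: dummy batches keep every distributed worker participating
    in the gradient all-reduce while contributing nothing to the update."""
    loss = criterion(model(sample['net_input']), sample['target'])
    if is_dummy_batch:
        loss = loss * 0.0  # gradients are exactly zero, but backward (and its all-reduce) still runs
    loss.backward()
    return loss
```

Multiplying the loss by zero rather than skipping the batch keeps the number of backward passes identical across workers, which is what a DistributedDataParallel-style all-reduce requires.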

v0.5.0

5 years ago

v0.4.0

5 years ago