Fairseq Versions

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

v0.7.0

4 years ago

Notable (possibly breaking) changes:

  • d45db80: Move checkpoint utility functions out of utils.py into checkpoint_utils.py
  • f2563c2: Move LM definitions into separate files
  • dffb167: Updates to the model API (see the sketch at the end of this section):
    • FairseqModel -> FairseqEncoderDecoderModel
    • add FairseqDecoder.extract_features and FairseqDecoder.output_layer
    • encoder_out_dict -> encoder_out
    • remove unused remove_head functions
  • 34726d5: Move distributed_init into DistributedFairseqModel
  • cf17068: Simplify distributed launch by automatically spawning one process per visible GPU on each node, so a single job can be launched per node instead of one per GPU
  • d45db80: Change default LR scheduler from reduce_lr_on_plateau to fixed
  • 96ac28d: Rename --sampling-temperature -> --temperature
  • fc1a19a: Deprecate dummy batches
  • a1c997b: Add memory mapped datasets
  • 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"

Plus many additional features and bug fixes.
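
A minimal sketch of the split decoder API, assuming a toy FairseqDecoder subclass (the class, its layers, and the dimensions are illustrative, not part of the release itself):

```python
import torch.nn as nn

from fairseq.models import FairseqDecoder  # base class gained extract_features() and output_layer()


class ToyDecoder(FairseqDecoder):
    """Hypothetical decoder showing the two-step forward pass."""

    def __init__(self, dictionary, embed_dim=32):
        super().__init__(dictionary)
        self.embed = nn.Embedding(len(dictionary), embed_dim)
        self.proj = nn.Linear(embed_dim, len(dictionary))

    def extract_features(self, prev_output_tokens, encoder_out=None, **kwargs):
        # Return hidden features plus an extras dict, without projecting to the vocabulary.
        return self.embed(prev_output_tokens), {}

    def output_layer(self, features, **kwargs):
        # Project hidden features to vocabulary logits.
        return self.proj(features)

    def forward(self, prev_output_tokens, encoder_out=None, **kwargs):
        # forward() is now just extract_features() followed by output_layer(),
        # so callers can reuse the features without the output projection.
        x, extra = self.extract_features(prev_output_tokens, encoder_out=encoder_out, **kwargs)
        return self.output_layer(x), extra
```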

v0.6.2

5 years ago

Changelog:

  • 998ba4f: Add language models from Baevski & Auli (2018)
  • 4294c4f: Add mixture of experts code from Shen et al. (2019)
  • 0049349: Add example for multilingual training
  • 48d9afb: Speed improvements, including fused operators from apex
  • 44d27e6: Add Tensorboard support
  • d17fa85: Add Adadelta optimizer
  • 9e1c880: Add FairseqEncoderModel
  • b65c579: Add FairseqTask.inference_step to modularize generate.py (see the sketch below)
  • 2ad1178: Add back --curriculum
  • Misc bug fixes and other features
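
As a rough illustration of what inference_step enables, the per-batch decoding loop reduces to the call below; `task`, `generator`, `models`, and `batch_iterator` are assumed to be set up as in generate.py and their construction is elided:

```python
# Hedged sketch: only the call pattern is shown; building the task, models, and
# generator works as in generate.py and varies between fairseq versions.
for sample in batch_iterator:
    # generate.py delegates decoding to the task, so a custom FairseqTask can
    # override inference_step() without forking the script.
    hypos = task.inference_step(generator, models, sample)
    for i, sample_id in enumerate(sample['id'].tolist()):
        best = hypos[i][0]  # hypotheses for each sentence are returned best-first
        print(sample_id, best['score'], best['tokens'])
```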

v0.6.1

5 years ago

Bumping version number for PyPI release.

v0.6.0

5 years ago

Changelog:

  • 4908863: Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
    • no more FP16Trainer; there is now an FP16Optimizer wrapper instead
    • most of the distributed code has moved into a new wrapper class, DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
    • Trainer now requires an extra dummy_batch argument at initialization; when workers have an uneven number of batches, we run forward/backward on the dummy batch and hide its gradients by multiplying the loss by 0 (see the sketch after this list)
    • Trainer.train_step now takes a list of samples, which allows cleaner --update-freq handling
  • 1c56b58: Parallelize preprocessing
  • Misc bug fixes and features
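
The dummy-batch handling comes down to a simple trick, sketched here in plain PyTorch (an illustration of the idea, not the Trainer code; the function and the sample layout are hypothetical):

```python
def train_step(model, criterion, sample, is_dummy_batch=False):
    """Zero-loss trick: dummy batches keep every distributed worker participating
    in the gradient all-reduce while contributing nothing to the update."""
    loss = criterion(model(sample['net_input']), sample['target'])
    if is_dummy_batch:
        loss = loss * 0.0  # gradients are exactly zero, but backward (and its all-reduce) still runs
    loss.backward()
    return loss
```

Multiplying the loss by zero rather than skipping the batch keeps the number of backward passes identical across workers, which is what a DistributedDataParallel-style all-reduce requires.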

v0.5.0

5 years ago

v0.4.0

5 years ago