We add components for Travis CI. For now, all tests run on CPU only.
We combine the logging functions into a standalone module. Logging output can now be redirected to files while tqdm progress bars are in use.
This feature enables you to choose how your data is batched. We provide two methods, "samples" and "tokens". "samples" counts the number of bi-text pairs (samples) in one batch, while "tokens" counts the number of tokens in one batch (if a sample contains several sentences, the largest token count among them is used). You can switch between the two by setting "batching_key" to "samples" or "tokens".
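As a rough illustration of the difference between the two batching methods, here is a minimal sketch (not the project's actual code; the function name and signature are hypothetical) of grouping samples either by sample count or by token budget:

```python
# Sketch of batching by "samples" vs. "tokens", selected by a
# hypothetical `batching_key` argument (illustration only).

def make_batches(samples, batch_size, batching_key="samples"):
    """Group `samples` (each a list of tokens) into batches.

    batching_key="samples": each batch holds at most `batch_size` samples.
    batching_key="tokens":  each batch holds samples whose combined token
    count stays within `batch_size`.
    """
    batches, current, current_tokens = [], [], 0
    for sample in samples:
        # For multi-sentence samples, the largest sentence length
        # would be used here instead of the total length.
        n_tokens = len(sample)
        if batching_key == "samples":
            full = len(current) >= batch_size
        else:  # "tokens"
            full = bool(current) and current_tokens + n_tokens > batch_size
        if full:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(sample)
        current_tokens += n_tokens
    if current:
        batches.append(current)
    return batches
```

With "samples", every batch has the same number of sentence pairs regardless of length; with "tokens", short sentences are packed more densely, which keeps memory use per batch roughly constant.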
This feature enables you to emulate multiple GPUs on a single GPU. By setting update_cycle to a value larger than 1, the model computes forward passes and accumulates gradients for that many steps before updating the parameters, which behaves like one update step with an effective batch size of update_cycle * batch_size. For example, to use 25000 tokens per batch on a single 1080 GPU (8GB memory), set batch_size to 1250 and update_cycle to 20. This prevents OOM problems.
- Add `multi-bleu.perl` and `multi-bleu-detok.perl`.
- Add `AdamW` and `Adafactor`.
- Add the `dim_per_head` option for transformer. Now `dim_per_head * n_head` need not equal `d_model`.
- Add `Criterion`.
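The effect of a separate `dim_per_head` can be illustrated by the attention projection shapes (a sketch under assumed conventions, not the project's actual transformer code; the function name is hypothetical): the Q/K/V projections map `d_model` to `n_head * dim_per_head`, and the output projection maps back, so the two widths are free to differ.

```python
# Sketch: how a separate `dim_per_head` decouples the per-head width
# from the model width `d_model` (illustration only).

def attention_proj_shapes(d_model, n_head, dim_per_head=None):
    if dim_per_head is None:  # default: split d_model evenly across heads
        assert d_model % n_head == 0
        dim_per_head = d_model // n_head
    inner = n_head * dim_per_head  # total width of Q/K/V after projection
    return {
        "q_proj": (d_model, inner),
        "k_proj": (d_model, inner),
        "v_proj": (d_model, inner),
        "out_proj": (inner, d_model),  # concatenated heads back to d_model
    }
```

For instance, with `d_model=512`, `n_head=8`, and `dim_per_head=128`, the inner width is 1024 rather than 512, giving wider heads without changing the model width.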
This is the final version based on PyTorch 0.3.1. From now on we will only provide minimal maintenance and bug fixes for it.