# Sequence-to-Sequence Framework in PyTorch
- This release supports PyTorch >= 0.4.1, including the recent 1.0 release. The relevant `setup.py` and `environment.yml` files default to a `1.0.0` installation.
- `NumpyDataset` now returns tensors of shape `HxW, N, C` for 3D/4D convolutional features and `1, N, C` for 2D feature files. Models should be adjusted to this new shaping.
- An `order_file` per split (`ord: path/to/txt file with integer per line`) can be given from the configurations to change the feature order of numpy tensors, so that you can flexibly revert, shuffle, tile, etc. them.
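  A minimal sketch of what such an order file does under the `HxW, N, C` layout above; `reorder_features` is a hypothetical helper, not the library's actual code:

  ```python
  import numpy as np

  def reorder_features(feats: np.ndarray, order_path: str) -> np.ndarray:
      """Revert/shuffle/tile features along the sample axis N of a (HxW, N, C) tensor."""
      with open(order_path) as f:
          # one integer index per line, as in the `ord:` file described above
          order = [int(line.strip()) for line in f]
      return feats[:, order, :]
  ```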
- Added `LabelDataset` for single-label inputs/outputs, with an associated `Vocabulary` for integer mapping.
- Added a `handle_oom=(True|False)` argument to the `[train]` section to recover from GPU out-of-memory (OOM) errors during training. This is disabled by default; you need to enable it from the experiment configuration file. Note that it is still possible to get an OOM during validation perplexity computation; if you hit that, reduce the `eval_batch_size` parameter.
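  An illustrative `[train]` fragment combining the two options above (values are arbitrary examples):

  ```ini
  [train]
  # try to recover from training-time OOM errors (disabled by default)
  handle_oom: True
  # reduce this if OOM hits during validation perplexity computation
  eval_batch_size: 16
  ```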
- Added a `de-hyphen` post-processing filter to stitch back the aggressive hyphen splitting of Moses during early-stopping evaluations.
- Updates to the `TextEncoder` layer.
- Added `enc_lnorm, sched_sampling` options to `NMT` to enable layer normalization for the encoder and to use scheduled sampling at a given probability.
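  A minimal sketch of the scheduled-sampling idea; the function and its arguments are illustrative, not the model's actual code:

  ```python
  import torch

  def next_decoder_input(gt_token: torch.Tensor, logits: torch.Tensor, p: float) -> torch.Tensor:
      """Scheduled sampling: with probability p, feed back the model's own prediction."""
      if torch.rand(1).item() < p:
          return logits.argmax(dim=-1)  # model's greedy prediction
      return gt_token                   # ground truth (teacher forcing)
  ```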
- `ConditionalDecoder` can now be initialized with max-pooled encoder states or with the last state as well.
- You can now use different decoders for `NMT` by changing the `dec_variant` option.
- Attention weights are now collected in the `self.history` dictionary of the decoders.
- Added n-best output to `nmtpy translate` with the argument `-N`.
- Changed the way `-S` works for `nmtpy translate`: you now need to give the split name with `-s` all the time, while `-S` is used to override the input data sources defined for that split in the configuration file.
- Removed the decoder-initialized multimodal NMT model `MNMTDecInit`. The same functionality exists within the `NMT` model by using the model option `dec_init=feats`.
- Added `trgmul`.
- For sentence-pair classification setups, `direction` should be defined as `direction: pre:Text, hyp:Text -> lb:Label`; the `pre`, `hyp` and `lb` keys point to plain text files with one sentence per line. A vocabulary should be constructed even for the labels to fit the nmtpy architecture, and `acc` should be added to `eval_metrics` to compute accuracy.
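  A hypothetical configuration fragment for such a setup; only the `direction` string and the `acc` metric come from this entry, while the section placement mirrors typical nmtpytorch configs:

  ```ini
  [model]
  direction: pre:Text, hyp:Text -> lb:Label

  [train]
  # early-stop on accuracy; loss is also computed
  eval_metrics: acc,loss
  ```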
- The package can now be installed via `pip`.
- Learning-rate annealing is controlled via the `lr_decay*` options in `config.py`.
- `nmtpy-install-extra` should be launched after installation.
- Changes affecting both the `translate` and training regimes.
- Batching now uses `BucketBatchSampler`, i.e. length-ordered batches.
- Added `environment.yml` files for easy installation using `conda`. You can now create a ready-to-use `conda` environment by just calling `conda env create -f environment-cuda<VER>.yml`.
- Made `NumpyDataset` memory-efficient by keeping `float16` arrays as they are until batch creation time.
- Renamed `Multi30kRawDataset` to `Multi30kDataset`, which now supports both raw image files and pre-extracted visual feature files stored as `.npy`.
- Added a CNN feature extraction script under `scripts/`.
- Added doubly stochastic attention to `ShowAttendAndTell` and multimodal NMT.
- New model `MNMTDecinit` to initialize the decoder with auxiliary features.
- New model `AMNMTFeatures`: the attentive MMT, but fed from a features file instead of end-to-end feature extraction, which was memory-hungry.
- Updates to the `ShowAttendAndTell` model.
- Removed the old `Multi30kDataset`.
- Fixed the `ShowAttendAndTell` model; it should now work.
- Added `Multi30kRawDataset` for training end-to-end systems from raw images as input.
- Added `NumpyDataset` to read `.npy/.npz` tensor files as input features.
- You can now pass `-S` to `nmtpy train` to produce shorter experiment file names that do not embed all the hyperparameters.
- New post-processing filter `de-spm` for Google SentencePiece (SPM) processed files.
- `sacrebleu` is now a dependency, as it is now accepted as an early-stopping metric. It only makes sense to use it with SPM-processed files, since they are detokenized once post-processed.
- Added `sklearn` as a dependency for some metrics.
- Added `momentum` and `nesterov` parameters to the `[train]` section for SGD.
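  An illustrative SGD fragment; `momentum` and `nesterov` are the new options, while the `optimizer` key name is an assumption:

  ```ini
  [train]
  optimizer: sgd
  momentum: 0.9
  nesterov: True
  ```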
- The `ImageEncoder` layer is improved in many ways; please see the code for further details.
- Added `ModuleDict()` support.
- `METEOR` will now fall back to English if the language cannot be detected from file suffixes.
- `-f` now produces a separate numpy file for token frequencies when building vocabulary files with `nmtpy-build-vocab`.
- New `nmtpy test` command for non-beam-search inference modes.
- Removed the `nmtpy resume` command and added a `pretrained_file` option for `[train]` to initialize model weights from a checkpoint.
- Added a `freeze_layers` option for `[train]` that takes a comma-separated list of layer-name prefixes to freeze.
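  A sketch combining the last two options; the layer-name prefixes are hypothetical:

  ```ini
  [train]
  # initialize weights from an existing checkpoint
  pretrained_file: /path/to/model.ckpt
  # freeze every layer whose name starts with one of these prefixes
  freeze_layers: enc,dec.emb
  ```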
- New `SimpleGRUDecoder` layer.
- `TextEncoder`: ability to set `maxnorm` and `gradscale` of embeddings, and to work with or without sorted-length batches.
- `ConditionalDecoder`: made it work with GRU/LSTM; allow setting `maxnorm/gradscale` for embeddings.
- `ConditionalMMDecoder`: same as above.
- `--avoid-double` and `--avoid-unk` removed for now.
- Added the `--lp-alpha` length-penalty switch.
- New machine-learning metric wrappers in `utils/ml_metrics.py`, such as label-ranking average precision (`lrap`).
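  Since `sklearn` is listed as a dependency for metrics, `lrap` presumably wraps scikit-learn's label-ranking average precision; here is the underlying metric in isolation (the wrapper's own API may differ):

  ```python
  import numpy as np
  from sklearn.metrics import label_ranking_average_precision_score

  # two samples, three labels: binary ground truth vs. real-valued scores
  y_true = np.array([[1, 0, 0],
                     [0, 0, 1]])
  y_score = np.array([[0.75, 0.5, 1.0],
                      [1.0, 0.2, 0.1]])
  print(label_ranking_average_precision_score(y_true, y_score))
  ```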
- You can now use `$HOME` and `$USER` in your configuration files.
- Replaced `utils.nn.get_network_topology()` with a new `Topology` class that parses the `direction` string of the model in a smarter way.
- If `CUDA_VISIBLE_DEVICES` is set, the `GPUManager` will always honor it.
- `/tmp` is no longer used for GPU reservation.
- Added `TextDataset` for standalone text file reading.
- Added `OneHotDataset`, a variant of `TextDataset` where the sequences are not prefixed/suffixed with `<bos>` and `<eos>` respectively.
- Added `MultiParallelDataset`, which merges an arbitrary number of parallel datasets together.
- `.nodbl` and `.nounk` suffixes are now added to output files for the `--avoid-double` and `--avoid-unk` arguments respectively.
- `beam_search()` is now separated out into its own file, `nmtpytorch/search.py`.
- The `max_len` default is increased to 200.
- New `Multi30kDataset` and `ImageFolderDataset` classes.
- `torchvision` dependency added for CNN support.
- `nmtpy-coco-metrics` now computes one METEOR score without `norm=True`.
- Changes to the `[train]` section options (a combined example follows this list):
  - The `patience_delta` option is removed.
  - Added `eval_batch_size` to define the batch size for GPU beam search during training.
  - The `eval_freq` default is now `3000`, which means every 3000 minibatches.
  - `eval_metrics` now defaults to `loss`. As before, you can provide a list of metrics like `bleu,meteor,loss` to compute all of them and early-stop based on the first.
  - Added `eval_zero (default: False)`, which evaluates the model once on the dev set right before training starts. Useful for sanity checking if you fine-tune a model initialized with pre-trained weights.
  - Removed `save_best_n`: we no longer save the best N models on the dev set w.r.t. the early-stopping metric.
  - Added `save_best_metrics (default: True)`, which will save the best models on the dev set w.r.t. each metric provided in `eval_metrics`. This somewhat remedies the removal of `save_best_n`.
  - `checkpoint_freq` now defaults to `5000`, which means every 5000 minibatches.
  - Added `n_checkpoints (default: 5)` to define the number of last checkpoints that will be kept if `checkpoint_freq > 0`, i.e. if checkpointing is enabled.
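  A combined `[train]` fragment illustrating the options above; values are the stated defaults where given, otherwise arbitrary:

  ```ini
  [train]
  eval_freq: 3000
  eval_metrics: bleu,meteor,loss
  eval_batch_size: 32
  eval_zero: False
  save_best_metrics: True
  checkpoint_freq: 5000
  n_checkpoints: 5
  ```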
- Added `ExtendedInterpolation` support to configuration files: you can now define intermediate variables in `.conf` files to avoid typing the same paths again and again. A variable can be referenced from within its own section using the `tensorboard_dir: ${save_path}/tb` notation; cross-section references are also possible, e.g. `${data:root}` will be replaced by the value of the `root` variable defined in the `[data]` section.
to nmtpy train
to initialize the weights of
the model using another checkpoint .ckpt
.nmtpy translate
:
-s
accepts a comma-separated test sets defined in the configuration
file of the experiment to translate them at once. Example: -s val,newstest2016,newstest2017
-s
is -S
which receives a
single input file of source sentences.-o
.
In the case of multiple test sets, the output prefix will be appended
the name of the test set and the beam size. If you just provide a single file with -S
the final output name will only reflect the beam size information.nmtpy-build-vocab
- Two new arguments for `nmtpy-build-vocab`:
  - `-f`: stores frequency counts as well inside the final `json` vocabulary.
  - `-x`: does not add the special markers `<eos>,<bos>,<unk>,<pad>` into the vocabulary.
- Added a `Fusion()` layer to `concat,sum,mul` an arbitrary number of inputs.
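  A minimal sketch of the fusion idea, not the layer's actual implementation:

  ```python
  import torch

  def fuse(tensors, op: str = "concat") -> torch.Tensor:
      """Combine an arbitrary number of tensors by concatenation, summation or product."""
      if op == "concat":
          return torch.cat(tensors, dim=-1)
      if op == "sum":
          return torch.stack(tensors, dim=0).sum(dim=0)
      if op == "mul":
          out = tensors[0]
          for t in tensors[1:]:
              out = out * t
          return out
      raise ValueError(f"unknown fusion op: {op}")
  ```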
- Added an `ImageEncoder()` layer to seamlessly plug a VGG or ResNet CNN using `torchvision` pretrained models.
- `Attention` layer arguments improved: you can now select the bottleneck dimensionality for MLP attention with `att_bottleneck`. The `dot` attention is still not tested and probably broken.
- `dec_init` defaults to `mean_ctx`, i.e. the decoder will be initialized with the mean context computed from the source encoder (a sketch follows this list).
- `enc_lnorm`, which was just a placeholder, is now removed since we do not provide layer normalization for now.
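A hedged sketch of the `mean_ctx` computation; tensor names and the mask convention are assumptions, not the model's actual code:

```python
import torch

def mean_ctx(ctx: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """ctx: (S, N, C) encoder states; mask: (S, N) with 1 for real tokens, 0 for padding."""
    summed = (ctx * mask.unsqueeze(-1)).sum(dim=0)   # (N, C)
    return summed / mask.sum(dim=0).unsqueeze(-1)    # mean over non-padded steps
```

The decoder would then typically project this mean through a learned layer (e.g. a `Linear` followed by `tanh`) to obtain its initial hidden state.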