Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
- Added `--end-of-prepending-tag` for training or data preparation, and `--transformer-block-prepended-cross-attention` for training.
- Ran bfloat16 integration and system testing on all platforms.
- Added `--dtype bfloat16` to `sockeye-translate`, `sockeye-score`, and `sockeye-quantize`.
- Fixed compatibility with `numpy==1.24.0` by using `pickle` instead of `numpy` to save/load `ParallelSampleIter` data permutations.
- `sockeye-evaluate` no longer applies text tokenization for TER (same behavior as other metrics).
- Enabled type checking for all `sockeye` modules except `test_utils` and addressed the resulting type issues.
- Added two commands for building a kNN index:
  - `sockeye-generate-decoder-states -m [model] --source [src] --target [tgt] --output-dir [output dir]` to generate decoder states, and
  - `sockeye-knn -i [input_dir] -o [output_dir] -t [faiss_index_signature]` to build the index, where `input_dir` is the same as `output_dir` from the `sockeye-generate-decoder-states` command.
- Added a kNN inference option: `sockeye-translate ... --knn-index [index_dir] --knn-lambda [interpolation_weight]`, where `index_dir` is the same as `output_dir` from the `sockeye-knn` command.
- Replaced `torch.testing.assert_allclose` with `torch.testing.assert_close` for PyTorch 1.14 compatibility.
- Added boolean option `--tf32 0|1` (controls `torch.backends.cuda.matmul.allow_tf32`) enabling transparent float32 acceleration at 10-bit mantissa precision (19 bits total). Defaults to true for backward compatibility with torch < 1.12.
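The 10-bit-mantissa (19-bit) precision mentioned above can be illustrated in plain Python. The helper below is a sketch for illustration only (it is not part of Sockeye): it truncates a float32 mantissa to the 10 bits that TF32 keeps.

```python
import struct

def tf32_round(x: float) -> float:
    """Simulate TF32 precision: keep float32's sign bit and 8-bit exponent,
    but only the top 10 of the 23 mantissa bits (19 significant bits total).
    Illustrative only; real TF32 rounding happens inside GPU tensor cores."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)  # zero the low 13 mantissa bits (truncate)
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Values whose mantissa fits in 10 bits (e.g. 3.5) pass through unchanged; extra float32 precision is silently dropped, which is what makes TF32 a transparent speed/precision trade-off.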
- Allowed a different `--tf32` setting when continuing training.
- `device.init_device()` is now called by train, translate, and score.
- Added DeepSpeed support: install with `pip install deepspeed` and launch training with `deepspeed --no_python ... sockeye-train ...`. Run in FP16 mode with `--deepspeed-fp16` or BF16 mode with `--deepspeed-bf16`.
- Added option `--learning-rate-t-scale`.
- Added `sockeye-train` and `sockeye-translate` option `--clamp-to-dtype` that clamps the outputs of transformer attention, feed-forward networks, and process blocks to the min/max finite values for the current dtype. This can prevent inf/nan values from overflow when running large models in float16 mode. See: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
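A minimal sketch of the clamping idea in plain Python (not Sockeye's implementation): saturate values at float16's finite limits so a later float16 cast produces large finite numbers instead of inf. The constant 65504.0 is float16's largest finite value.

```python
# Largest finite float16 value; anything beyond it overflows to inf in fp16.
FP16_MAX = 65504.0

def clamp_to_fp16_range(values):
    """Clamp each value into [-FP16_MAX, FP16_MAX], mirroring what
    --clamp-to-dtype does for attention/feed-forward/process-block outputs."""
    return [max(-FP16_MAX, min(FP16_MAX, v)) for v in values]
```

In a real model this clamp would be applied right before activations are stored or computed in float16, turning a potential inf/nan cascade into a bounded saturation.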
- Added `sockeye-quantize`.
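As a general illustration of what offline quantization does (a hypothetical sketch, not Sockeye's actual quantization code), here is per-tensor int8 weight quantization with a single scale factor:

```python
def quantize_int8(weights):
    """Map float weights to int8 codes with one per-tensor scale factor.
    Hypothetical sketch; sockeye-quantize operates on real model files."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize_int8(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]
```

Pre-quantizing trades a small amount of weight precision for smaller files and faster loading, which is the usual motivation for a standalone quantization tool.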
- Added isometric translation evaluation metrics (`isometric-ratio`, `isometric-diff`, `isometric-lc`) when specifying `--metric`.
- Added `--output-best-non-blank` to output the non-blank best hypothesis from the nbest list.
- Added `--neural-vocab-selection` to `sockeye-train`. This trains a model with Neural Vocabulary Selection that is automatically used by `sockeye-translate`. To look at translations without vocabulary selection, specify `--skip-nvs` as an argument to `sockeye-translate`.
- Added `sockeye-train` argument `--no-reload-on-learning-rate-reduce` that disables reloading the best training checkpoint when reducing the learning rate. This currently only applies to the `plateau-reduce` learning rate scheduler, since other schedulers do not reload checkpoints.
- Clarified the usage of `batch_size` in the Translator code.
- Models are now set to inference mode when `inference_only` is set, including for the CheckpointDecoder during training.
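The `--output-best-non-blank` behavior described above can be sketched as follows (a hypothetical helper; the fallback to the 1-best hypothesis when every entry is blank is an assumption, not confirmed by the changelog):

```python
def best_non_blank(nbest):
    """Return the first (highest-scoring) non-blank hypothesis from an
    nbest list; fall back to the 1-best if every hypothesis is blank.
    Hypothetical sketch of the --output-best-non-blank behavior."""
    for hyp in nbest:
        if hyp.strip():
            return hyp
    return nbest[0] if nbest else ""
```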