Neural machine translation and sequence learning using TensorFlow
- `--jit_compile` to compile the model with XLA (only applied in training at the moment)
- `ctranslate2_spec` has been removed as it is no longer relevant with the new CTranslate2 converter
- (… `tf.keras.optimizers.legacy` for now)
- `pad_to_bucket_boundary` to pad the batch length to a multiple of `length_bucket_width` (this is useful to reduce the number of recompilations with XLA)
- `chrf` and `chrf++` from SacreBLEU
- `ScalingNmtEnDe` and `ScalingNmtEnFr` from Ott et al. 2018
- `EmbeddingsSharingLevel.AUTO` to automatically share embeddings when the vocabulary is shared
- `Runner.average_checkpoints` to accept a list of checkpoints to average
- `initial_learning_rate` parameter to the `InvSqrtDecay` schedule
- `Transformer` constructor:
  - `mha_bias`: to disable bias terms in the multi-head attention (as presented in the original paper)
  - `output_layer_bias`: to disable bias in the output linear layer
- `SequenceRecordInputter` length vector
- `distutils.version.LooseVersion`, use `packaging.version.Version` instead
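The `pad_to_bucket_boundary` item is connected to `--jit_compile`. XLA compiles code for specific input shapes, so a batch with a new length can trigger another compilation. The sketch below is a conceptual model and not OpenNMT-tf code. `pad_to_bucket` and `count_compilations` are hypothetical helpers. The model assumes a fixed batch size and one compilation for each distinct padded length.

```python
def pad_to_bucket(length: int, bucket_width: int) -> int:
    """Round a batch length up to the next multiple of bucket_width."""
    if bucket_width <= 1:
        return length
    return ((length + bucket_width - 1) // bucket_width) * bucket_width


def count_compilations(batch_lengths, bucket_width: int = 1) -> int:
    """Count distinct padded lengths, i.e. how many shape-specialized
    compilations a shape-keyed compiler (assumed here) would perform."""
    return len({pad_to_bucket(n, bucket_width) for n in batch_lengths})


lengths = [17, 23, 31, 9, 40, 12, 27, 33]
print(count_compilations(lengths))                  # 8: every length is new
print(count_compilations(lengths, bucket_width=8))  # 4: padded to 16, 24, 32, 40
```

With a bucket width of 8, the eight distinct lengths map onto four padded shapes. The cost is some extra padding tokens per batch.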
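Here is a simplified sentence-level chrF, for readers who have not used the `chrf` scorer. It follows the original chrF formulation:

- count character n-grams up to order 6, with whitespace removed;
- average precision and recall over the n-gram orders;
- combine them into an F-score with β = 2, which weights recall more than precision.

This is only an illustration. SacreBLEU's implementation may differ in details, such as how n-gram orders are combined and how corpus-level statistics are aggregated, so the scores will not match it exactly. `chrf++` also counts word unigrams and bigrams, which this sketch leaves out. Skipping orders longer than either string is an assumption of this sketch.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF score in [0, 100]."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # assumption: skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped matches (min of counts)
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(round(chrf("cat", "cats"), 2))  # 68.86: missing a character hurts recall
print(round(chrf("cats", "cat"), 2))  # 89.84: an extra character only hurts precision
```

Because β = 2, a hypothesis that omits part of the reference is penalized more than one that adds extra characters.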
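`Runner.average_checkpoints` averages model weights across several checkpoints, and the core operation is an element-wise mean of each variable. Because the method accepts an explicit list, the caller decides which checkpoints go into that mean. The sketch below is not the OpenNMT-tf implementation. It represents each checkpoint as a plain dictionary that maps variable names to flat lists of floats, a hypothetical stand-in for real TensorFlow checkpoints. `average_variables` is also a hypothetical name.

```python
from typing import Dict, List, Sequence

Checkpoint = Dict[str, List[float]]  # hypothetical flat representation


def average_variables(checkpoints: Sequence[Checkpoint]) -> Checkpoint:
    """Element-wise mean of every variable across the given checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must contain the same variables")
    averaged: Checkpoint = {}
    for name in names:
        values = [ckpt[name] for ckpt in checkpoints]
        if len({len(v) for v in values}) != 1:
            raise ValueError(f"shape mismatch for variable {name!r}")
        # zip(*values) walks position by position across all checkpoints.
        averaged[name] = [sum(column) / len(values) for column in zip(*values)]
    return averaged


ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
print(average_variables(ckpts))  # {'w': [2.0, 3.0], 'b': [0.5]}
```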
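The `initial_learning_rate` parameter sets the starting point of the `InvSqrtDecay` warmup. The minimal sketch below assumes the schedule has the same shape as Fairseq's `inverse_sqrt`:

- during warmup, a linear ramp from `initial_learning_rate` to `learning_rate` over `warmup_steps`;
- after warmup, decay proportional to 1/√step.

The function and its signature are illustrative assumptions, not the OpenNMT-tf class.

```python
import math


def inv_sqrt_decay(step: int, learning_rate: float, warmup_steps: int,
                   initial_learning_rate: float = 0.0) -> float:
    """Inverse square root schedule with linear warmup (illustrative sketch)."""
    if step < warmup_steps:
        # Linear warmup from initial_learning_rate up to learning_rate.
        fraction = step / warmup_steps
        return initial_learning_rate + (learning_rate - initial_learning_rate) * fraction
    # After warmup: lr * sqrt(warmup_steps / step), equal to lr at step == warmup_steps.
    return learning_rate * math.sqrt(warmup_steps / step)


for step in (0, 2000, 4000, 16000):
    print(step, inv_sqrt_decay(step, learning_rate=5e-4, warmup_steps=4000,
                               initial_learning_rate=1e-7))
```

Once warmup is over, quadrupling the step halves the learning rate. A nonzero `initial_learning_rate` avoids starting training with a learning rate of exactly zero.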
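The parameter arithmetic shows what `mha_bias` and `output_layer_bias` remove. This sketch assumes the standard layout:

- each multi-head attention layer has query, key, value and output projections of size `d_model × d_model`, and the number of heads does not change this total;
- the output layer projects `d_model` to the vocabulary size.

The helper names are hypothetical.

```python
def mha_params(d_model: int, bias: bool = True) -> int:
    """Parameters of one multi-head attention layer: four d_model x d_model
    projections (Q, K, V, output), plus one d_model bias per projection if enabled."""
    weights = 4 * d_model * d_model
    biases = 4 * d_model if bias else 0
    return weights + biases


def output_layer_params(d_model: int, vocab_size: int, bias: bool = True) -> int:
    """Final linear projection from d_model to vocabulary logits."""
    return d_model * vocab_size + (vocab_size if bias else 0)


print(mha_params(512))                         # 1050624
print(mha_params(512, bias=False))             # 1048576
print(output_layer_params(512, 32000))         # 16416000
print(output_layer_params(512, 32000, False))  # 16384000
```

For `d_model = 512`, the biases are 2,048 of the 1,050,624 parameters in each attention layer, so disabling them barely changes model size. According to the changelog entry, `mha_bias` exists to match the formulation in the original paper.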
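Code that still compares versions with `distutils.version.LooseVersion` can switch to `packaging.version.Version`, as in the minimal sketch below. It assumes the `packaging` distribution is installed, and `is_at_least` is a hypothetical helper. PEP 632 deprecated `distutils`, and Python 3.12 removed it. `Version` follows PEP 440, so a pre-release sorts before the final release. `LooseVersion`, by contrast, orders `2.11.0rc1` after `2.11.0`.

```python
from packaging.version import Version


def is_at_least(version: str, minimum: str) -> bool:
    """Return True if `version` is at or above `minimum` under PEP 440 ordering."""
    return Version(version) >= Version(minimum)


print(is_at_least("2.11.0", "2.11"))       # True: "2.11" and "2.11.0" compare equal
print(is_at_least("2.9.3", "2.11"))        # False: release segments compare numerically
print(is_at_least("2.11.0rc1", "2.11.0"))  # False: a release candidate precedes the final release
```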