Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)
- Postponed CUDA initialization to the instantiation of `SRUCell`, ensuring it happens in the process where the model will actually run (see the sketch after this list).
- More layer norm options.
- More info in `__repr__()`.
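A minimal sketch of why the deferred CUDA initialization matters, assuming a multi-process setup; the spawn arrangement and sizes are illustrative, not part of the release notes. Because `SRUCell` no longer touches CUDA before it is instantiated, the model can be built inside the worker process that will actually run it:

```python
import torch
import torch.multiprocessing as mp
from sru import SRU

def worker(rank: int) -> None:
    # Instantiating SRU (and its SRUCells) here, rather than in the parent,
    # means CUDA initialization happens in the process that runs the model.
    model = SRU(input_size=128, hidden_size=128, num_layers=2).to(f"cuda:{rank}")
    x = torch.randn(32, 8, 128, device=f"cuda:{rank}")  # (length, batch, dim)
    output, state = model(x)
    print(rank, output.shape)

if __name__ == "__main__":
    mp.spawn(worker, nprocs=torch.cuda.device_count())
```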
Dev1:
- Fix a typo.
- Add an option (`attention_last_n_layers`) to use attention only in the last n layers (see the sketch after this release's notes).
- Replace the option `normalize_after` with `normalization_type`.
Changes:
- `weight_c_init`: changed from `Optional[float] = None` to `float = 1.0`.
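A hedged sketch of the Dev1 options, assuming they are exposed on the SRU++ module (`SRUpp`) under exactly the names used in these notes; the class name, sizes, and values are illustrative assumptions:

```python
import torch
from sru import SRUpp

model = SRUpp(
    input_size=512,
    hidden_size=512,
    proj_size=128,               # attention/projection dimension of SRU++
    num_layers=6,
    attention_last_n_layers=2,   # use attention only in the last 2 layers
    weight_c_init=1.0,           # now a plain float, defaulting to 1.0
    # normalization_type=...,    # replaces normalize_after; valid values are
    #                            # not listed in these notes
)
x = torch.randn(16, 4, 512)      # (length, batch, dim)
output, state = model(x)
```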
Bug fixes:
- Added the `autocast` block needed in `sru.ops.elementwise_recurrence_gpu`, allowing both Native AMP and APEX AMP to work (credit: @visionscaper). A usage sketch follows below.

Note that the future 3.0.0 release, and future 3.0.0 dev releases, might not be backwards compatible with this dev release.
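A minimal sketch of Native AMP training with SRU, the setup this fix is meant to support; the model, placeholder loss, and sizes are illustrative:

```python
import torch
from sru import SRU

model = SRU(input_size=256, hidden_size=256, num_layers=2).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 8, 256, device="cuda")  # (length, batch, dim)
optimizer.zero_grad()
with torch.cuda.amp.autocast():
    # The forward pass runs in mixed precision; the fixed autocast block in
    # sru.ops.elementwise_recurrence_gpu keeps the recurrence compatible.
    output, state = model(x)
    loss = output.mean()  # placeholder loss

scaler.scale(loss).backward()  # scaled backward pass
scaler.step(optimizer)
scaler.update()
```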
This dev release:
- `custom_m` renamed to `transform_module`.
- `transform_module` is now always used (the `weight` and `weight_proj` parameters have been removed).
- `projection_size` can take a sequence of projection sizes, one per layer.
- `n_proj` in `SRUCell` renamed to `projection_size`, for consistency.
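A hedged sketch of the renamed and extended options, assuming the constructors accept them as described above; sizes are illustrative. A custom `transform_module` (the `nn.Module` that replaces the removed `weight`/`weight_proj` parameters) is omitted because its exact contract is not spelled out in these notes:

```python
import torch
from sru import SRU, SRUCell

# projection_size can now be a sequence with one entry per layer.
model = SRU(
    input_size=256,
    hidden_size=256,
    num_layers=3,
    projection_size=[128, 64, 64],
)

# SRUCell takes projection_size (formerly n_proj) as well.
cell = SRUCell(input_size=256, hidden_size=256, projection_size=64)

x = torch.randn(20, 4, 256)  # (length, batch, dim)
output, state = model(x)
```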