SRU Versions

Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

v2.7.0-rc1

2 years ago

Postponed CUDA initialization until SRUCell instantiation, so that it happens in the process where the model will actually run.
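
A minimal sketch of the pattern this change supports: build the model inside the worker process, so that any CUDA setup triggered by creating the SRU cells happens in the process that runs the model. The sru.SRU usage and the torch.multiprocessing setup are illustrative assumptions, not part of this release note.

```python
import torch
import torch.multiprocessing as mp
from sru import SRU  # assumes the sru package is installed

def worker(rank: int):
    device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")
    # CUDA-related setup is deferred until the SRU cells are created here,
    # i.e. inside the process that will run the model.
    model = SRU(input_size=128, hidden_size=128, num_layers=2).to(device)
    x = torch.randn(32, 4, 128, device=device)  # (seq_len, batch, input_size)
    output, state = model(x)
    print(rank, output.shape)

if __name__ == "__main__":
    mp.spawn(worker, nprocs=1)
```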

3.0.0.dev6

2 years ago

More layer norm options; more info in __repr__()

v2.6.0

3 years ago
  • Support GPU/CUDA inference in TorchScript models (a usage sketch follows this list)
  • Support post layer norm
  • Support custom init value for weight_c
  • Add unit tests for GPU inference
  • Add unit tests for backward()
  • Add more unit tests for TorchScript
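
A hedged sketch of the features listed above. The normalize_after keyword name for post layer norm is an assumption; weight_c_init appears in a later note below; the TorchScript and CUDA calls are standard PyTorch.

```python
import torch
from sru import SRU  # assumes the sru package is installed

model = SRU(
    input_size=256,
    hidden_size=256,
    num_layers=2,
    layer_norm=True,
    normalize_after=True,  # post layer norm (assumed keyword name)
    weight_c_init=0.5,     # custom init value for weight_c
)

scripted = torch.jit.script(model)      # TorchScript compilation
if torch.cuda.is_available():
    scripted = scripted.cuda()          # GPU/CUDA inference path added in this release
    x = torch.randn(10, 4, 256, device="cuda")
else:
    x = torch.randn(10, 4, 256)

with torch.no_grad():
    output, state = scripted(x)         # output: (seq_len, batch, hidden_size)
```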

2.6.0.dev3

3 years ago
  • Support GPU/CUDA inference in TorchScript models
  • Support post layer norm
  • Support custom init value for weight_c

v2.6.0.dev2

3 years ago

Dev1:

  • Support GPU/CUDA inference in TorchScript models
  • Support post layer norm
  • Support custom init value for weight_c

Dev2:

  • Fix an issue

v2.6.0.dev

3 years ago
  • Support GPU/CUDA inference in TorchScript models
  • Support post layer norm
  • Support custom init value for weight_c

v3.0.0.dev3

3 years ago

Fix a typo. Add the attention_last_n_layers option to apply attention only in the last n layers. Replace the normalize_after option with normalization_type.
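
A hedged sketch of the attention_last_n_layers option on an SRU++ model. The SRUpp class name and the proj_size argument are assumptions about the v3 API, and the accepted values for normalization_type are not shown because the note above does not specify them.

```python
from sru import SRUpp  # SRU++ module; class name assumed

model = SRUpp(
    input_size=512,
    hidden_size=512,
    proj_size=128,               # attention/projection dimension (assumed argument)
    num_layers=6,
    attention_last_n_layers=2,   # apply attention only in the last 2 layers
)
```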

v3.0.0.dev2

3 years ago

Changes:

  • change weight_c_init from Optional[float] = None to float = 1.0

Bug fixes:

  • fix a potential memory leak in the custom op
  • fix a bug in CUDA mask_pad handling
  • now TorchScript-compatible with torch 1.5.1

v3.0.0.dev1

3 years ago

Note that the final 3.0.0 release and future 3.0.0 dev releases might not be backwards compatible with this dev release.

Key features / changes:

  • #160: SRU++ is now available. Unit tests are included for TorchScript compatibility and correctness. Example language model training code is available.
  • #166: fp16 training improvement. The recurrence kernel now runs in float16 when AMP is enabled. This gives an additional ~10% speedup on the tested language model training, a ~20% reduction in GPU memory usage, and no regression in final results (a minimal AMP sketch follows this list).
  • #167: Code clean-up. No autocast block is needed in sru.ops.elementwise_recurrence_gpu anymore, which allows both native AMP and APEX AMP to work. (Credit: @visionscaper)
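
A minimal sketch of the fp16/AMP training path described in #166, using PyTorch native AMP. The SRUpp class name, the tensor shapes, and the small classification head are assumptions made for illustration.

```python
import torch
from sru import SRUpp  # SRU++ module; class name assumed

device = torch.device("cuda")
model = SRUpp(input_size=256, hidden_size=256, proj_size=64, num_layers=4).to(device)
head = torch.nn.Linear(256, 1000).to(device)
optimizer = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 8, 256, device=device)          # (seq_len, batch, input_size)
y = torch.randint(0, 1000, (64, 8), device=device)  # dummy token targets

with torch.cuda.amp.autocast():                     # native AMP; recurrence runs in fp16
    output, state = model(x)
    loss = torch.nn.functional.cross_entropy(
        head(output).view(-1, 1000), y.view(-1)
    )

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```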

Other changes:

  • Fix a dtype error within adaptive embedding (#168)
  • Significant speed-up on BILLIONWORD training (#169)
  • LICENSE update requested by IPC (#165)

v3.0.0.dev0

3 years ago

Note that future releases and dev releases of v3 might be backwards incompatible with this dev release.

This dev release:

  • custom_m renamed to transform_module
  • transform_module is always used now (the weight and weight_proj parameters have been removed)
  • projection_size can take a sequence of projection sizes, one per layer
  • n_proj in SRUCell renamed to projection_size for consistency (a hedged sketch of the renamed options follows this list)
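
A hedged sketch of the renamed options listed above. The exact constructor signatures, the per-layer list format, and the Linear module used as a transform_module are illustrative assumptions, not a confirmed API.

```python
import torch
from sru import SRU, SRUCell

# projection_size given per layer (one value for each of the 3 layers; assumed format).
model = SRU(
    input_size=256,
    hidden_size=256,
    num_layers=3,
    projection_size=[64, 64, 128],
)

# SRUCell now takes projection_size (formerly n_proj) and transform_module
# (formerly custom_m), which replaces the removed weight/weight_proj parameters.
cell = SRUCell(
    input_size=256,
    hidden_size=256,
    projection_size=64,
    # Hypothetical transform module; the required output width depends on the
    # cell configuration and is assumed to be 3 * hidden_size here.
    transform_module=torch.nn.Linear(256, 256 * 3, bias=False),
)
```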