Text

Models, data loaders and abstractions for language processing, powered by PyTorch

v0.18.0

3 days ago

Warning: TorchText development is stopped and the 0.18 release will be the last stable release of the library.

This release is compatible with PyTorch 2.3.0. There are no new features added.

v0.17.2

4 weeks ago

This release is compatible with the PyTorch 2.2.2 patch release. There are no new features added.

v0.17.1

2 months ago

This release is compatible with the PyTorch 2.2.1 patch release. There are no new features added.

v0.17.0

2 months ago

This release is compatible with PyTorch 2.2.0. There are no new features added.

v0.16.2

4 months ago

This is a patch release, which is compatible with PyTorch 2.1.2. There are no new features added.

v0.16.1

5 months ago

This is a patch release, which is compatible with PyTorch 2.1.1. There are no new features added.

v0.16.0

6 months ago

Current status

As of September 2023 we have paused active development of TorchText because our focus has shifted away from building out this library offering. We will continue to release new versions but do not anticipate any new feature development as we figure out future investments in this space.

Bug Fixes

  • Update links to multi30k dataset since original servers are down (#2194)
  • Use filelock to block on concurrent model downloads (#2166)

New Features

  • Add support for __contains__ for Vectors class (#2144)
  • Add generation utility support to T5Bundle (#2146)
  • Add option to ignore UTF-8 decoding error to scripted tokenizer (#2134)
  • Add shift-right method to T5 model (#2131)
  • Add XLMR and RoBERTa transforms as factory functions (#2102)
  • Make sure to include padding mask in generation (#2096)
  • (Prototype) Add top-p and top-k sampling (#2137)
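
To illustrate the top-k and top-p entry above, here is a minimal, library-independent sketch of the two filtering strategies in plain Python. It is not TorchText's implementation. The function names, the four-token vocabulary, and the scores are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize their mass."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most probable ids whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def sample(filtered, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]


# Hypothetical scores for a 4-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)

top_k_ids = sorted(top_k_filter(probs, k=2))
top_p_ids = sorted(top_p_filter(probs, p=0.8))
print("top-k keeps:", top_k_ids)
print("top-p keeps:", top_p_ids)
print("sampled id:", sample(top_k_filter(probs, k=2)))
```

Top-k fixes the number of candidates, while top-p lets that number vary with how peaked the distribution is. In both cases sampling is restricted to the retained ids.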

v0.15.2

11 months ago

This is a minor release, which is compatible with PyTorch 2.0.1. There are no new features added.

v0.15.1

1 year ago

Highlights

In this release, we add a new model architecture along with pre-trained weights, increase flexibility in our tokenizers, and improve the overall stability of the library.

  • Added T5 & Flan-T5 model architecture with pre-trained weights
  • Added DistilRoBERTa
  • Added tutorial showing T5 in action
  • Added prototype GenerationUtils

Models

TorchText expanded its models to include T5, Flan-T5, and DistilRoBERTa, along with the corresponding pre-trained model weights. These additions include both the smallest and the largest models available in TorchText to date, and T5 is the library's first encoder/decoder model. As usual, all models are TorchScriptable.

Utils

Since TorchText now has encoder/decoder models available, we added a prototype GenerationUtils that provides generic decoding capabilities for encoder/decoder and decoder-only models.
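
As a rough picture of what "generic decoding" means, the sketch below shows a greedy decoding loop in plain Python. It does not use or reproduce the GenerationUtils API. `greedy_decode`, `toy_step`, and the five-token vocabulary are hypothetical names and values chosen for illustration, with `toy_step` standing in for a real model's forward pass.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[List[int]], Sequence[float]],
    start_ids: List[int],
    eos_id: int,
    max_len: int = 20,
) -> List[int]:
    """Append the highest-scoring next token until EOS or max_len is reached."""
    ids = list(start_ids)
    while len(ids) < max_len:
        logits = step_fn(ids)  # scores over the vocabulary for the next token
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_step(ids: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: favors (last id + 1), vocabulary of 5."""
    target = min(ids[-1] + 1, 4)
    return [1.0 if i == target else 0.0 for i in range(5)]


decoded = greedy_decode(toy_step, start_ids=[0], eos_id=4)
print(decoded)
```

Because the loop only needs a function that maps the tokens so far to next-token scores, the same structure can wrap an encoder/decoder model (with the encoder output captured inside `step_fn`) or a decoder-only model.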

Improvements

Features

  • Add DistilRoBERTa to OSS (#1998)
  • Beginning of GenerationUtils (#2011)
  • Add Flan-T5 architecture (#2027)
  • Optimize T5 for sequence generation (#2054)
  • Add bundles for FLAN-T5 (#2061)
  • Promote T5 and variants (#2064)
  • Fixup generation utils for prototype release (#2065)

CI (Migrate from CircleCI to GitHub Actions)

  • Remove CUDA binary builds (#1994)
  • Remove Linux and MacOS unit tests from CircleCI (#1993)
  • Validate binaries for nightly/release testing (#2010)
  • Rename variable to avoid conflict with PIP system variable PIP_PREFIX (#2015, #2016)
  • Refactor validation using MATRIX vars (#2021)
  • Migrate validation workflows to test-infra (#2022)
  • 3.11 Windows Wheels Support in CircleCI (#2053)
  • Adding RC triggers for all build jobs (#2057)
  • Add windows 3.11 conda (#2063)
  • Channel=test for build matrix generation (#2066)
  • Turn off CircleCI 3.11 unit tests (#2078)
  • Fix validation workflow for test channel (#2071)
  • Modify integration test workflow to use PyTorch generic CI job (#2051)

Bug Fixes

  • Change read_from_tar call to load_from_tar (#1997)
  • Update Multi30k test dataset hash (#2003)
  • Fix device setting for T5 Model (#2007)
  • Fix overwite typo (#2006)
  • Fix linting error (#2019)
  • Fix memory leak with C++ RegEx operator (#2024)
  • Fix CodeQL workflow failure (#2046)
  • Fix UTF8 decoding error in GPT2BPETokenizer decode method (#2092)
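
For context on the UTF-8 decoding fix, byte-level tokenizers can produce byte sequences that split a multi-byte character, and strict decoding of such a fragment fails. The sketch below demonstrates this with Python's built-in `bytes.decode` and its `errors` argument. It is an assumption about the general class of problem, not a reproduction of the tokenizer's actual code path.

```python
text = "café"
data = text.encode("utf-8")  # "é" occupies two bytes: 0xC3 0xA9
chunk = data[:4]             # cuts the sequence right after 0xC3

try:
    chunk.decode("utf-8")    # strict mode is the default
    strict_failed = False
except UnicodeDecodeError as err:
    strict_failed = True
    print("strict decoding failed:", err.reason)

ignored = chunk.decode("utf-8", errors="ignore")    # drops the dangling byte
replaced = chunk.decode("utf-8", errors="replace")  # substitutes U+FFFD
print(repr(ignored), repr(replaced))
```

Ignoring errors silently loses data, while replacement leaves a visible marker. Which trade-off is acceptable depends on whether the decoded text is for display or for further processing.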

Examples

  • Update T5 tutorial for 2.0 release (#2080)

Documentation

  • Added min version req + readme instructions for torchdata (#2048)
  • Update README w/ 3.11 (#2062)

Testing

  • Replaced tabs w/ spaces to fix CodeMod (#1999)
  • Add GPU testing for RoBERTa models (#2025)
  • Add TorchData version to smoke tests (#2034)
  • Update integration-test.yml (#2038)
  • Update CUDA version on GPU test (#2040)
  • Add prototype GPU tests for T5 (#2055)
  • Install portalocker for testing (#2056)
  • Test newly uploaded Flan-T5 weights (#2074)

Dependencies

  • Add TorchData as a hard dependency (#1985)

Others

  • Drop support for Python 3.7 (#2037)
  • Add logo (#2050)
  • Version Bumps and Update channels (#2067)

v0.14.1

1 year ago

This is a minor release, which is compatible with PyTorch 1.13.1. There are no new features added.