Tokenizers Versions

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

v0.19.1

2 weeks ago

What's Changed

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.19.0...v0.19.1

v0.19.0

2 weeks ago

What's Changed

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.15.2...v0.19.0

v0.19.0rc0

2 weeks ago

Skips three minor versions (0.16 through 0.18) because of this: https://github.com/huggingface/transformers/blob/60dea593edd0b94ee15dc3917900b26e3acfbbee/setup.py#L177

What's Changed

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.15.2...v0.19.0rc0

v0.15.2

2 months ago

What's Changed

Big shoutout to @rlrs for the fast replace normalizers PR, which boosts the performance of the tokenizers.
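For context, the replace normalizer is configured from Python roughly as below. This is a minimal usage sketch, assuming a recent `tokenizers` install; it only shows what `normalizers.Replace` does (literal or `Regex` pattern in, replacement string out) and does not measure the speedup from the PR. The example strings and the `quotes`, `spaces`, and `pipeline` names are illustrative only.

```python
from tokenizers import Regex, normalizers

# Literal pattern: every "``" becomes a plain double quote.
quotes = normalizers.Replace("``", '"')

# Regex pattern: collapse runs of whitespace into a single space.
spaces = normalizers.Replace(Regex(r"\s+"), " ")

# Chain both, as a tokenizer's `normalizer` attribute could be configured.
pipeline = normalizers.Sequence([quotes, spaces])

print(pipeline.normalize_str("``hello``   world"))  # prints "hello" world
```

Chaining with `Sequence` applies the replacements in order, so the quote rewrite runs before whitespace is collapsed.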

New Contributors

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.15.1...v0.15.2rc1

v0.15.1

3 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.15.0...v0.15.1

v0.15.1.rc0

3 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.15.1.rc0

v0.15.0

5 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.14.1...v0.15.0

v0.14.1

6 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.3...v0.14.1

v0.14.1rc1

7 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.14.1rc1

v0.14.0

7 months ago

⚠️ Reworks the release pipeline. Other breaking changes ⚠️:

  • #1335: AddedToken is reworked; is_special_token is renamed to special for consistency.
  • The http feature is now OFF by default and depends on hf-hub instead of cached_path (updated cache directory, better sync implementation).
  • Removed the SSL link in the Python package; huggingface_hub is called directly instead.
  • New dependency: huggingface_hub (while we deprecate Tokenizer.from_pretrained(...) in favor of Tokenizer.from_file(huggingface_hub.hf_hub_download(MODEL_ID, "tokenizer.json"))).

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.3...v0.14.0