💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Reworks the release pipeline. Other breaking changes are mostly related to https://github.com/huggingface/tokenizers/pull/1335, where AddedToken is reworked.
expect() for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
safetensors. + Rewritten node bindings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1331
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.14.0.rc1
Mostly checking the new release scripts actually work.
expect() for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.13.4.rc3
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc1...v0.13.4.rc2
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4-rc2...v0.13.4.rc1
Tokenizer clone. by @Narsil in https://github.com/huggingface/tokenizers/pull/1152
from_pretrained on invalid ids (better error message). by @Narsil in https://github.com/huggingface/tokenizers/pull/1153
tokenizers. by @Narsil in https://github.com/huggingface/tokenizers/pull/1183
datasets train example by @lhoestq in https://github.com/huggingface/tokenizers/pull/1192
Replace to decoder (to undo the Replace Normalizer for Metaspace split). by @Narsil in https://github.com/huggingface/tokenizers/pull/1195
normalizers.Prepend (To be used instead of Metaspace). by @Narsil in https://github.com/huggingface/tokenizers/pull/1194
content to Strip decoder to allow decoding mid tokens. by @Narsil in https://github.com/huggingface/tokenizers/pull/1199
Full Changelog: https://github.com/huggingface/tokenizers/compare/node-v0.13.2...python-v0.13.3rc1
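To make the Prepend / Replace / Strip entries above more concrete, here is a minimal conceptual sketch in plain Python. It does not call the tokenizers library and is not its implementation: the function names (`prepend`, `replace`, `strip_left`, `normalize`, `decode`), the "▁" marker, and the token list are all assumptions chosen for illustration. It only shows the idea that a Prepend step plus a Replace step can stand in for a Metaspace-style split at normalization time, and that a Replace step plus a Strip step can undo it at decoding time.

```python
# Conceptual sketch only: plain-Python stand-ins, NOT the tokenizers library API.
SPACE_MARK = "\u2581"  # "▁", assumed here as the visible marker for spaces


def prepend(text: str, prefix: str = SPACE_MARK) -> str:
    """Stand-in for a Prepend normalizer: put `prefix` in front of the text.

    Skipping empty input is a choice made for this sketch.
    """
    return prefix + text if text else text


def replace(text: str, pattern: str, content: str) -> str:
    """Stand-in for a Replace step (literal pattern, no regex)."""
    return text.replace(pattern, content)


def strip_left(text: str, content: str, count: int) -> str:
    """Stand-in for a Strip step: remove up to `count` leading `content` chars."""
    removed = 0
    while removed < count and text.startswith(content):
        text = text[len(content):]
        removed += 1
    return text


def normalize(text: str) -> str:
    """Prepend the marker, then turn every space into the marker."""
    return replace(prepend(text), " ", SPACE_MARK)


def decode(tokens: list[str]) -> str:
    """Undo `normalize`: markers back to spaces, then drop the one added in front."""
    joined = "".join(replace(tok, SPACE_MARK, " ") for tok in tokens)
    return strip_left(joined, " ", 1)


if __name__ == "__main__":
    normalized = normalize("Hello world")
    # Hypothetical token split, as a model might produce after normalization.
    tokens = [SPACE_MARK + "Hello", SPACE_MARK + "world"]
    print(repr(normalized))
    print(repr(decode(tokens)))
```

Keeping the strip step separate from the replace step means only the single marker added by the prepend step is removed, so any other leading whitespace in the decoded text is left alone.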
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.2...v0.13.3