Transformers Versions

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

v4.33.3

7 months ago

A patch release was made for the following three commits:

  • DeepSpeed ZeRO-3 handling when resizing embedding layers (#26259)
  • [doc] Always call it Agents for consistency (#25958)
  • deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler (#25863)

v4.33.2

8 months ago

A patch release was made for the following two commits:

  • Fix pad to multiple of (#25732)
  • fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 (#26024)

v4.33.1

8 months ago

Falcon

Falcon is a class of causal decoder-only models built by TII. The largest Falcon checkpoints have been trained on >=1T tokens of text, with a particular emphasis on the RefinedWeb corpus. They are made available under the Apache 2.0 license.

Falcon’s architecture is modern and optimized for inference, with multi-query attention and support for efficient attention variants like FlashAttention. Both ‘base’ models, trained only as causal language models, and ‘instruct’ models, which have received further fine-tuning, are available.
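
As a quick illustration, here is a minimal loading-and-generation sketch; the tiiuae/falcon-7b checkpoint is an assumed example, not named in the notes above:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")

inputs = tokenizer("The Falcon models were trained on", return_tensors="pt")
# greedy decoding of a short continuation
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))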

  • Falcon port #24523 by @Rocketknight1
  • Falcon: Add RoPE scaling by @gante in #25878
  • Add proper Falcon docs and conversion script by @Rocketknight1 in #25954
  • Put Falcon back by @LysandreJik in #25960
  • [Falcon] Remove SDPA for falcon to support earlier versions of PyTorch (< 2.0) by @younesbelkada in #25947

Code Llama

Code Llama is a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.
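
As an illustration of infilling, here is a minimal sketch; the codellama/CodeLlama-7b-hf checkpoint and the prompt are assumed examples, and <FILL_ME> is the tokenizer's fill token marking the span to complete:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# <FILL_ME> marks the span the model should fill in
prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
# decode only the newly generated tokens
print(tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))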

  • [CodeLlama] Add support for CodeLlama by @ArthurZucker in #25740
  • [CodeLlama] Fix CI by @ArthurZucker in #25890

ViTDet

ViTDet reuses the ViT model architecture, adapted to object detection.
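
Since ViTDet is exposed as a bare encoder, here is a minimal sketch with a randomly initialized configuration (the shapes below are assumptions based on the default config, not from the notes):

import torch
from transformers import VitDetConfig, VitDetModel

config = VitDetConfig()
model = VitDetModel(config)

# dummy batch of one 224x224 RGB image
pixel_values = torch.randn(1, 3, 224, 224)
outputs = model(pixel_values)
print(outputs.last_hidden_state.shape)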

  • Add ViTDet by @NielsRogge in #25524

DINOv2

DINOv2 is the next iteration of the DINO model. It is added as a backbone class, allowing it to be re-used in downstream models.
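
A minimal backbone sketch, assuming the facebook/dinov2-base checkpoint as an example:

import torch
from transformers import Dinov2Backbone

backbone = Dinov2Backbone.from_pretrained("facebook/dinov2-base")

# dummy batch of one 224x224 RGB image
pixel_values = torch.randn(1, 3, 224, 224)
outputs = backbone(pixel_values)
# one feature map per requested stage (the last stage by default)
print([fm.shape for fm in outputs.feature_maps])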

  • [DINOv2] Add backbone class by @NielsRogge in #25520

VITS

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) comprised of a posterior encoder, decoder, and conditional prior.
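
A minimal synthesis sketch, assuming the facebook/mms-tts-eng checkpoint as an example:

import torch
from transformers import VitsModel, VitsTokenizer

tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")

inputs = tokenizer("Hello from the VITS model", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# the predicted speech waveform, sampled at model.config.sampling_rate
waveform = outputs.waveform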

  • add VITS model by @hollance in #24085

Breaking changes

  • 🚨🚨🚨 [Refactor] Move third-party related utility files into integrations/ folder 🚨🚨🚨 by @younesbelkada in #25599

This moves all utility files related to third-party libraries (outside the HF ecosystem) into an integrations/ folder, instead of having them directly in transformers.

In order to keep the previous usage working, you should change your import to the following:

- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations import HfDeepSpeedConfig

Bugfixes and improvements

  • [DOCS] MusicGen Docs Update by @xNul in #25510
  • [MINOR:TYPO] by @cakiki in #25646
  • Pass the proper token to PEFT integration in auto classes by @sgugger in #25649
  • Put IDEFICS in the right section of the doc by @sgugger in #25650
  • TF 2.14 compatibility by @Rocketknight1 in #25630
  • Fix bloom add prefix space by @ArthurZucker in #25652
  • removing unnecesssary extra parameter by @rafaelpadilla in #25643
  • Adds TRANSFORMERS_TEST_BACKEND by @vvvm23 in #25655
  • stringify config by @AleksanderWWW in #25637
  • Add input_embeds functionality to gpt_neo Causal LM by @gaasher in #25659
  • Update doc toctree by @ydshieh in #25661
  • Add Llama2 resources by @wonhyeongseo in #25531
  • [SPM] Patch spm Llama and T5 by @ArthurZucker in #25656
  • [GPTNeo] Add input_embeds functionality to gpt_neo Causal LM by @ArthurZucker in #25664
  • fix wrong path in some doc by @ydshieh in #25658
  • Remove utils/documentation_tests.txt by @ydshieh in #25680
  • Prevent Dynamo graph fragmentation in GPTNeoX with torch.baddbmm fix by @norabelrose in #24941
  • ⚠️ [CLAP] Fix dtype of logit scales in init by @sanchit-gandhi in #25682
  • Sets the stalebot to 10 AM CEST by @LysandreJik in #25678
  • Fix pad_token check condition by @ydshieh in #25685
  • [DOCS] Added docstring example for EpsilonLogitsWarper #24783 by @sanjeevk-os in #25378
  • correct resume training steps number in progress bar by @pphuc25 in #25691
  • Generate: general test for decoder-only generation from inputs_embeds by @gante in #25687
  • Fix typo in configuration_gpt2.py by @susnato in #25676
  • fix ram efficient fsdp init by @pacman100 in #25686
  • [LlamaTokenizer] make unk_token_length a property by @ArthurZucker in #25689
  • Update list of persons to tag by @sgugger in #25708
  • docs: Resolve typos in warning text by @tomaarsen in #25711
  • Fix failing test_batch_generation for bloom by @ydshieh in #25718
  • [PEFT] Fix peft version by @younesbelkada in #25710
  • Fix number of minimal calls to the Hub with peft integration by @sgugger in #25715
  • [AutoGPTQ] Add correct installation of GPTQ library + fix slow tests by @younesbelkada in #25713
  • Generate: nudge towards do_sample=False when temperature=0.0 by @gante in #25722
  • [from_pretrained] Simpler code for peft by @ArthurZucker in #25726
  • [idefics] idefics-9b test use 4bit quant by @stas00 in #25734
  • ImageProcessor - check if input pixel values between 0-255 by @amyeroberts in #25688
  • [from_pretrained] Fix failing PEFT tests by @younesbelkada in #25733
  • [ASR Pipe Test] Fix CTC timestamps error message by @sanchit-gandhi in #25727
  • 🌐 [i18n-KO] Translated visual_question_answering.md to Korean by @wonhyeongseo in #25679
  • [PEFT] Fix PeftConfig save pretrained when calling add_adapter by @younesbelkada in #25738
  • fixed typo in speech encoder decoder doc by @asusevski in #25745
  • Add FlaxCLIPTextModelWithProjection by @pcuenca in #25254
  • Generate: add missing logits processors docs by @gante in #25653
  • [DOCS] Add example for HammingDiversityLogitsProcessor by @jessthebp in #25481
  • Generate: logits processors are doctested and fix broken doctests by @gante in #25692
  • [CLAP] Fix logit scales dtype for fp16 by @sanchit-gandhi in #25754
  • [Sentencepiece] make sure legacy do not require protobuf by @ArthurZucker in #25684
  • fix encoder hook by @SunMarc in #25735
  • Docs: fix indentation in HammingDiversityLogitsProcessor by @gante in #25756
  • Add type hints for several pytorch models (batch-3) by @nablabits in #25705
  • Correct attention mask dtype for Flax GPT2 by @liutianlin0121 in #25636
  • fix a typo in docsting by @statelesshz in #25759
  • [idefics] small fixes by @stas00 in #25764
  • Add docstrings and fix VIVIT examples by @Geometrein in #25628
  • [LlamaFamiliy] add a tip about dtype by @ArthurZucker in #25794
  • Add type hints for several pytorch models (batch-2) by @nablabits in #25557
  • Add type hints for pytorch models (final batch) by @nablabits in #25750
  • Add type hints for several pytorch models (batch-4) by @nablabits in #25749
  • [idefics] fix vision's hidden_act by @stas00 in #25787
  • Arde/fsdp activation checkpointing by @arde171 in #25771
  • Fix incorrect Boolean value in deepspeed example by @tmm1 in #25788
  • fixing name position_embeddings to object_queries by @Lorenzobattistela in #24652
  • Resolving Attribute error when using the FSDP ram efficient feature by @pacman100 in #25820
  • [Docs] More clarifications on BT + FA by @younesbelkada in #25823
  • fix register by @zspo in #25779
  • Minor wording changes for Code Llama by @osanseviero in #25815
  • [LlamaTokenizer] tokenize nits. by @ArthurZucker in #25793
  • fix warning trigger for embed_positions when loading xglm by @MattYoon in #25798
  • 🌐 [i18n-KO] Translated peft.md to Korean by @nuatmochoi in #25706
  • 🌐 [i18n-KO] model_memory_anatomy.md to Korean by @mjk0618 in #25755
  • Error with checking args.eval_accumulation_steps to gather tensors by @chaumng in #25819
  • Tests: detect lines removed from "utils/not_doctested.txt" and doctest ALL generation files by @gante in #25763
  • 🌐 [i18n-KO] Translated add_new_pipeline.md to Korean by @heuristicwave in #25498
  • 🌐 [i18n-KO] Translated community.md to Korean by @sim-so in #25674
  • 🤦 update warning to If you want to use the new behaviour, set `legacy=… by @ArthurZucker in #25833
  • update remaining Pop2Piano checkpoints by @susnato in #25827
  • [AutoTokenizer] Add data2vec to mapping by @sanchit-gandhi in #25835
  • MaskFormer,Mask2former - reduce memory load by @amyeroberts in #25741
  • Support loading base64 images in pipelines by @InventivetalentDev in #25633
  • Update README.md by @NinoRisteski in #25834
  • Generate: models with custom generate() return True in can_generate() by @gante in #25838
  • Update README.md by @NinoRisteski in #25832
  • minor typo fix in PeftAdapterMixin docs by @tmm1 in #25829
  • Add flax installation in daily doctest workflow by @ydshieh in #25860
  • Add Blip2 model in VQA pipeline by @jpizarrom in #25532
  • Remote tools are turned off by @LysandreJik in #25867
  • Fix imports by @ydshieh in #25869
  • fix max_memory for bnb by @SunMarc in #25842
  • Docs: fix example failing doctest in generation_strategies.md by @gante in #25874
  • pin pandas==2.0.3 by @ydshieh in #25875
  • Reduce CI output by @ydshieh in #25876
  • [ViTDet] Fix doc tests by @NielsRogge in #25880
  • For xla tensors, use an alternative way to get a unique id by @qihqi in #25802
  • fix ds z3 checkpointing when stage3_gather_16bit_weights_on_model_save=False by @pacman100 in #25817
  • Modify efficient GPU training doc with now-available adamw_bnb_8bit optimizer by @veezbo in #25807
  • [TokenizerFast] can_save_slow_tokenizer as a property for when vocab_file's folder was removed by @ArthurZucker in #25626
  • Save image_processor while saving pipeline (ImageSegmentationPipeline) by @raghavanone in #25884
  • [InstructBlip] FINAL Fix instructblip test by @younesbelkada in #25887
  • Add type hints for tf models batch 1 by @nablabits in #25853
  • Update setup.py by @ydshieh in #25893
  • Smarter check for is_tensor by @sgugger in #25871
  • remove torch_dtype override by @SunMarc in #25894
  • fix FSDP model resume optimizer & scheduler by @pkumc in #25852
  • Better error message for pipeline loading by @ydshieh in #25912
  • Remove broken docs for MusicGen by @osanseviero in #25905
  • Revert frozen training arguments by @muellerzr in #25903
  • [VITS] Add to TTA pipeline by @sanchit-gandhi in #25906
  • [MMS] Update docs with HF TTS implementation by @sanchit-gandhi in #25907
  • [VITS] Only trigger tokenizer warning for uroman by @sanchit-gandhi in #25915
  • Update-llama-code by @ArthurZucker in #25826
  • Update model_memory_anatomy.md by @NinoRisteski in #25896
  • Skip offload tests for ViTDet by @ydshieh in #25913
  • Fix typos by @omahs in #25936
  • Update community.md by @NinoRisteski in #25928
  • Update autoclass_tutorial.md by @NinoRisteski in #25929
  • Update README.md by @NinoRisteski in #25941
  • [MMS] Fix pip install in docs by @sanchit-gandhi in #25949
  • [VITS] Handle deprecated weight norm by @sanchit-gandhi in #25946
  • Import deepspeed utilities from integrations by @osanseviero in #25919
  • Update README.md by @NinoRisteski in #25922
  • [VITS] Fix init test by @sanchit-gandhi in #25945
  • Fix failing test by @LysandreJik in #25963
  • Fix smart check by @ydshieh in #25955
  • Add type hints for tf models final batch by @nablabits in #25883

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @nablabits
    • Add type hints for several pytorch models (batch-3) (#25705)
    • Add type hints for several pytorch models (batch-2) (#25557)
    • Add type hints for pytorch models (final batch) (#25750)
    • Add type hints for several pytorch models (batch-4) (#25749)
    • Add type hints for tf models batch 1 (#25853)
    • Add type hints for tf models final batch (#25883)
  • @Lorenzobattistela
    • fixing name position_embeddings to object_queries (#24652)
  • @hollance
    • add VITS model (#24085)

v4.32.1

8 months ago

Patch release including several patches from v4.31.0, listed below:

  • Put IDEFICS in the right section of the doc (#25650)
  • removing unnecesssary extra parameter (#25643)
  • [SPM] Patch spm Llama and T5 (#25656)
  • Fix bloom add prefix space (#25652)
  • Generate: add missing logits processors docs (#25653)
  • [idefics] small fixes (#25764)

v4.32.0

8 months ago

IDEFICS

The IDEFICS model was proposed in OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, and Victor Sanh.

IDEFICS is the first open state-of-the-art visual language model at the 80B scale!

The model accepts arbitrary sequences of images and text and produces text, similarly to a multimodal ChatGPT.

Blogpost: hf.co/blog/idefics
Playground: HuggingFaceM4/idefics_playground
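
A minimal generation sketch, assuming the HuggingFaceM4/idefics-9b checkpoint and an example image URL; prompts interleave text and images:

from transformers import AutoProcessor, IdeficsForVisionText2Text

checkpoint = "HuggingFaceM4/idefics-9b"
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(checkpoint)

# each prompt is a list mixing text strings and images (here, an image URL)
prompts = [
    [
        "User: What is in this image?",
        "http://images.cocodataset.org/val2017/000000039769.jpg",
        "\nAssistant:",
    ]
]
inputs = processor(prompts, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])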

  • new model: IDEFICS via HuggingFaceM4 by @stas00 in #24796

MPT

MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.
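
A minimal sketch, assuming the mosaicml/mpt-7b checkpoint as an example:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b")

inputs = tokenizer("MosaicML's MPT model is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))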

  • [MPT] Add MosaicML's MPT model to transformers by @ArthurZucker & @younesbelkada in #24629

GPTQ Integration

GPTQ quantization is now supported in Transformers, through the optimum library. The backend relies on the auto_gptq library, from which we use the GPTQ and QuantLinear classes.

See below for an example of the API, quantizing a model using the new GPTQConfig configuration utility.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_name = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)

Most models under the TheBloke namespace with the suffix GPTQ should be supported. For example, to load the GPTQ-quantized model TheBloke/Llama-2-13B-chat-GPTQ, simply run (after installing the latest optimum and auto-gptq libraries):

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration

  • GPTQ integration by @SunMarc in #25062

Pipelines

A new pipeline dedicated to text-to-audio and text-to-speech models has been added to Transformers. It currently supports the three text-to-audio models integrated into transformers: SpeechT5ForTextToSpeech, MusicGen, and Bark.

See below for an example:

from transformers import pipeline

pipe = pipeline(model="suno/bark")
output = pipe("Hey it's HuggingFace on the phone!")

audio = output["audio"]
sampling_rate = output["sampling_rate"]
  • Add Text-To-Speech pipeline by @ylacombe in #24952

Classifier-Free Guidance decoding

Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in this paper. With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, ensuring your generation doesn't go in specific directions. See its docs for usage instructions.
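
A minimal sketch of CFG at generation time; the gpt2 checkpoint and the guidance scale value are arbitrary example choices:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Today, a dragon flew over Paris, France,", return_tensors="pt")
# guidance_scale > 1 increases adherence to the prompt
outputs = model.generate(**inputs, max_new_tokens=20, guidance_scale=1.5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))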

  • add CFG for .generate() by @Vermeille in #24654

Task guides

A new task guide going into Visual Question Answering has been added to Transformers.

  • VQA task guide by @MKhalusova in #25244

Model deprecation

We continue the deprecation of models that was introduced in https://github.com/huggingface/transformers/pull/24787.

By deprecating, we indicate that we will stop maintaining such models, but there is no intention of actually removing those models and breaking support for them (they might one day move into a separate repo or onto the Hub, but we would still add the necessary imports to make sure backward compatibility stays). The main point is that we stop testing those models. This choice is driven by how much the models are used, and aims to ease the burden on our CI so that it may focus on more critical aspects of the library.

  • Deprecate unused OpenLlama architecture by @tomaarsen in #24922

Translation Efforts

There are ongoing efforts to translate the transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated as it further lowers the barrier of entry to ML and to Transformers.

If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.

  • 🌐 [i18n-KO] Translated tasks/document_question_answering.md to Korean by @jungnerd in #24588
  • 🌐 [i18n-KO] Fixed Korean and English quicktour.md by @wonhyeongseo in #24664
  • 🌐 [i18n-KO] Updated Korean serialization.md by @wonhyeongseo in #24686
  • 🌐 [i18n-KO] Translated performance.md to Korean by @augustinLib in #24883
  • 🌐 [i18n-KO] Translated testing.md to Korean by @Sunmin0520 in #24900
  • 🌐 [i18n-KO] Translated perf_train_cpu.md to Korean by @seank021 in #24911
  • 🌐 [i18n-KO] Translated <tf_xla>.md to Korean by @54data in #24904
  • 🌐 [i18n-KO] Translated perf_hardware.md to Korean by @augustinLib in #24966
  • 🌐 [i18n-KO] Translated hpo_train.md to Korean by @harheem in #24968
  • 🌐 [i18n-KO] Translated perf_infer_cpu.md to Korean by @junejae in #24920
  • 🌐 [i18n-KO] Translated pipeline_webserver.md to Korean by @kihoon71 in #24828
  • 🌐 [i18n-KO] Translated transformers_agents.md to Korean by @sim-so in #24881
  • 🌐 [i18n-KO] Translated perf_infer_gpu_many.md to Korean by @heuristicwave in #24943
  • 🌐 [i18n-KO] Translated perf_infer_gpu_one.md to Korean by @eenzeenee in #24978
  • 🌐 [i18n-KO] Translated add_tensorflow_model.md to Korean by @keonju2 in #25017
  • 🌐 [i18n-KO] Translated perf_train_cpu_many.md to Korean by @nuatmochoi in #24923
  • 🌐 [i18n-KO] Translated add_new_model.md to Korean by @mjk0618 in #24957
  • 🌐 [i18n-KO] Translated model_summary.md to Korean by @0525hhgus in #24625
  • 🌐 [i18n-KO] Translated philosophy.md to Korean by @TaeYupNoh in #25010
  • 🌐 [i18n-KO] Translated perf_train_tpu_tf.md to Korean by @0525hhgus in #25433
  • 🌐 [i18n-KO] Translated docs: ko: pr_checks.md to Korean by @sronger in #24987

Explicit input data format for image processing

Addition of an input_data_format argument to image transforms and ImageProcessor methods, allowing the user to explicitly set the data format of the images being processed. This enables processing images with a non-standard number of channels (e.g. 4), and removes errors which occurred when the data format was inferred but the channel dimension was ambiguous.

import numpy as np
from transformers import ViTImageProcessor

# a 4-channel image in channels-first format (4 channels, 6x3 spatial dimensions)
img = np.random.randint(0, 256, (4, 6, 3))
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
  • Input data format by @amyeroberts in #25464
  • Add input_data_format argument, image transforms by @amyeroberts in #25462

Documentation clarification about efficient inference through torch.nn.functional.scaled_dot_product_attention & Flash Attention

Users are often not aware that it is possible to force torch.nn.functional.scaled_dot_product_attention to dispatch to its Flash Attention kernels. This leads to considerable speedups and memory savings, and is also compatible with quantized models. We decided to make this explicit to users in the documentation.

In a nutshell, one can just run:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")

# convert the model to BetterTransformer
model.to_bettertransformer()

input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

+ with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

to enable Flash Attention in their model. Note, however, that this feature does not support padding yet.

FSDP and DeepSpeed Changes

Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings. Users no longer have to pass fsdp_transformer_layer_cls_to_wrap, as the code now uses _no_split_modules by default, which is available for most popular models. DeepSpeed ZeRO-3 init now works properly with the Accelerate launcher + Trainer.

  • add util for ram efficient loading of model when using fsdp by @pacman100 in #25107
  • fix fsdp checkpointing issues by @pacman100 in #24926
  • fsdp fixes and enhancements by @pacman100 in #24980
  • fix deepspeed load best model at end when the model gets sharded by @pacman100 in #25057
  • resolving zero3 init when using accelerate config with Trainer by @pacman100 in #25227
  • fix z3 init when using accelerate launcher by @pacman100 in #25589

Breaking changes

Default optimizer in the Trainer class

The default optimizer in the Trainer class has been updated to adamw_torch rather than our own adamw_hf, as the official Torch optimizer is more robust and fixes some issues.

In order to keep the old behavior, ensure that you pass "adamw_hf" as the optim value in your TrainingArguments.
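
For example, a minimal sketch (the output directory is a placeholder):

from transformers import TrainingArguments

# restore the pre-change default optimizer
args = TrainingArguments(output_dir="my-model", optim="adamw_hf")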

  • 🚨🚨🚨Change default from adamw_hf to adamw_torch 🚨🚨🚨 by @muellerzr in #25109

ViViT and EfficientNet rescale bugfix

There was an issue with how pixel values are rescaled in ViViT and EfficientNet. This has been fixed, but the fix results in different model outputs for both of these models. To understand the change and see what needs to be done to obtain previous results, please take a look at the following PRs.

  • 🚨🚨🚨 Fix rescale ViVit Efficientnet by @amyeroberts in #25174
  • 🚨🚨🚨 Vivit update default rescale_factor value by @amyeroberts in #25547

Removing softmax for the image classification EfficientNet class

The EfficientNetForImageClassification model class did not follow conventions and added a softmax to the model logits. This was removed so that it respects the convention set by other models.

In order to obtain previous results, pass the model logits through a softmax.
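
For example, a minimal sketch assuming the google/efficientnet-b0 checkpoint and the usual COCO example image:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, EfficientNetForImageClassification

processor = AutoImageProcessor.from_pretrained("google/efficientnet-b0")
model = EfficientNetForImageClassification.from_pretrained("google/efficientnet-b0")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
# apply softmax to recover the previous (pre-fix) outputs
probs = torch.nn.functional.softmax(logits, dim=-1)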

  • 🚨🚨🚨 Remove softmax for EfficientNetForImageClassification 🚨🚨🚨 by @amyeroberts in #25501

Bug fixes with SPM models

Some SPM models had issues with their management of added tokens. Namely, the Llama and T5 tokenizers, among others, were behaving incorrectly. These have been updated in https://github.com/huggingface/transformers/pull/25224.

An option to obtain the previous behavior was added through the legacy flag, as explained in the PR linked above.
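
A minimal sketch of opting back into the previous behavior (t5-small is an example checkpoint):

from transformers import AutoTokenizer

# legacy=True restores the previous behavior of these SPM tokenizers
tokenizer = AutoTokenizer.from_pretrained("t5-small", legacy=True)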

  • 🚨🚨🚨 [SPM] Finish fix spm models 🚨🚨🚨 by @ArthurZucker in #25224

Bugfixes and improvements

  • Disable ipex env var if false by @muellerzr in #24885
  • Check for accelerate env var when doing CPU only by @muellerzr in #24890
  • Avoid some pipeline tasks to use use_cache=True by @ydshieh in #24893
  • Update tested versions in READMEs by @EliahKagan in #24895
  • Fix test_model_parallelism for FalconModel by @ydshieh in #24914
  • Fixed issue where ACCELERATE_USE_CPU="False" results in bool(True) by @madhavajay in #24907
  • fix typo in BARK_PRETRAINED_MODEL_ARCHIVE_LIST by @21jun in #24902
  • Fix minor llama2.md model doc typos by @tmc in #24909
  • [Llama2] replace self.pretraining_tp with self.config.pretraining_tp by @younesbelkada in #24906
  • [doc] image_processing_vilt.py wrong default documented by @stas00 in #24931
  • Add multi-label text classification support to pytorch example by @ranchlai in #24770
  • replace no_cuda with use_cpu in test_pytorch_examples by @statelesshz in #24944
  • Generate: sequence bias can handle same terminations by @gante in #24822
  • Update processing_vision_text_dual_encoder.py by @premsa in #24950
  • Fix main_input_name in src/transformers/keras_callbacks.py by @ydshieh in #24916
  • [DOCS] Example for LogitsProcessor class by @shauray8 in #24848
  • fix type annotations for arguments in training_args by @shauray8 in #24550
  • [RWKV] Add Gradient Checkpointing support for RWKV by @younesbelkada in #24955
  • Change logic for logging in the examples by @muellerzr in #24956
  • Contrastive Search peak memory reduction by @blbadger in #24120
  • Fallback for missing attribute Parameter.ds_numel by @apoorvkh in #24942
  • fix fsdp checkpointing issues by @pacman100 in #24926
  • fix: cast input pixels to appropriate dtype for image_to_text pipelines by @JimAllanson in #24947
  • fsdp fixes and enhancements by @pacman100 in #24980
  • Fix missing spaces in system prompt of Llama2 tokenizer by @chenjoya in #24930
  • [LlamaConfig] Nit: pad token should be None by default by @ArthurZucker in #24958
  • Remove tokenizers from the doc table by @sgugger in #24963
  • Avoid importing all models when instantiating a pipeline by @sgugger in #24960
  • Fix type annotation for deepspeed training arg by @sgugger in #24988
  • Use main_input_name for include_inputs_for_metrics by @sgugger in #24993
  • Fix llama tokenization doctest by @ydshieh in #24990
  • [bnb] Add simple check for bnb import by @younesbelkada in #24995
  • [Llama] remove persistent inv_freq tensor by @ArthurZucker in #24998
  • improve from_pretrained for zero3 multi gpus mode by @1ytic in #24964
  • Move template doc file to md by @sgugger in #25004
  • [check_config_docstrings.py] improve diagnostics by @stas00 in #25012
  • [logging.py] set default stderr path if None by @ArthurZucker in #25033
  • fix(integrations): store serialized TrainingArgs to wandb.config without sanitization. by @parambharat in #25035
  • [docs] Performance docs tidy up, part 1 by @MKhalusova in #23963
  • Support GatedRepoError + use raise from by @Wauplin in #25034
  • Better handling missing SYS in llama conversation tokenizer by @ichernev in #24997
  • Add dispatch_batches to training arguments by @muellerzr in #25038
  • Fix typo in LlamaTokenizerFast docstring example by @sbrunk in #25018
  • Make more test models smaller by @sgugger in #25005
  • Pvt model by @Xrenya in #24720
  • compute_loss in trainer failing to label shift for PEFT model when label smoothing enabled. by @njbrake in #25044
  • [8bit] Fix 8bit corner case with Blip2 8bit by @younesbelkada in #25047
  • Better error message when signal is not supported on OS by @sgugger in #25049
  • [RWKV] Add note in doc on RwkvStoppingCriteria by @ArthurZucker in #25055
  • Generate - add beam indices output in contrained beam search by @gante in #25042
  • [Docs] fix rope_scaling doc string by @kashif in #25072
  • Fix last models for common tests that are too big. by @sgugger in #25058
  • fix: add TOC anchor link by @eenzeenee in #25066
  • Set TF32 flag for PyTorch cuDNN backend by @XuehaiPan in #25075
  • Fix broken link in README_hd.md by @susnato in #25067
  • replace per_gpu_eval_batch_size with per_device_eval_batch_size in readme of multiple-choice task by @statelesshz in #25078
  • [generate] Only warn users if the generation_config's max_length is set to the default value by @ArthurZucker in #25030
  • Fix: repeat per sample for SAM image embeddings by @xk-huang in #25074
  • [DOCS] add example NoBadWordsLogitsProcessor by @SoyGema in #25046
  • Allow generic composite models to pass more kwargs by @ydshieh in #24927
  • [ ForSequenceClassification] Support left padding by @ArthurZucker in #24979
  • [TF] Also apply patch to support left padding by @ArthurZucker in #25085
  • Edit err message and comment in test_model_is_small by @connor-henderson in #25087
  • [ PreTrainedTokenizerFast] Keep properties from fast tokenizer by @ArthurZucker in #25053
  • Hotfix for failing MusicgenForConditionalGeneration tests by @ydshieh in #25091
  • [T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification by @sjrl in #24726
  • Fix doctest by @ydshieh in #25031
  • fix tied_params for meta tensor by @SunMarc in #25101
  • documentation for llama2 models by @shauray8 in #25102
  • Fix PvtModelIntegrationTest::test_inference_fp16 by @ydshieh in #25106
  • Add descriptive docstring to TemperatureLogitsWarper by @nablabits in #24892
  • fix "UserWarning: Creating a tensor from a list of numpy.ndarrays is … by @liucw2012 in #24772
  • update use_auth_token -> token by @ydshieh in #25083
  • Fix past CI after #24334 by @ydshieh in #25113
  • Move common image processing methods to BaseImageProcessor by @amyeroberts in #25089
  • Fix ViT docstring regarding default dropout values. by @ebezzam in #25118
  • MaskFormer - enable return_dict in order to compile by @amyeroberts in #25052
  • Move center_crop to BaseImageProcessor by @amyeroberts in #25122
  • fix deepspeed load best model at end when the model gets sharded by @pacman100 in #25057
  • fix delete all checkpoints when save_total_limit is set to 1 by @Pbihao in #25136
  • [T5/LlamaTokenizer] default legacy to None to not always warn by @ArthurZucker in #25131
  • Clarify 4/8 bit loading log message by @BramVanroy in #25134
  • [MptConfig] support from pretrained args by @ArthurZucker in #25116
  • Add offload support to Bark by @ylacombe in #25037
  • More token things by @ydshieh in #25146
  • Add bloom flax by @sanchit-gandhi in #25094
  • Add new model in doc table of content by @sgugger in #25148
  • Fix .push_to_hub and cleanup get_full_repo_name usage by @Wauplin in #25120
  • Add test when downloading from gated repo by @Wauplin in #25039
  • override .cuda() to check if model is already quantized by @ranchlai in #25166
  • Represent query_length in a different way to solve jit issue by @jiqing-feng in #25164
  • make run_generation more generic for other devices by @statelesshz in #25133
  • added compiled model support for inference by @markovalexander in #25124
  • Update use_auth_token -> token in example scripts by @ydshieh in #25167
  • [Mpt] Fix mpt slow test by @younesbelkada in #25170
  • [InstructBlip] Fix instructblip slow test by @younesbelkada in #25171
  • Fix beam search to sample at least 1 non eos token by @yonigottesman in #25103
  • [MusicGen] Fix integration tests by @sanchit-gandhi in #25169
  • Musicgen: CFG is manually added by @gante in #25173
  • Better error message in _prepare_output_docstrings by @ydshieh in #25202
  • [PreTrainedModel] Wrap cuda and to method correctly by @younesbelkada in #25206
  • Fix all_model_classes in FlaxBloomGenerationTest by @ydshieh in #25211
  • [quantization.md] fix by @stas00 in #25190
  • [pipeline] revisit device check for pipeline by @younesbelkada in #25207
  • Update tiny model info. and pipeline testing by @ydshieh in #25213
  • Fix docker image build failure by @ydshieh in #25214
  • make build_mpt_alibi_tensor a method of MptModel so that deepspeed co… by @sywangyi in #25193
  • [Pix2Struct] Fix pix2struct cross attention by @younesbelkada in #25200
  • [Docs/quantization] Clearer explanation on how things works under the hood. + remove outdated info by @younesbelkada in #25216
  • [MPT] Add require_bitsandbytes on MPT integration tests by @younesbelkada in #25201
  • [Detr] Fix detr BatchNorm replacement issue by @younesbelkada in #25230
  • Move rescale dtype recasting to match torchvision ToTensor by @amyeroberts in #25229
  • Fix set of model parallel in the Trainer when no GPUs are available by @sgugger in #25239
  • fix get_keys_to_not_convert() to return correct modules for full precision inference by @ranchlai in #25105
  • add pathname and line number to logging formatter in debug mode by @ranchlai in #25203
  • Add token arugment in example scripts by @ydshieh in #25172
  • resolving zero3 init when using accelerate config with Trainer by @pacman100 in #25227
  • Update rescale tests - cast to float after rescaling to reflect #25229 by @amyeroberts in #25259
  • Fix some bugs for two stage training of deformable detr by @jypjypjypjyp in #25045
  • [DOCS] Add example and modified docs of EtaLogitsWarper by @ashishthomaschempolil in #25125
  • Fix return_dict_in_generate bug in InstructBlip generate function by @eohomegrownapps in #25246
  • Remove pytest_options={"rA": None} in CI by @ydshieh in #25263
  • recommend DeepSpeed's Argument Parsing documentation by @BurnzZ in #25268
  • [MMS] Fix mms by @patrickvonplaten in #25267
  • CI with num_hidden_layers=2 🚀🚀🚀 by @ydshieh in #25266
  • CI with pytest_num_workers=8 for torch/tf jobs by @ydshieh in #25274
  • Docs: Update list of report_to logging integrations in docstring by @tomaarsen in #25281
  • Update InstructBLIP & Align values after rescale update by @amyeroberts in #25209
  • Docs: separate generate section by @gante in #25235
  • Update bark doc by @ylacombe in #25234
  • add generate method to SpeechT5ForTextToSpeech by @ylacombe in #25233
  • Add timeout parameter to load_image function by @rolisz in #25184
  • [JAX] Bump min version by @sanchit-gandhi in #25286
  • [small] llama2.md typo by @H-Huang in #25295
  • Fix typo: Roberta -> RoBERTa by @MrGeislinger in #25302
  • Move usage of deprecated logging.warn to logging.warning by @PeterJCLaw in #25310
  • Give more memory in test_disk_offload by @sgugger in #25315
  • Generate: get generation mode as an enum by @gante in #25292
  • Add offline mode for agents by @sgugger in #25226
  • Deal with nested configs better in base class by @sgugger in #25237
  • Document check copies by @sgugger in #25291
  • Make bark could have tiny model by @ydshieh in #25290
  • Document toc check and doctest check scripts by @sgugger in #25319
  • [Whisper] Better error message for outdated generation config by @sanchit-gandhi in #25298
  • Remove jnp.DeviceArray since it is deprecated. by @mariecwhite in #24875
  • Update TF pin in docker image by @ydshieh in #25343
  • Generalize CFG to allow for positive prompts by @oobabooga in #25339
  • Loosen output shape restrictions on GPT-style models by @calpt in #25188
  • Allow trust_remote_code in example scripts by @Jackmin801 in #25248
  • Generate: remove Marian hack by @gante in #25294
  • Fix more offload edge cases by @ydshieh in #25342
  • Migrate Trainer from Repository to upload_folder by @sgugger in #25095
  • Adding more information in help parser on train_file and validation_file by @pphuc25 in #25324
  • [DOCS] Add NoRepeatNGramLogitsProcessor Example for LogitsProcessor class by @Rishab26 in #25186
  • Docs: Added benchmarks for torch.compile() for vision models by @merveenoyan in #24748
  • Add mask2former fp16 support by @pedrohml in #25093
  • [DOCS] Add descriptive docstring to MinNewTokensLength by @nablabits in #25196
  • Register ModelOutput subclasses as supported torch.utils._pytree nodes by @ringohoffman in #25358
  • Fix test_model_parallelism by @ydshieh in #25359
  • Add warning for missing attention mask when pad tokens are detected by @hackyon in #25345
  • [ASR Pipeline] Clarify return timestamps by @sanchit-gandhi in #25344
  • MaskFormer, Mask2Former - replace einsum for tracing by @amyeroberts in #25297
  • Load state in else by @muellerzr in #25318
  • Fix token in example template by @ydshieh in #25351
  • Enable tests to run on third-party devcies by @statelesshz in #25327
  • Fix torch_job worker(s) crashing by @ydshieh in #25374
  • Generate: add config-level validation by @gante in #25381
  • Fix missing usage of token by @ydshieh in #25382
  • Use small config for OneFormerModelTest.test_model_with_labels by @ydshieh in #25383
  • Add copied from for image processor methods by @amyeroberts in #25121
  • change version by @SunMarc in #25387
  • [DOCS] Add example for TopPLogitsWarper by @chiral-carbon in #25361
  • 16059 - Add missing type hints for ASTModel by @nablabits in #25364
  • rm useless condition since the previous condition contains it. by @jiqing-feng in #25403
  • Fix path for dynamic module creation by @sgugger in #25402
  • YOLOS - Revert default return_pixel_mask value by @amyeroberts in #25404
  • Docs: introduction to generation with LLMs by @gante in #25240
  • Generate: length validation by @gante in #25384
  • Improve training args by @statelesshz in #25401
  • Generate: generation config validation fixes in docs by @gante in #25405
  • 16059 - Add extra type hints for AltCLIPModel by @nablabits in #25399
  • Generate: lower severity of parameterization checks by @gante in #25407
  • Update Bark generation configs and tests by @ylacombe in #25409
  • aligned sample_beam output selection with beam_search by @hukuda222 in #25375
  • Enable passing number of channels when inferring data format by @amyeroberts in #25412
  • Bark: flexible generation config overload by @gante in #25414
  • [DINOv2] Update pooler output by @NielsRogge in #25392
  • Doc checks by @sgugger in #25408
  • Generation: strict generation config validation at save time by @gante in #25411
  • [WavLM] Fix Arxiv link and authors by @sanchit-gandhi in #25415
  • Generate: Load generation config when device_map is passed by @gante in #25413
  • Fix rendering for torch.compile() docs by @merveenoyan in #25432
  • Add examples to tests to run when setup.py is modified by @ydshieh in #25437
  • Fix issue with ratio evaluation steps and auto find batch size by @muellerzr in #25436
  • docs: add LLaMA-Efficient-Tuning to awesome-transformers by @statelesshz in #25441
  • Fix for #25437 by @ydshieh in #25454
  • Refactor image processor testers by @amyeroberts in #25450
  • Switch Transformers: remove overwritten beam sample test by @gante in #25458
  • Reuse the cache created for latest main on PRs/branches if setup.py is not modified by @ydshieh in #25445
  • Update run_translation.py broken link example Pytoch by @SoyGema in #25461
  • Add input_data_format argument, image transforms by @amyeroberts in #25462
  • Mark flaky tests by @amyeroberts in #25463
  • Revert "Reuse the cache created for latest main on PRs/branches" by @ydshieh in #25466
  • import required torch and numpy libraries by @eze1376 in #25483
  • fix : escape key of start_token from special characters before search end_token in token2json function of DonutProcessor by @nour-elkamel in #25472
  • Remove logging code in TF Longformer that fails to compile by @Rocketknight1 in #25496
  • Add type hints to Blip2QFormer, BigBirdForQA and ConditionalDetr family models by @nablabits in #25488
  • Set can_generate for SpeechT5ForTextToSpeech by @ylacombe in #25493
  • MaskFormer post_process_instance_segmentation bug fix convert out side of loop by @amyeroberts in #25497
  • fix gptq nits by @SunMarc in #25500
  • Conditional DETR type hint fix by @Rocketknight1 in #25505
  • Check for case where auxiliary_head is None in UperNetPreTrainedModel by @mmurray in #25514
  • add repr to the BitsAndBytesConfig class by @ranchlai in #25517
  • Make training args fully immutable by @muellerzr in #25435
  • Use dynamic past key-values shape in TF-Whisper by @Rocketknight1 in #25523
  • [TYPO] fix typo/format in quicktour.md by @lishukan in #25519
  • Fix nested configs of Jukebox by @sgugger in #25533
  • Marian: post-hack-fix correction by @gante in #25459
  • Document the test fetcher by @sgugger in #25521
  • Generate: fix default max length warning by @gante in #25539
  • fix vit hybrid test by @SunMarc in #25543
  • Fix MaskFormerModelIntegrationTest OOM by @ydshieh in #25544
  • More frozen args by @muellerzr in #25540
  • Input data format by @amyeroberts in #25464
  • [ASR Pipeline] Fix init with timestamps by @sanchit-gandhi in #25438
  • More utils doc by @sgugger in #25457
  • Update trainer.py by @yundai424 in #25553
  • Add documentation to dynamic module utils by @sgugger in #25534
  • Fix MPT CI by @ydshieh in #25548
  • Fix torch.fx tests on nightly CI by @ydshieh in #25549
  • YOLOS - reset default return_pixel_mask value by @amyeroberts in #25559
  • Skip test_onnx_runtime_optimize for now by @ydshieh in #25560
  • [Docs] Fix un-rendered images by @younesbelkada in #25561
  • Adds TRANSFORMERS_TEST_DEVICE by @vvvm23 in #25506
  • Skip test_beam_search_xla_generate_simple for T5 by @ydshieh in #25566
  • [resize_embedding] Introduce pad_to_multiple_of and guidance by @ArthurZucker in #25088
  • [SwitchTransformers] Remove unused module by @ArthurZucker in #25427
  • Inconsistency in PreTrainedModel.resize_token_embeddings When ZeRO3 Is Enabled by @sinamoeini in #25394
  • [NllbMoe] Update code to properly support loss computation by @ArthurZucker in #25429
  • [Tests] Fix failing 8bit test by @younesbelkada in #25564
  • Revert "change version by @SunMarc in #25387)"
  • add util for ram efficient loading of model when using fsdp by @pacman100 in #25107
  • Skip test_contrastive_generate for TFXLNet by @ydshieh in #25574
  • add warning for 8bit optimizers by @SunMarc in #25575
  • Fix typo in example code by @amelietamreymond in #25583
  • Suggestions on Pipeline_webserver by @kihoon71 in #25570
  • [Docs / BetterTransformer ] Added more details about flash attention + SDPA by @younesbelkada in #25265
  • Added missing parenthesis in call to is_fsdp_enabled by @marma in #25585
  • Replaces calls to .cuda with .to(torch_device) in tests by @vvvm23 in #25571
  • [split_special_tokens] Add support for split_special_tokens argument to encode by @ArthurZucker in #25081
  • [Llama] remove prompt and fix prefix finetuning by @ArthurZucker in #25565
  • [Time series Informer] fix dtype of cumsum by @kashif in #25431
  • fix z3 init when using accelerate launcher by @pacman100 in #25589
  • [TokenizerFast] Fix setting prefix space in init by @ArthurZucker in #25563
  • Make TTS automodels importable by @osanseviero in #25595
  • reattach hooks when using resize_token_embeddings by @SunMarc in #25596
  • Ignore all exceptions from signal in dynamic code by @sgugger in #25623
  • Fix PEFT integration failures on nightly CI by @younesbelkada in #25624
  • Run doctest for new files by @ydshieh in #25588
  • Fix test_modeling_mpt typo in model id by @JuanFKurucz in #25606

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ranchlai
    • Add multi-label text classification support to pytorch example (#24770)
    • override .cuda() to check if model is already quantized (#25166)
    • fix get_keys_to_not_convert() to return correct modules for full precision inference (#25105)
    • add pathname and line number to logging formatter in debug mode (#25203)
    • add repr to the BitsAndBytesConfig class (#25517)
  • @wonhyeongseo
    • 🌐 [i18n-KO] Fixed Korean and English quicktour.md (#24664)
    • 🌐 [i18n-KO] Updated Korean serialization.md (#24686)
  • @Sunmin0520
    • 🌐 [i18n-KO] Translated testing.md to Korean (#24900)
  • @Xrenya
    • Pvt model (#24720)
  • @susnato
    • Fix broken link in README_hd.md (#25067)
    • Add Pop2Piano (#21785)
  • @sjrl
    • [T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification (#24726)
  • @Jackmin801
    • Allow trust_remote_code in example scripts (#25248)
  • @mjk0618
    • 🌐 [i18n-KO] Translated add_new_model.md to Korean (#24957)

v4.31.0

9 months ago

New models

Llama v2

Llama 2 was proposed in Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron et al. It builds upon the Llama architecture, adding Grouped Query Attention for efficient inference.
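
A minimal loading sketch; note that the meta-llama/Llama-2-7b-hf checkpoint is gated and requires accepting the license on the Hub:

from transformers import AutoModelForCausalLM, AutoTokenizer

# access to this checkpoint requires an approved Hub account
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Llama 2 builds upon the original Llama by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))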

  • Add support for Llama 2 by @ArthurZucker in #24891

Musicgen

The MusicGen model was proposed in the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.

MusicGen is a single stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden-states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.

Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, thus eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or upsampling). Instead, it is able to generate all the codebooks in a single forward pass.
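
A minimal text-conditioned generation sketch, assuming the facebook/musicgen-small checkpoint:

from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(text=["80s pop track with bassy drums and synth"], padding=True, return_tensors="pt")
# each generated token corresponds to roughly 20 ms of audio
audio_values = model.generate(**inputs, max_new_tokens=256)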

  • Add Musicgen by @sanchit-gandhi in #24109

Bark

Bark is a transformer-based text-to-speech model proposed by Suno AI in suno-ai/bark.

  • Add bark by @ylacombe in #24086

MMS

The MMS model was proposed in Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

  • Add MMS CTC Fine-Tuning by @patrickvonplaten in #24281

EnCodec

The EnCodec neural codec model was proposed in High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
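
A minimal encode/decode round-trip sketch, assuming the facebook/encodec_24khz checkpoint and one second of dummy audio:

import numpy as np
from transformers import AutoProcessor, EncodecModel

processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")
model = EncodecModel.from_pretrained("facebook/encodec_24khz")

# one second of random mono audio at 24 kHz
raw_audio = np.random.randn(24000).astype(np.float32)
inputs = processor(raw_audio=raw_audio, sampling_rate=24000, return_tensors="pt")

encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
decoded = model.decode(encoded.audio_codes, encoded.audio_scales, inputs["padding_mask"])[0]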

  • Add EnCodec model by @hollance in #23655

InstructBLIP

The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
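
A minimal visual instruction sketch, assuming the Salesforce/instructblip-flan-t5-xl checkpoint and the usual COCO example image:

import requests
from PIL import Image
from transformers import InstructBlipForConditionalGeneration, InstructBlipProcessor

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-flan-t5-xl")
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-flan-t5-xl")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(images=image, text="What is shown in this image?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])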

  • Add InstructBLIP by @NielsRogge in #23460

Umt5

The UMT5 model was proposed in UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.

  • [Umt5] Add google's umt5 to transformers by @ArthurZucker in #24477

MRA

The MRA model was proposed in Multi Resolution Analysis (MRA) for Approximate Self-Attention by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, and Vikas Singh.

  • Add Multi Resolution Analysis (MRA) by @novice03 in #24513

ViViT

The ViViT model was proposed in ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. The paper proposes one of the first successful pure-transformer based set of models for video understanding.

  • Add ViViT by @jegork in #22518

Python 3.7

The last version to support Python 3.7 was 4.30.x, as Python 3.7 reached end-of-life on June 27, 2023 and is no longer supported by the Python Software Foundation.

  • ⚠️ Time to say goodbye to py37 by @ydshieh in #24091

PyTorch 1.9

The last version to support PyTorch 1.9 was 4.30.x. As it was released more than two years ago, and we're looking forward to using features available in PyTorch 1.10 and up, we no longer support PyTorch 1.9 for v4.31 and up.

  • Byebye pytorch 1.9 by @ydshieh in #24080

RoPE scaling

This PR adds RoPE scaling to the Llama and GPTNeoX families of models. It allows us to extrapolate and go beyond the original maximum sequence length (e.g. 2048 tokens on LLaMA), without fine-tuning. It offers two strategies (see the sketch after the list below):

  • Linear scaling
  • Dynamic NTK scaling
  • Llama/GPTNeoX: add RoPE scaling by @gante in #24653
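
A minimal sketch of enabling linear scaling; the checkpoint and the factor are assumed examples (a factor of 2.0 roughly doubles the usable context):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
)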

Agents

Tools now return a type that is specific to agents. This type can return a serialized version of itself (a string) that points either to a file on disk or to the object's content. This should make interaction with text-based systems much simpler.

  • Tool types by @LysandreJik in #24032

Tied weights load

Models with potentially tied weights used to drop some keys from the state dict even when the weights were not tied. This has now been fixed and, more generally, the whole experience of loading a model with a state dict that doesn't match exactly should be improved in this release.

  • Tied weights load by @sgugger in #24310
  • Clean load keys by @sgugger in #24505

Whisper word-level timestamps

This PR adds a method of predicting timestamps at the word (or even token) level, by analyzing the cross-attentions and applying dynamic time warping.
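
A minimal sketch via the ASR pipeline; the openai/whisper-tiny checkpoint and the audio path are assumed examples, and word-level timestamps are requested through return_timestamps="word":

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
# "sample.mp3" is a placeholder path; each returned chunk pairs a word with its (start, end) timestamps
output = pipe("sample.mp3", return_timestamps="word")
print(output["chunks"])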

  • add word-level timestamps to Whisper by @hollance in #23205

Auto model addition

A new auto model is added, AutoModelForTextEncoding. It is to be used when you want to extract the text encoder from an encoder-decoder architecture.
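
A minimal sketch, using t5-small as an example encoder-decoder checkpoint:

from transformers import AutoModelForTextEncoding, AutoTokenizer

# loads only the text encoder of the T5 encoder-decoder model
encoder = AutoModelForTextEncoding.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("t5-small")

inputs = tokenizer("Studies have shown that owning a dog is good for you", return_tensors="pt")
hidden_states = encoder(**inputs).last_hidden_state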

  • [AutoModel] Add AutoModelForTextEncoding by @sanchit-gandhi in #24305

Model deprecation

Transformers is growing a lot, and to ease the burden of maintenance on our side a bit, we have taken the decision to deprecate models that are not used much. Those models will never actually disappear from the library, but we will stop testing them or accepting PRs modifying them. The criterion to identify models to deprecate was fewer than 1,000 unique downloads in the last 30 days, for models that are at least one year old. The list of deprecated models is:

  • BORT
  • M-CTC-T
  • MMBT
  • RetriBERT
  • TAPEX
  • Trajectory Transformer
  • VAN
  • Deprecate models by @sgugger in #24787

Breaking changes

Fixes an issue with stripped spaces for the T5 family of tokenizers. If this negatively impacts your inference or training, please let us know by opening an issue.

  • ⚠️⚠️[T5Tokenize] Fix T5 family tokenizers⚠️⚠️ by @ArthurZucker in #24565

Bugfixes and improvements

  • add trust_remote_code option to CLI download cmd by @radames in #24097

  • Fix typo in Llama docstrings by @Kh4L in #24020

  • Avoid GPT-2 daily CI job OOM (in TF tests) by @ydshieh in #24106

  • [Lllama] Update tokenization code to ensure parsing of the special tokens [core] by @ArthurZucker in #24042

  • PLAM => PaLM by @xingener in #24129

  • [bnb] Fix bnb config json serialization by @younesbelkada in #24137

  • Correctly build models and import call_context for older TF versions by @Rocketknight1 in #24138

  • Generate: PT's top_p enforces min_tokens_to_keep when it is 1 by @gante in #24111

  • fix bugs with trainer by @pacman100 in #24134

  • Fix TF Rag OOM issue by @ydshieh in #24122

  • Fix SAM OOM issue on CI by @ydshieh in #24125

  • Fix XGLM OOM on CI by @ydshieh in #24123

  • [SAM] Fix sam slow test by @younesbelkada in #24140

  • [lamaTokenizerFast] Update documentation by @ArthurZucker in #24132

  • [BlenderBotSmall] Update doc example by @ArthurZucker in #24092

  • Fix Pipeline CI OOM issue by @ydshieh in #24124

  • [documentation] grammatical fixes in image_classification.mdx by @LiamSwayne in #24141

  • Fix typo in streamers.py by @freddiev4 in #24144

  • [tests] fix bitsandbytes import issue by @stas00 in #24151

  • Avoid OOM in doctest CI by @ydshieh in #24139

  • Fix Wav2Vec2 CI OOM by @ydshieh in #24190

  • Fix push to hub by @NielsRogge in #24187

  • Change ProgressCallback to use dynamic_ncols=True by @gmlwns2000 in #24101

  • [i18n]Translated "attention.mdx" to korean by @kihoon71 in #23878

  • Generate: force caching on the main model, in assisted generation by @gante in #24177

  • Fix device issue in OpenLlamaModelTest::test_model_parallelism by @ydshieh in #24195

  • Update GPTNeoXLanguageGenerationTest by @ydshieh in #24193

  • typo: fix typos in CONTRIBUTING.md and deepspeed.mdx by @zsj9509 in #24184

  • Generate: detect special architectures when loaded from PEFT by @gante in #24198

  • 🌐 [i18n-KO] Translated tasks_summary.mdx to Korean by @kihoon71 in #23977

  • 🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 by @muellerzr in #24028

  • Fix _load_pretrained_model by @SunMarc in #24200

  • Fix steps bugs in no trainer examples by @Ethan-yt in #24197

  • Skip RWKV test in past CI by @ydshieh in #24204

  • Remove unnecessary aten::to overhead in llama by @fxmarty in #24203

  • Update WhisperForAudioClassification doc example by @ydshieh in #24188

  • Finish dataloader integration by @muellerzr in #24201

  • Add the number of model test failures to slack CI report by @ydshieh in #24207

  • fix: TextIteratorStreamer cannot work with pipeline by @yuanwu2017 in #23641

  • Update (TF)SamModelIntegrationTest by @ydshieh in #24199

  • Improving error message when using use_safetensors=True. by @Narsil in #24232

  • Safely import pytest in testing_utils.py by @amyeroberts in #24241

  • fix overflow when training mDeberta in fp16 by @sjrl in #24116

  • deprecate use_mps_device by @pacman100 in #24239

  • Tied params cleanup by @sgugger in #24211

  • [Time Series] use mean scaler when scaling is a boolean True by @kashif in #24237

  • TF: standardize test_model_common_attributes for language models by @gante in #23457

  • Generate: GenerationConfig can overwrite attributes at from_pretrained time by @gante in #24238

  • Add torch >=1.12 requirement for Tapas by @ydshieh in #24251

  • Update urls in warnings for rich rendering by @IvanReznikov in #24136

  • Fix how we detect the TF package by @Rocketknight1 in #24255

  • Stop storing references to bound methods via tf.function by @Rocketknight1 in #24146

  • Skip GPT-J fx tests for torch < 1.12 by @ydshieh in #24256

  • docs wrt using accelerate launcher with trainer by @pacman100 in #24250

  • update FSDP save and load logic by @pacman100 in #24249

  • Fix URL in comment for contrastive loss function by @taepd in #24271

  • QA doc: import torch before it is used by @ByronHsu in #24228

  • Skip some TQAPipelineTests tests in past CI by @ydshieh in #24267

  • TF: CTRL with native embedding layers by @gante in #23456

  • Adapt Wav2Vec2 conversion for MMS lang identification by @patrickvonplaten in #24234

  • Update check of core deps by @sgugger in #24277

  • Pix2StructImageProcessor requires torch>=1.11.0 by @ydshieh in #24270

  • Fix Debertav2 embed_proj by @WissamAntoun in #24205

  • Clean up old Accelerate checks by @sgugger in #24279

  • Fix bug in slow tokenizer conversion, make it a lot faster by @stephantul in #24266

  • Fix check_config_attributes: check all configuration classes by @ydshieh in #24231

  • Fix LLaMa beam search when using parallelize by @FeiWang96 in #24224

  • remove unused is_decoder parameter in DetrAttention by @JayL0321 in #24226

  • Split common test from core tests by @sgugger in #24284

  • [fix] bug in BatchEncoding.getitem by @flybird1111 in #24293

  • Fix image segmentation tool bug by @amyeroberts in #23897

  • [Docs] Improve docs for MMS loading of other languages by @patrickvonplaten in #24292

  • Update README_zh-hans.md by @CooperFu in #24181

  • deepspeed init during eval fix by @pacman100 in #24298

  • [EnCodec] Changes for 32kHz ckpt by @sanchit-gandhi in #24296

  • [Docs] Fix the paper URL for MMS model by @hitchhicker in #24302

  • Update tokenizer_summary.mdx (grammar) by @belladoreai in #24286

  • Beam search type by @jprivera44 in #24288

  • Make can_generate as class method by @ydshieh in #24299

  • Update test versions on README.md by @sqali in #24307

  • [SwitchTransformers] Fix return values by @ArthurZucker in #24300

  • Fix functional TF Whisper and modernize tests by @Rocketknight1 in #24301

  • Big TF test cleanup by @Rocketknight1 in #24282

  • Fix ner average grouping with no groups by @Narsil in #24319

  • Fix ImageGPT doc example by @amyeroberts in #24317

  • Add test for proper TF input signatures by @Rocketknight1 in #24320

  • Adding ddp_broadcast_buffers argument to Trainer by @TevenLeScao in #24326

  • error bug on saving distributed optim state when using data parallel by @xshaun in #24108

  • 🌐 [i18n-KO] Fixed tutorial/preprocessing.mdx by @sim-so in #24156

  • pin apex to a speicifc commit (for DeepSpeed CI docker image) by @ydshieh in #24351

  • byebye Hub connection timeout by @ydshieh in #24350

  • Clean up disk sapce during docker image build for transformers-pytorch-gpu by @ydshieh in #24346

  • Fix KerasMetricCallback: pass generate_kwargs even if use_xla_generation is False by @Kripner in #24333

  • Fix device issue in SwitchTransformers by @ydshieh in #24352

  • Update MMS integration docs by @vineelpratap in #24311

  • Make AutoFormer work with previous torch version by @ydshieh in #24357

  • Fix ImageGPT doctest by @amyeroberts in #24353

  • Fix link to documentation in Install from Source by @SoyGema in #24336

  • docs: add BentoML to awesome-transformers by @aarnphm in #24344

  • [Doc Fix] Fix model name path in the transformers doc for AutoClasses by @riteshghorse in #24329

  • Fix the order in GPTNeo's docstring by @qgallouedec in #24358

  • Respect explicitly set framework parameter in pipeline by @denis-ismailaj in #24322

  • Allow passing kwargs through to TFBertTokenizer by @Rocketknight1 in #24324

  • Fix resuming PeftModel checkpoints in Trainer by @llohann-speranca in #24274

  • TensorFlow CI fixes by @Rocketknight1 in #24360

  • Update tiny models for pipeline testing. by @ydshieh in #24364

  • [modelcard] add audio classification to task list by @sanchit-gandhi in #24363

  • [Whisper] Make tests faster by @sanchit-gandhi in #24105

  • Rename test to be more accurate by @sgugger in #24374

  • Add a check in ImageToTextPipeline._forward by @ydshieh in #24373

  • [Tokenizer doc] Clarification about add_prefix_space by @ArthurZucker in #24368

  • style: add BitsAndBytesConfig repr function by @aarnphm in #24331

  • Better test name and enable pipeline test for pix2struct by @ydshieh in #24377

  • Skip a tapas (tokenization) test in past CI by @ydshieh in #24378

  • [Whisper Docs] Nits by @ArthurZucker in #24367

  • [GPTNeoX] Nit in config by @ArthurZucker in #24349

  • [Wav2Vec2 - MMS] Correct directly loading adapters weights by @patrickvonplaten in #24335

  • Migrate doc files to Markdown. by @sgugger in #24376

  • Update deprecated torch.ger by @kit1980 in #24387

  • [docs] Fix NLLB-MoE links by @stevhliu in #24388

  • Add ffmpeg for doc_test_job on CircleCI by @ydshieh in #24397

  • byebye Hub connection timeout - Recast by @ydshieh in #24399

  • fix type annotation for debug arg by @Bearnardd in #24033

  • [Trainer] Fix optimizer step on PyTorch TPU by @cowanmeg in #24389

  • Fix gradient checkpointing + fp16 autocast for most models by @younesbelkada in #24247

  • Clean up dist import by @muellerzr in #24402

  • Check auto mappings can be imported via from transformers by @ydshieh in #24400

  • Remove redundant code from TrainingArgs by @muellerzr in #24401

  • Explicit arguments in from_pretrained by @ydshieh in #24306

  • [ASR pipeline] Check for torchaudio by @sanchit-gandhi in #23953

  • TF safetensors reduced mem usage by @Rocketknight1 in #24404

  • Skip test_conditional_generation_pt_pix2struct in Past CI (torch < 1.11) by @ydshieh in #24417

  • [bnb] Fix bnb serialization issue with new release by @younesbelkada in #24416

  • Revert "Fix gradient checkpointing + fp16 autocast for most models" by @younesbelkada in #24420

  • Fix save_cache version in config.yml by @ydshieh in #24419

  • Update RayTune doc link for Hyperparameter tuning by @JoshuaEPSamuel in #24422

  • TF CI fix for Segformer by @Rocketknight1 in #24426

  • Refactor hyperparameter search backends by @alexmojaki in #24384

  • Clarify batch size displayed when using DataParallel by @sgugger in #24430

  • Save site-packages as cache in CircleCI job by @ydshieh in #24424

  • [llama] Fix comments in weights converter by @weimingzha0 in #24436

  • [Trainer] Fix .to call on 4bit models by @younesbelkada in #24444

  • fix the grad_acc issue at epoch boundaries by @pacman100 in #24415

  • Replace python random with torch.rand to enable dynamo.export by @BowenBao in #24434

  • Fix typo by @siryuon in #24440

  • Fix some TFWhisperModelIntegrationTests by @ydshieh in #24428

  • fixes issue when saving fsdp via accelerate's FSDP plugin by @pacman100 in #24446

  • Allow dict input for audio classification pipeline by @sanchit-gandhi in #23445

  • Update JukeboxConfig.from_pretrained by @ydshieh in #24443

  • Improved keras imports by @Rocketknight1 in #24448

  • add missing alignment_heads to Whisper integration test by @hollance in #24487

  • Fix tpu_metrics_debug by @cowanmeg in #24452

  • Update AlbertModel type annotation by @amyeroberts in #24450

  • [pipeline] Fix str device issue by @younesbelkada in #24396

  • when resuming from a PEFT checkpoint, the model should be trainable by @sywangyi in #24463

  • deepspeed z1/z2 state dict fix by @pacman100 in #24489

  • Update InstructBlipModelIntegrationTest by @ydshieh in #24490

  • Update token_classification.md by @condor-cp in #24484

  • Add support for for loops in python interpreter by @sgugger in #24429

  • [InstructBlip] Add accelerate support for instructblip by @younesbelkada in #24488

  • Compute dropout_probability only in training mode by @ydshieh in #24486

  • Fix 'local_rank' AttributeError in Trainer class by @mocobeta in #24297

  • Compute dropout_probability only in training mode (SpeechT5) by @ydshieh in #24498

  • Fix link in utils by @SoyGema in #24501

  • 🚨🚨 Fix group beam search by @hukuda222 in #24407

  • Generate: group_beam_search requires diversity_penalty>0.0 by @gante in #24456

  • Generate: min_tokens_to_keep has to be >= 1 by @gante in #24453

  • Fix TypeError: Object of type int64 is not JSON serializable by @xiaoli in #24340

  • Fix poor past ci by @ydshieh in #24485

  • 🌐 [i18n-KO] Translated tflite.mdx to Korean by @0525hhgus in #24435

  • use accelerate autocast in jit eval path, since mixed precision logic is… by @sywangyi in #24460

  • Update hyperparameter_search.py by @pacman100 in #24515

  • [T5] Add T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24481

  • set model to training mode before accelerate.prepare by @sywangyi in #24520

  • Update huggingface_hub commit sha by @ydshieh in #24527

  • Find module name in an OS-agnostic fashion by @sgugger in #24526

  • Fix LR scheduler based on bs from auto bs finder by @muellerzr in #24521

  • [Mask2Former] Remove SwinConfig by @NielsRogge in #24259

  • Allow backbones not in backbones_supported - Maskformer Mask2Former by @amyeroberts in #24532

  • Fix Typo by @tony9402 in #24530

  • Finishing tidying keys to ignore on load by @sgugger in #24535

  • Add bitsandbytes support for gpt2 models by @DarioSucic in #24504

  • ⚠️ Time to say goodbye to py37 by @ydshieh in #24091

  • Unpin DeepSpeed and require DS >= 0.9.3 by @ydshieh in #24541

  • Allow for warn_only selection in enable_full_determinism by @Frank995 in #24496

  • Fix typing annotations for FSDP and DeepSpeed in TrainingArguments by @mryab in #24549

  • Update PT/TF weight conversion after #24030 by @ydshieh in #24547

  • Update EncodecIntegrationTest by @ydshieh in #24553

  • [gpt2-int8] Add gpt2-xl int8 test by @younesbelkada in #24543

  • Fix processor init bug if image processor undefined by @amyeroberts in #24554

  • [InstructBlip] Add instruct blip int8 test by @younesbelkada in #24555

  • Update PT/Flax weight conversion after #24030 by @ydshieh in #24556

  • Make PT/Flax tests runnable on GPU by @ydshieh in #24557

  • Update masked_language_modeling.md by @condor-cp in #24560

  • Fixed OwlViTModel inplace operations by @pasqualedem in #24529

  • Update old existing feature extractor references by @amyeroberts in #24552

  • Fix Typo by @tony9402 in #24559

  • Fix annotations by @tony9402 in #24571

  • Docs: 4 bit doc corrections by @gante in #24572

  • Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" by @sgugger in #24574

  • Update some torchscript tests after #24505 by @ydshieh in #24566

  • Removal of deprecated vision methods and specify deprecation versions by @amyeroberts in #24570

  • Fix ESM models buffers by @sgugger in #24576

  • Check all objects are equally in the main __init__ file by @ydshieh in #24573

  • Fix annotations by @tony9402 in #24582

  • fix peft ckpts not being pushed to hub by @pacman100 in #24578

  • Update link to RunHouse hardware setup documentation by @BioGeek in #24590

  • Show a warning for missing attention masks when pad_token_id is not None by @hackyon in #24510

  • Make (TF) CI faster (test only a subset of model classes) by @ydshieh in #24592

  • Speed up TF tests by reducing hidden layer counts by @Rocketknight1 in #24595

  • [several models] improve readability by @stas00 in #24585

  • Use protobuf 4 by @ydshieh in #24599

  • Limit Pydantic to V1 in dependencies by @lig in #24596

  • 🌐 [i18n-KO] Translated perplexity.mdx to Korean by @HanNayeoniee in #23850

  • [Time-Series] Added blog-post to tips by @elisim in #24482

  • Pin Pillow for now by @ydshieh in #24633

  • Fix loading dataset docs link in run_translation.py example by @SoyGema in #24594

  • Generate: multi-device support for contrastive search by @gante in #24635

  • Generate: force cache with inputs_embeds forwarding by @gante in #24639

  • precompiled_charsmap checking before adding to the normalizers' list for XLNetTokenizerFast conversion. by @shahad-mahmud in #24618

  • Fix audio feature extractor deps by @sanchit-gandhi in #24636

  • llama fp16 torch.max bug fix by @prathikr in #24561

  • documentation_tests.txt - sort filenames alphabetically by @amyeroberts in #24647

  • Update warning messages referring to post_process_object_detection by @rafaelpadilla in #24649

  • Add finetuned_from property in the autogenerated model card by @sgugger in #24528

  • Make warning disappear for remote code in pipelines by @sgugger in #24603

  • Fix EncodecModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #24663

  • Fix VisionTextDualEncoderIntegrationTest by @ydshieh in #24661

  • Add is_torch_mps_available function to utils by @NripeshN in #24660

  • Unpin huggingface_hub by @ydshieh in #24667

  • Fix model reference and results in documentation; the model mentioned was inaccessible by @rafaelpadilla in #24609

  • Add Nucleotide Transformer notebooks and restructure notebook list by @Rocketknight1 in #24669

  • LlamaTokenizer should be picklable by @icyblade in #24681

  • Add dropouts to GPT-NeoX by @ZHAOTING in #24680

  • DeepSpeed/FSDP ckpt saving utils fixes and FSDP training args fixes by @pacman100 in #24591

  • Avoid import sentencepiece_model_pb2 in utils.__init__.py by @ydshieh in #24689

  • Fix integration with Accelerate and failing test by @muellerzr in #24691

  • [MT5] Fix CONFIG_MAPPING issue leading it to load umt5 class by @ArthurZucker in #24678

  • Fix flaky test_for_warning_if_padding_and_no_attention_mask by @ydshieh in #24706

  • Whisper: fix prompted max length by @gante in #24666

  • Enable conversational pipeline for GPTSw3Tokenizer by @saattrupdan in #24648

  • [T5] Adding model_parallel = False to T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24684

  • Docs: change some input_ids doc reference from BertTokenizer to AutoTokenizer by @gante in #24730

  • add link to accelerate doc by @SunMarc in #24601

  • [Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valid for beginnings of words by @ArthurZucker in #24622

  • Fix typo in LocalAgent by @jamartin9 in #24736

  • fix: Text splitting in the BasicTokenizer by @connor-henderson in #22280

  • Docs: add kwargs type to fix formatting by @gante in #24733

  • add gradient checkpointing for distilbert by @jordane95 in #24719

  • Skip keys not in the state dict when finding mismatched weights by @sgugger in #24749

  • Fix non-deterministic Megatron-LM checkpoint name by @janEbert in #24674

  • [InstructBLIP] Fix bos token of LLaMa checkpoints by @NielsRogge in #24492

  • Skip some slow tests for doctesting in PRs (Circle)CI by @ydshieh in #24753

  • Fix lr scheduler not being reset on reruns by @muellerzr in #24758

  • 🐛 Handle empty gen_kwargs for seq2seq trainer prediction_step function by @gkumbhat in #24759

  • Allow existing configs to be registered by @sgugger in #24760

  • Unpin protobuf in docker file (for daily CI) by @ydshieh in #24761

  • Fix eval_accumulation_steps leading to incorrect metrics by @muellerzr in #24756

  • Add MobileVitV2 to doctests by @amyeroberts in #24771

  • Docs: Update logit processors call docs by @gante in #24729

  • Replacement of 20 asserts with exceptions by @Baukebrenninkmeijer in #24757

  • Update default values of bos/eos token ids in CLIPTextConfig by @ydshieh in #24773

  • Fix pad across processes dim in trainer and not being able to set the timeout by @muellerzr in #24775

  • gpt-bigcode: avoid zero_ to support Core ML by @pcuenca in #24755

  • Remove WWT from README by @LysandreJik in #24672

  • Rm duplicate pad_across_processes by @muellerzr in #24780

  • Revert "Unpin protobuf in docker file (for daily CI)" by @ydshieh in #24800

  • Removing unnecessary device=device in modeling_llama.py by @Liyang90 in #24696

  • [fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" by @SeongBeomLEE in #24769

  • [DOC] Clarify the relationship between load_best_model_at_end and save_total_limit by @BramVanroy in #24614

  • Upgrade jax/jaxlib/flax pin versions by @ydshieh in #24791

  • Fix MobileVitV2 doctest checkpoint by @amyeroberts in #24805

  • Skip torchscript tests for MusicgenForConditionalGeneration by @ydshieh in #24782

  • Generate: add SequenceBiasLogitsProcessor by @gante in #24334

  • Add accelerate version in transformers-cli env by @amyeroberts in #24806

  • Fix typo 'submosules' by @dymil in #24809

  • Remove Falcon docs for the release until TGI is ready by @Rocketknight1 in #24808

  • Update setup.py to be compatible with pipenv by @georgiemathews in #24789

  • Use _BaseAutoModelClass's register method by @fadynakhla in #24810

  • Run hub tests by @sgugger in #24807

  • Copy code when using local trust remote code by @sgugger in #24785

  • Fixing double use_auth_token.pop (preventing private models from being visible). by @Narsil in #24812

  • set correct model input names for gptsw3tokenizer by @DarioSucic in #24788

  • Check models used for common tests are small by @sgugger in #24824

  • [🔗 Docs] Fixed Incorrect Migration Link by @kadirnar in #24793

  • deprecate sharded_ddp training argument by @statelesshz in #24825

  • 🌐 [i18n-KO] Translated custom_tools.mdx to Korean by @sim-so in #24580

  • Remove unused code in GPT-Neo by @namespace-Pt in #24826

  • Add Multimodal heading and Document question answering in task_summary.mdx by @y3sar in #23318

  • Fix is_vision_available by @ydshieh in #24853

  • Fix comments for _merge_heads by @bofenghuang in #24855

  • fix broken links in READMEs by @younesbelkada in #24861

  • Add TAPEX to the list of deprecated models by @sgugger in #24859

  • Fix token pass by @sgugger in #24862

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @hollance
    • [WIP] add EnCodec model (#23655)
    • add word-level timestamps to Whisper (#23205)
    • add missing alignment_heads to Whisper integration test (#24487)
  • @sim-so
    • 🌐 [i18n-KO] Fixed tutorial/preprocessing.mdx (#24156)
    • 🌐 [i18n-KO] Translated custom_tools.mdx to Korean (#24580)
  • @novice03
    • Add Multi Resolution Analysis (MRA) (New PR) (#24513)
  • @jegork
    • Add ViViT (#22518)

v4.30.2

11 months ago
  • Fix push to hub by @NielsRogge in #24187
  • Fix how we detect the TF package by @Rocketknight1 in #24255

v4.30.1

11 months ago
  • Fix bnb config json serialization in #24137 by @younesbelkada
  • Correctly build models and import call_context for older TF versions in #24138 by @Rocketknight1
  • Fix bugs with trainer in #24134 by @pacman100

v4.30.0

11 months ago

100k

Transformers has just reached 100k stars on GitHub. To celebrate, we wanted to highlight 100 projects in the vicinity of transformers, and we have created an awesome-transformers page to do just that.

We accept PRs to add projects to the list!

  • Top 100 by @LysandreJik in #22912
  • Add LlamaIndex to awesome-transformers.md by @ravi03071991 in #23484
  • add cleanlab to awesome-transformers tools list by @jwmueller in #23440

4-bit quantization and QLoRA

By leveraging the bitsandbytes library by @TimDettmers, we add 4-bit support to transformers models!

  • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #23479
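
As a quick illustration, loading a checkpoint in 4-bit is essentially a one-flag change. A minimal sketch, assuming bitsandbytes and accelerate are installed; the checkpoint name is only an example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization with bfloat16 compute, as proposed in the QLoRA work.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "facebook/opt-350m"  # example checkpoint; any causal LM should work
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",  # dispatch layers across available devices
)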

Agents

The Agents framework has been improved and continues to be stabilized. Alongside many bug fixes, here are the important new features that were added:

  • Local agent capabilities, to load a generative model directly from transformers instead of relying on APIs.
  • Prompts are now hosted on the Hub, which means that anyone can fork the prompts and update them with their own versions, letting other community contributors re-use them.
  • We add an AzureOpenAiAgent class to support Azure OpenAI agents.
  • Add local agent by @sgugger in #23438
  • Enable prompts on the Hub by @sgugger in #23662
  • Add AzureOpenAiAgent by @sgugger in #24058
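
A minimal sketch of the local agent flow, adapted from the LocalAgent docstring; the StarCoder checkpoint is only an example:

from transformers import LocalAgent

# Load the generative model locally instead of calling a hosted inference API.
agent = LocalAgent.from_pretrained("bigcode/starcoder", device_map="auto")
agent.run("Draw me a picture of rivers and lakes.")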

Safetensors

The safetensors library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI).

It has now become a core dependency of transformers.

  • Making safetensors a core dependency. by @Narsil in #23254
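
Safetensors serialization is exposed through save_pretrained; a minimal sketch (the checkpoint name is only an example):

from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
# Writes model.safetensors instead of the pickle-based pytorch_model.bin.
model.save_pretrained("./local-checkpoint", safe_serialization=True)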

New models

Swiftformer

The SwiftFormer paper introduces a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. A series of models called ‘SwiftFormer’ is built based on this, which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed. Even their small variant achieves 78.5% top-1 ImageNet1K accuracy with only 0.8 ms latency on iPhone 14, which is more accurate and 2× faster compared to MobileViT-v2.

  • Add swiftformer by @shehanmunasinghe in #22686
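
A minimal classification sketch; treat the checkpoint name as an assumption and check the Hub for the released SwiftFormer weights:

import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, SwiftFormerForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("MBZUAI/swiftformer-xs")
model = SwiftFormerForImageClassification.from_pretrained("MBZUAI/swiftformer-xs")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])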

Autoformer

This model augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process.

  • [Time-Series] Autoformer model by @elisim in #21891
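
A hedged training-step sketch adapted from the AutoformerForPrediction docstring; the repo ids are taken from that example and should be treated as assumptions:

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoformerForPrediction

# A pre-batched sample of the tourism-monthly dataset.
file = hf_hub_download(
    repo_id="hf-internal-testing/tourism-monthly-batch",
    filename="train-batch.pt",
    repo_type="dataset",
)
batch = torch.load(file)

model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly")

# During training, past and future values are both provided so the loss can be computed.
outputs = model(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"],
    future_values=batch["future_values"],
    future_time_features=batch["future_time_features"],
)
loss = outputs.loss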

MobileViTv2

MobileViTV2 is the second version of MobileViT, constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.

  • Add MobileViTv2 by @shehanmunasinghe in #22820

PerSAM

PerSAM proposes a minimal modification to SAM to allow DreamBooth-like personalization, enabling segmentation of concepts in new images using just one example.

  • Add PerSAM [bis] by @NielsRogge in #23659

Timm backbone

We add support for loading timm weights within the AutoBackbone API in transformers. timm models can be instantiated through the TimmBackbone class, and then used with any vision model that needs a backbone.

  • Add TimmBackbone model by @amyeroberts in #22619
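
A minimal sketch of wrapping a timm model as a backbone, assuming timm is installed; the resnet50 name and out_indices values are only examples:

import torch
from transformers import TimmBackbone, TimmBackboneConfig

# Wrap timm's resnet50 so it can be plugged into vision models that need a backbone.
config = TimmBackboneConfig(backbone="resnet50", out_indices=(1, 2, 3, 4))
backbone = TimmBackbone(config)

# The backbone returns multi-scale feature maps for downstream heads.
pixel_values = torch.randn(1, 3, 224, 224)
feature_maps = backbone(pixel_values).feature_maps
print([fmap.shape for fmap in feature_maps])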

Image to text pipeline conditional support

We add conditional text generation to the image-to-text pipeline, allowing the model to continue generating from an initial text prompt conditioned on an image.

  • [image-to-text pipeline] Add conditional text support + GIT by @NielsRogge in #23362
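
A minimal sketch of prompted captioning through the pipeline; the GIT checkpoint and image URL are only examples:

from transformers import pipeline

captioner = pipeline("image-to-text", model="microsoft/git-base-coco")
# The model continues the given prompt, conditioned on the image.
print(captioner(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    prompt="a photography of",
))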

TensorFlow implementations

  • Add TensorFlow implementation of EfficientFormer by @D-Roberts in #22620

Accelerate Migration

A major rework of the internals of the Trainer is underway, leveraging accelerate instead of redefining them in transformers. This should unify both frameworks and lead to increased interoperability and more efficient development.

  • Smangrul/accelerate mp integrate by @pacman100 in #23148
  • Smangrul/accelerate ddp integrate by @pacman100 in #23151
  • fix trainer slow tests related to hyperparam search by @pacman100 in #24011
  • remove the extra accelerator.prepare by @pacman100 in #23914
  • move fsdp handling to accelerate by @pacman100 in #23158
  • shift torch dynamo handling to accelerate by @pacman100 in #23168
  • accelerate deepspeed and gradient accumulation integrate by @pacman100 in #23236
  • fix executable batch size issue by @pacman100 in #24067
  • fix accelerator prepare during eval only mode by @pacman100 in #24014
  • reset accelerate env variables after each test by @pacman100 in #24107
  • Fix translation no_trainer by @muellerzr in #23407
  • Update error message when Accelerate isn't installed by @muellerzr in #23373
  • Fix parallel mode check by @muellerzr in #23409
  • Muellerzr fix deepspeed by @muellerzr in #23657
  • Update all no_trainer with skip_first_batches by @muellerzr in #23664
  • Fix sagemaker DP/MP by @muellerzr in #23681
  • Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value separately by @muellerzr in #23800
  • Up pinned accelerate version by @muellerzr in #24089
  • Move import check to before state reset by @muellerzr in #23906
  • Upgrade safetensors version by @muellerzr in #23911
  • Act on deprecations in Accelerate no_trainer examples by @muellerzr in #24053
  • Oops, missed one by @muellerzr in #24054

Bugfixes and improvements

  • chore: allow protobuf 3.20.3 requirement by @jose-turintech in #22759

  • Fix link displayed for custom tools by @sgugger in #23274

  • Remove misplaced test file by @sgugger in #23275

  • Bring back the PR Refactor doctests + add CI to main by @ydshieh in #23271

  • [gpt] Gpt2 fix half precision causal mask by @younesbelkada in #23256

  • Temporary tolerance fix for flaky Whisper PT-TF equivalence test by @amyeroberts in #23257

  • Add top_k argument to post-process of conditional/deformable-DETR by @CreatlV in #22787

  • transformers-cli -> huggingface-cli by @AlpinDale in #23276

  • Temporarily increase tol for PT-FLAX whisper tests by @amyeroberts in #23288

  • Added missing " in CHAT_PROMPT_TEMPLATE by @galatolofederico in #23287

  • Update custom_tools.mdx: fix link by @mishig25 in #23292

  • Update transformers_agents.mdx by @mishig25 in #23289

  • Convert numpy arrays to lists before saving the evaluation metrics as json by @harisankar95 in #23268

  • Fix doctest files fetch issue by @ydshieh in #23277

  • skip test_run_squad_no_trainer for now by @ydshieh in #23302

  • Better check for packages availability by @apbard in #23163

  • Add gradient_checkpointing parameter to FlaxWhisperEncoder by @raghavanone in #23300

  • Agents extras by @LysandreJik in #23301

  • Fix broken links in the agent docs by @sgugger in #23297

  • Fix typo in gradio-tools docs by @freddyaboulton in #23305

  • Fix image segmentation tool test by @sgugger in #23306

  • unpin tf prob by @ydshieh in #23293

  • Revert "search buffers for dtype" by @sgugger in #23308

  • Remove LanguageIdentificationTool in __init__.py as we don't have it yet by @ydshieh in #23326

  • Fix docker image (caused by tensorflow_text) by @ydshieh in #23321

  • Compute the mask in-place, with less memory reads, and on CUDA on XLNetLMHeadModel by @lezcano in #23332

  • Only add files with modification outside doc blocks by @ydshieh in #23327

  • [docs] Fix Agents and Tools docstring by @stevhliu in #23313

  • OR am I crazy? by @hwuebben in #23295

  • Handle padding warning in generation when using inputs_embeds by @zrthxn in #23131

  • replaced assert with raise ValueError for t5, switch_transformers, pix2struct, mt5, longt5, gptsan_japanese. by @susnato in #23273

  • Use cu118 with cudnn >= 8.6 in docker file by @ydshieh in #23339

  • Removing one of the twice-defined position_embeddings in LongFormer by @GregorySenay in #23343

  • Fix issue introduced in PR #23163 by @ydshieh in #23363

  • Typo suggestion by @richardachen in #23360

  • Fix some is_xxx_available by @ydshieh in #23365

  • Fix BigBirdForMaskedLM doctest by @ydshieh in #23369

  • Fix OwlViTForObjectDetection.image_guided_detection doc example by @ydshieh in #23370

  • Revert "Only add files with modification outside doc blocks" by @ydshieh in #23371

  • [Bugfix] OPTDecoderLayer does not return attentions when gradient_checkpointing and training is enabled. by @gmlwns2000 in #23367

  • Skip failing AlignModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #23374

  • Fix test typos - audio feature extractors by @LWprogramming in #23310

  • Added type hints for Graphormer pytorch version by @dewasahu2003 in #23073

  • Replace NumPy Operations with JAX NumPy Equivalents for JIT Compilation Compatibility by @gojiteji in #23356

  • Use mkstemp to replace deprecated mktemp by @ready-research in #23372

  • Fix RwkvModel by @ydshieh in #23392

  • Update test_batched_inference_image_captioning_conditioned by @ydshieh in #23391

  • OPT/BioGPT: Improved attention mask shape exception by @gante in #23270

  • Fix chat prompt in HFAgent by @IvanSedykh in #23335

  • 🌐 [i18n-KO] Translated asr.mdx to Korean by @sim-so in #23106

  • Minor fixes in transformers-tools by @Wauplin in #23364

  • [Pix2Struct] Add conditional generation on docstring example by @younesbelkada in #23399

  • Generate: faster can_generate check on TF and Flax by @gante in #23398

  • [AutoModel] fix torch_dtype=auto in from_pretrained by @stas00 in #23379

  • Docs: add link to assisted generation blog post by @gante in #23397

  • Build with non Python files by @sgugger in #23405

  • Generate: add test to check KV format by @gante in #23403

  • Replace appends with list comprehension. by @ttsugriy in #23359

  • Fix smdistributed check by @sgugger in #23414

  • Why crash the whole run when HFHub gives a 50x error? by @ropoctl in #23320

  • Run doctest (in PRs) only when some doc example(s) are modified by @ydshieh in #23387

  • Update ConvNextV2ModelIntegrationTest::test_inference_image_classification_head by @ydshieh in #23402

  • Fix a typo in HfAgent docstring. by @ttsugriy in #23420

  • Use dict.items to avoid unnecessary lookups. by @ttsugriy in #23415

  • Update 3 docker files to use cu118 by @ydshieh in #23406

  • [SAM] fix sam slow test by @younesbelkada in #23376

  • Return early once stop token is found. by @ttsugriy in #23421

  • [Reland] search model buffers for dtype as the last resort by @cyyever in #23319

  • Add Missing tokenization test [electra] by @IMvision12 in #22997

  • Small fixes and link in the README by @LysandreJik in #23428

  • TF: embeddings out of bounds check factored into function by @gante in #23427

  • Update Bigbird Pegasus tests by @ydshieh in #23431

  • Encoder-Decoder: add informative exception when the decoder is not compatible by @gante in #23426

  • Remove hardcoded prints in Trainer by @hugoabonizio in #23432

  • Fix device issue in SwiftFormerModelIntegrationTest::test_inference_image_classification_head by @ydshieh in #23435

  • Generate: skip left-padding tests on old models by @gante in #23437

  • remove unnecessary print in gpt neox sequence classifier by @cfhammill in #23433

  • 🌐 [i18n-KO] Translated tasks/zero_shot_object_detection.mdx to Korean by @HanNayeoniee in #23430

  • Fix (skip) a pipeline test for RwkvModel by @ydshieh in #23444

  • Fix DecisionTransformerConfig doctring by @joaoareis in #23450

  • TF: GPT2 with native embedding layers by @gante in #23436

  • Make RwkvModel accept attention_mask but discard it internally by @ydshieh in #23442

  • Less flaky test_assisted_decoding_matches_greedy_search by @ydshieh in #23451

  • Update tiny models and pipeline tests by @ydshieh in #23446

  • Properly guard PyTorch stuff by @sgugger in #23452

  • Add an option to log result from the Agent by @sgugger in #23454

  • Clean up CUDA kernels by @sgugger in #23455

  • fix bug in group_texts function that was inserting short batches by @BodaSadalla98 in #23429

  • feat: Whisper prompting by @connor-henderson in #22496

  • README: Fix affiliation for MEGA by @julien-c in #23394

  • Remove .data usages in optimizations.py by @alanwaketan in #23417

  • TF port of the Segment Anything Model (SAM) by @Rocketknight1 in #22970

  • [RWKV] Rwkv fix for 8bit inference by @younesbelkada in #23468

  • Use config to set name and description if not present by @sgugger in #23473

  • Fix transformers' DeepSpeed CI job by @ydshieh in #23463

  • Fix PretrainedConfig min_length docstring by @joaoareis in #23471

  • Fix: Change tensors to integers for torch.dynamo and torch.compile compatibility by @loevlie in #23475

  • [Blip] Remove redundant shift right by @younesbelkada in #23153

  • Fix DeepSpeed stuff in the nightly CI by @ydshieh in #23478

  • Fix confusing transformers installation in CI by @ydshieh in #23465

  • Fix tests/repo_utils/test_get_test_info.py by @ydshieh in #23485

  • Debug example code for MegaForCausalLM by @Tylersuard in #23382

  • Remove erroneous img closing tag by @xenova in #23646

  • Fix tensor device while attention_mask is not None by @zspo in #23538

  • Fix accelerate logger bug by @younesbelkada in #23650

  • Bugfix: LLaMA layer norm incorrectly changes input type and consumes lots of memory by @TimDettmers in #23535

  • Fix wav2vec2 is_batched check to include 2-D numpy arrays by @LWprogramming in #23223

  • changing the requirements to a cpu torch version that works by @sshahrokhi in #23483

  • Fix SAM tests and use smaller checkpoints by @Rocketknight1 in #23656

  • Update workflow files by @ydshieh in #23658

  • small fix to remove eos in the processor when it's not used by @Narsil in #23408

  • Fix typo in a parameter name for open llama model by @aaalexlit in #23637

  • Fix PyTorch SAM tests by @ydshieh in #23682

  • 🌐 [i18n-KO] Translated tasks/monocular_depth_estimation.mdx to Korean by @HanNayeoniee in #23621

  • Fix a BridgeTower test by @ydshieh in #23694

  • [SAM] Fixes pipeline and adds a dummy pipeline test by @younesbelkada in #23684

  • TF version compatibility fixes by @Rocketknight1 in #23663

  • [Blip] Fix blip doctest by @younesbelkada in #23698

  • is_batched fix for remaining 2-D numpy arrays by @LWprogramming in #23309

  • Skip TFCvtModelTest::test_keras_fit_mixed_precision for now by @ydshieh in #23699

  • fix: load_best_model_at_end error when load_in_8bit is True by @dkqkxx in #23443

  • Fix some docs about what layerdrop does by @zspo in #23691

  • add GPTJ/bloom/llama/opt into model list and enhance the jit support by @sywangyi in #23291

  • Paged Optimizer + Lion Optimizer for Trainer by @TimDettmers in #23217

  • Export to ONNX doc refocused on using optimum, added tflite by @MKhalusova in #23434

  • fix: use bool instead of uint8/byte in Deberta/DebertaV2/SEW-D to make it compatible with TensorRT by @uchuhimo in #23683

  • fix gptj not being able to jit.trace on GPU by @sywangyi in #23317

  • Better TF docstring types by @Rocketknight1 in #23477

  • Minor awesome-transformers.md fixes by @pagarsky in #23453

  • TF SAM memory reduction by @Rocketknight1 in #23732

  • fix: delete duplicate sentences in document_question_answering.mdx by @jungnerd in #23735

  • fix: Whisper generate, move text_prompt_ids trim up for max_new_tokens calculation by @connor-henderson in #23724

  • Overhaul TF serving signatures + dummy inputs by @Rocketknight1 in #23234

  • [Whisper] Reduce batch size in tests by @sanchit-gandhi in #23736

  • Fix the regex in get_imports to support multiline try blocks and excepts with specific exception types by @dakinggg in #23725

  • Remove the last few TF serving sigs by @Rocketknight1 in #23738

  • Fix pip install --upgrade accelerate command in modeling_utils.py by @tloen in #23747

  • Fix push_to_hub in Trainer when nothing needs pushing by @sgugger in #23751

  • Revamp test selection for the example tests by @sgugger in #23737

  • [LongFormer] code nits, removed unused parameters by @ArthurZucker in #23749

  • Fix is_ninja_available() by @niltok in #23752

  • [Nllb-Moe] Fix nllb moe accelerate issue by @younesbelkada in #23758

  • [OPT] Doc nit, using fast is fine by @ArthurZucker in #23789

  • Fix RWKV backward on GPU by @sgugger in #23774

  • Update trainer.mdx class_weights example by @amitportnoy in #23787

  • no_cuda does not take effect in non distributed environment by @sywangyi in #23795

  • Fix no such file or directory error by @RissyRan in #23783

  • Enable code-specific revision for code on the Hub by @sgugger in #23799

  • add type hint in pipeline model argument by @y3sar in #23740

  • TF SAM shape flexibility fixes by @Rocketknight1 in #23842

  • fix Whisper tests on GPU by @hollance in #23753

  • 🌐 [i18n-KO] Translated fast_tokenizers.mdx to Korean by @KIHOON71 in #22956

  • [i18n-KO] Translated video_classification.mdx to Korean by @KIHOON71 in #23026

  • 🌐 [i18n-KO] Translated troubleshooting.mdx to Korean by @0525hhgus in #23166

  • Adds a FlyteCallback by @peridotml in #23759

  • Update collating_graphormer.py by @clefourrier in #23862

  • [LlamaTokenizerFast] nit update post_processor on the fly by @ArthurZucker in #23855

  • #23388 Issue: Update RoBERTa configuration by @vijethmoudgalya in #23863

  • [from_pretrained] improve the error message when _no_split_modules is not defined by @ArthurZucker in #23861

  • Editing issue with pickle def with lambda function by @Natyren in #23869

  • Adds AutoProcessor.from_pretrained support for MCTCTProcessor by @Ubadub in #23856

  • 🌐 [i18n-KO] Translated pad_truncation.mdx to Korean by @sim-so in #23823

  • Fix bug leading to missing token in GPTSanJapaneseTokenizer by @passaglia in #23883

  • Fix last instances of kbit -> quantized by @sgugger in #23797

  • fix(configuration_llama): add keys_to_ignore_at_inference to LlamaConfig by @calico-1226 in #23891

  • Fix Trainer when model is loaded on a different GPU by @sgugger in #23792

  • Support shared tensors by @thomasw21 in #23871

  • ensure banned_mask and indices are on the same device by @cauyxy in #23901

  • Unpin numba by @sanchit-gandhi in #23162

  • [bnb] add warning when no linear by @younesbelkada in #23894

  • fix: Replace add_prefix_space in get_prompt_ids with manual space for FastTokenizer compatibility by @connor-henderson in #23796

  • [RWKV] Fix RWKV 4bit by @younesbelkada in #23910

  • add conditional statement for auxiliary loss calculation by @harisankar95 in #23899

  • Raise error if loss can't be calculated - ViT MIM by @amyeroberts in #23872

  • Empty circleci config by @sgugger in #23913

  • Bug fix - flip_channel_order for channels first images by @amyeroberts in #23701

  • Re-enable squad test by @sgugger in #23912

  • Update the update metadata job to use upload_folder by @sgugger in #23917

  • [PushToHub] Make it possible to upload folders by @NielsRogge in #23920

  • Skip device placement for past key values in decoder models by @sgugger in #23919

  • [Flax Whisper] Update decode docstring by @sanchit-gandhi in #23908

  • Effectively allow encoder_outputs input to be a tuple in pix2struct by @fxmarty in #23932

  • Fix doc string nits by @sheonhan in #23929

  • Pin rhoknp by @sgugger in #23937

  • rename DocumentQuestionAnsweringTool parameter input to match docstring by @Adam-D-Lewis in #23939

  • Update stale.yml to use HuggingFaceBot by @LysandreJik in #23941

  • Make TF ESM inv_freq non-trainable like PyTorch by @Rocketknight1 in #23940

  • Revert "Update stale.yml to use HuggingFaceBot" by @LysandreJik in #23943

  • #23675 Registering Malay language by @soongbren in #23689

  • Modify device_map behavior when loading a model using from_pretrained by @SunMarc in #23922

  • use _make_causal_mask in clip/vit models by @kashif in #23942

  • Fix ReduceLROnPlateau object has no attribute 'get_last_lr' by @wasupandceacar in #23944

  • [MMS] Scaling Speech Technology to 1,000+ Languages | Add attention adapter to Wav2Vec2 by @patrickvonplaten in #23813

  • add new mms functions to doc by @patrickvonplaten in #23954

  • 🌐 [i18n-KO] Translated object_detection.mdx to Korean by @KIHOON71 in #23164

  • Trainer: fixed evaluate raising KeyError for ReduceLROnPlateau by @claudius-kienle in #23952

  • [Whisper Tokenizer] Skip special tokens when decoding with timestamps by @sanchit-gandhi in #23945

  • Add an option to reduce compile() console spam by @Rocketknight1 in #23938

  • Added time-series blogs to the models by @elisim in #23857

  • Fix typo in doc comment of BitsAndBytesConfig by @ledyba in #23978

  • Skip test_multi_gpu_data_parallel_forward for MobileViTV2ModelTest by @ydshieh in #24017

  • Update README.md by @ydshieh in #24022

  • Auto tokenizer registration by @Bearnardd in #23965

  • expose safe_serialization argument in the pipeline API by @yessenzhar in #23775

  • Pix2Struct: fix wrong broadcast axis of attention mask in visual encoder by @affjljoo3581 in #23976

  • TensorBoard callback no longer adds hparams by @bri25yu in #23999

  • 🌐 [i18n-KO] Translated tasks_explained.mdx to Korean by @0525hhgus in #23844

  • Fix MobileViTV2 checkpoint name by @ydshieh in #24018

  • Pin deepspeed to 0.9.2 for now by @ydshieh in #24024

  • 🌐 [i18n-KO] Translated language-modeling.mdx by @wonhyeongseo in #23969

  • 🌐 [i18n-KO] Translated bertology.mdx to Korean by @wonhyeongseo in #23968

  • Add check for tied parameters by @SunMarc in #24029

  • Fixing single candidate_label return. by @Narsil in #24023

  • Use TruncatedNormal from Keras initializers by @hvaara in #24036

  • Prevent ZeroDivisionError on trainer.evaluate if model and dataset are tiny by @tomaarsen in #24049

  • Modification of one text example file should trigger said test by @sgugger in #24051

  • Tiny fix for check_self_hosted_runner.py by @ydshieh in #24052

  • Reduce memory usage in TF building by @Rocketknight1 in #24046

  • Move TF building to an actual build() method by @Rocketknight1 in #23760

  • Use new parametrization based weight norm if available by @ezyang in #24030

  • bring back filtered_test_list_cross_tests.txt by @ydshieh in #24055

  • Fix device placement for model-parallelism in generate for encoder/de… by @sgugger in #24025

  • Remote code improvements by @sgugger in #23959

  • Generate: increase left-padding test atol by @gante in #23448

  • [Wav2Vec2] Fix torch script by @patrickvonplaten in #24062

  • Add support for non-rust implemented tokenization for __getitem__ method. by @jacklanda in #24039

  • Support PEFT models when saving the model using trainer by @younesbelkada in #24073

  • [Hub] Add safe_serialization in push_to_hub by @younesbelkada in #24074

  • Fix is_optimum_neuron_available by @michaelbenayoun in #23961

  • [bnb] Fix bnb skip modules by @younesbelkada in #24043

  • Be nice to TF by @ydshieh in #24076

  • Make the TF dummies even smaller by @Rocketknight1 in #24071

  • [doc build] Use secrets by @mishig25 in #24079

  • Fix expected value in tests of the test fetcher by @sgugger in #24077

  • Update delete_doc_comment_trigger.yml by @mishig25 in #24084

  • Do not prepare lr scheduler as it has the right number of steps by @sgugger in #24088

  • Fix a tiny typo in WhisperForConditionalGeneration::generate docstring by @sadra-barikbin in #24045

  • [Trainer] Correct behavior of _load_best_model for PEFT models by @younesbelkada in #24103

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @shehanmunasinghe
    • Add swiftformer (#22686)
    • Add MobileViTv2 (#22820)
  • @TimDettmers
    • Bugfix: LLaMA layer norm incorrectly changes input type and consumes lots of memory (#23535)
    • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) (#23479)
    • Paged Optimizer + Lion Optimizer for Trainer (#23217)
  • @elisim
    • [Time-Series] Autoformer model (#21891)
    • Added time-series blogs to the models (#23857)
  • @KIHOON71
    • 🌐 [i18n-KO] Translated fast_tokenizers.mdx to Korean (#22956)
    • [i18n-KO] Translated video_classification.mdx to Korean (#23026)
    • 🌐 [i18n-KO] Translated object_detection.mdx to Korean (#23164)
  • @D-Roberts
    • Add TensorFlow implementation of EfficientFormer (#22620)
  • @soongbren
    • #23675 Registering Malay language (#23689)

v4.29.2

1 year ago

Fixes the package so non-Python files (like CUDA kernels) are properly included.