🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A patch release was made for the following three commits:
A patch release was done for these two commits:
Falcon is a class of causal decoder-only models built by TII. The largest Falcon checkpoints have been trained on >=1T tokens of text, with a particular emphasis on the RefinedWeb corpus. They are made available under the Apache 2.0 license.
Falcon's architecture is modern and optimized for inference, with multi-query attention and support for efficient attention variants like FlashAttention. Both "base" models trained only as causal language models and "instruct" models that have received further fine-tuning are available.
[Falcon] Remove SDPA for falcon to support earlier versions of PyTorch (< 2.0) by @younesbelkada in #25947
Code Llama is a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.
[CodeLlama] Add support for CodeLlama by @ArthurZucker in #25740
[CodeLlama] Fix CI by @ArthurZucker in #25890
ViTDet reuses the ViT model architecture, adapted to object detection.
DINO v2 is the next iteration of the DINO model. It is added as a backbone class, allowing it to be re-used in downstream models.
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) comprised of a posterior encoder, decoder, and conditional prior.
[Refactor] Move third-party related utility files into integrations/ folder 🚨🚨🚨 by @younesbelkada in #25599
Moves all utility files related to third-party libraries (outside the HF ecosystem) into integrations/ instead of keeping them directly in transformers.
To keep the previous usage, change your import as follows:
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations import HfDeepSpeedConfig
TRANSFORMERS_TEST_BACKEND by @vvvm23 in #25655
[SPM] Patch spm Llama and T5 by @ArthurZucker in #25656
[GPTNeo] Add input_embeds functionality to gpt_neo Causal LM by @ArthurZucker in #25664
utils/documentation_tests.txt by @ydshieh in #25680
pad_token check condition by @ydshieh in #25685
inputs_embeds by @gante in #25687
configuration_gpt2.py by @susnato in #25676
[LlamaTokenizer] make unk_token_length a property by @ArthurZucker in #25689
test_batch_generation for bloom by @ydshieh in #25718
[PEFT] Fix peft version by @younesbelkada in #25710
[AutoGPTQ] Add correct installation of GPTQ library + fix slow tests by @younesbelkada in #25713
do_sample=False when temperature=0.0 by @gante in #25722
[from_pretrained] Simpler code for peft by @ArthurZucker in #25726
[from_pretrained] Fix failing PEFT tests by @younesbelkada in #25733
visual_question_answering.md to Korean by @wonhyeongseo in #25679
[PEFT] Fix PeftConfig save pretrained when calling add_adapter by @younesbelkada in #25738
[Sentencepiece] make sure legacy do not require protobuf by @ArthurZucker in #25684
HammingDiversityLogitsProcessor by @gante in #25756
[LlamaFamiliy] add a tip about dtype by @ArthurZucker in #25794
hidden_act by @stas00 in #25787
[Docs] More clarifications on BT + FA by @younesbelkada in #25823
[LlamaTokenizer] tokenize nits. by @ArthurZucker in #25793
model_memory_anatomy.md to Korean by @mjk0618 in #25755
add_new_pipeline.md to Korean by @heuristicwave in #25498
community.md to Korean by @sim-so in #25674
Pop2Piano checkpoints by @susnato in #25827
generate() return True in can_generate() by @gante in #25838
generation_strategies.md by @gante in #25874
stage3_gather_16bit_weights_on_model_save=False by @pacman100 in #25817
[TokenizerFast] can_save_slow_tokenizer as a property for when vocab_file's folder was removed by @ArthurZucker in #25626
[InstructBlip] FINAL Fix instructblip test by @younesbelkada in #25887
setup.py by @ydshieh in #25893
is_tensor by @sgugger in #25871
ViTDet by @ydshieh in #25913
The following contributors have made significant changes to the library over the last release:
Patch release including several patches from v4.31.0, listed below:
The IDEFICS model was proposed in OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh
IDEFICS is the first open state-of-the-art visual language model at the 80B scale!
The model accepts arbitrary sequences of image and text and produces text, similarly to a multimodal ChatGPT.
Blogpost: hf.co/blog/idefics Playground: HuggingFaceM4/idefics_playground
MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.
[MPT] Add MosaicML's MPT model to transformers by @ArthurZucker & @younesbelkada in #24629
GPTQ quantization is now supported in Transformers, through the optimum library. The backend relies on the auto_gptq library, from which we use the GPTQ and QuantLinear classes.
See below for an example of the API, quantizing a model using the new GPTQConfig configuration utility.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 4-bit quantization, calibrated on the "c4" dataset
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)
Most models under the TheBloke namespace with the suffix GPTQ should be supported. For example, to load the GPTQ-quantized model TheBloke/Llama-2-13B-chat-GPTQ, simply run (after installing the latest optimum and auto-gptq libraries):
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration
A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the 3 text-to-audio models integrated into transformers: SpeechT5ForTextToSpeech, MusicGen and Bark.
See below for an example:
from transformers import pipeline
synthesizer = pipeline(model="suno/bark")
# call the pipeline instance (not the `pipeline` factory) on the input text
output = synthesizer("Hey it's HuggingFace on the phone!")
audio = output["audio"]
sampling_rate = output["sampling_rate"]
Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in this paper. With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, ensuring your generation doesn't go in specific directions. See its docs for usage instructions.
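To make the idea concrete, here is a minimal sketch of how classifier-free guidance is commonly described as combining two sets of next-token scores: one computed with the prompt and one computed without it (or with a negative prompt). The function names and toy numbers are illustrative assumptions, not the library's API; refer to the docs for the actual usage.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cfg_combine(cond_logits, uncond_logits, guidance_scale):
    """Move the scores away from the unconditional distribution.

    guidance_scale == 1.0 returns the conditional scores unchanged;
    larger values push further towards what the prompt favours.
    (Hypothetical helper for illustration only.)
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

# Toy vocabulary of three tokens
cond = [2.0, 1.0, 0.0]    # scores given the prompt
uncond = [1.0, 1.0, 1.0]  # scores without the prompt (or with a negative prompt)

same = cfg_combine(cond, uncond, 1.0)
sharper = cfg_combine(cond, uncond, 1.5)
```

With a scale above 1, the gap between tokens the prompt prefers and those it does not widens, which is the "increased prompt adherence" mentioned above.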
A new task guide going into Visual Question Answering has been added to Transformers.
We continue the deprecation of models that was introduced in https://github.com/huggingface/transformers/pull/24787.
By deprecating, we indicate that we will stop maintaining such models, but there is no intention of actually removing those models and breaking support for them (they might one day move into a separate repo/on the Hub, but we would still add the necessary imports to make sure backward compatibility stays). The main point is that we stop testing those models. The usage of the models drives this choice and aims to ease the burden on our CI so that it may be used to focus on more critical aspects of the library.
There are ongoing efforts to translate the Transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated, as it further lowers the barrier of entry to ML and Transformers.
If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.
tasks/document_question_answering.md to Korean by @jungnerd in #24588
quicktour.md by @wonhyeongseo in #24664
serialization.md by @wonhyeongseo in #24686
testing.md to Korean by @Sunmin0520 in #24900
perf_train_cpu.md to Korean by @seank021 in #24911
<tf_xla>.md to Korean by @54data in #24904
perf_hardware.md to Korean by @augustinLib in #24966
hpo_train.md to Korean by @harheem in #24968
perf_infer_cpu.md to Korean by @junejae in #24920
transformers_agents.md to Korean by @sim-so in #24881
perf_infer_gpu_many.md to Korean by @heuristicwave in #24943
perf_infer_gpu_one.md to Korean by @eenzeenee in #24978
add_tensorflow_model.md to Korean by @keonju2 in #25017
perf_train_cpu_many.md to Korean by @nuatmochoi in #24923
add_new_model.md to Korean by @mjk0618 in #24957
model_summary.md to Korean by @0525hhgus in #24625
philosophy.md to Korean by @TaeYupNoh in #25010
perf_train_tpu_tf.md to Korean by @0525hhgus in #25433
An input_data_format argument was added to image transforms and ImageProcessor methods, allowing the user to explicitly set the data format of the images being processed. This enables processing of images with a non-standard number of channels (e.g. 4) and avoids errors that occurred when the data format was inferred but the channel dimension was ambiguous.
import numpy as np
from transformers import ViTImageProcessor
img = np.random.randint(0, 256, (4, 6, 3))
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
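To illustrate why an explicit format helps, the following is a hypothetical sketch of shape-based inference; it is not the library's implementation, and `infer_channel_format` and its `num_channels` default are assumptions made for this example. It guesses the layout from whether the first or last dimension looks like a channel count.

```python
def infer_channel_format(shape, num_channels=(1, 3)):
    """Guess the data format of a 3D image array from its shape.

    Hypothetical helper: a dimension "looks like" channels when its size
    is one of `num_channels`.
    """
    if len(shape) != 3:
        raise ValueError(f"expected a 3D shape, got {shape}")
    first = shape[0] in num_channels
    last = shape[-1] in num_channels
    if first and last:
        raise ValueError(f"ambiguous channel dimension for shape {shape}")
    if first:
        return "channels_first"
    if last:
        return "channels_last"
    raise ValueError(f"unable to infer channel dimension for shape {shape}")

# Unambiguous cases
rgb_chw = infer_channel_format((3, 224, 224))
rgb_hwc = infer_channel_format((224, 224, 3))

# The (4, 6, 3) array from the example above is *meant* as 4 channels first,
# but a shape-based guess sees the trailing 3 and picks the other layout.
guess_for_four_channel = infer_channel_format((4, 6, 3))

# A shape such as (3, 5, 3) cannot be resolved from the shape alone.
try:
    infer_channel_format((3, 5, 3))
    ambiguous_raised = False
except ValueError:
    ambiguous_raised = True
```

Both the wrong guess and the ambiguous case go away once the caller states the layout, which is what input_data_format is for.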
torch.scaled_dot_product_attention & Flash Attention
Users may not be aware that it is possible to force torch.scaled_dot_product_attention to dispatch to Flash Attention kernels. This leads to considerable speedup and memory savings, and is also compatible with quantized models. We decided to make this explicit to users in the documentation.
In a nutshell, one can just run:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")
# convert the model to BetterTransformer
model.to_bettertransformer()
input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
# restrict scaled_dot_product_attention to the Flash Attention kernel
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
to enable Flash Attention in the model. However, this feature does not support padding yet.
Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.
Users no longer have to pass fsdp_transformer_layer_cls_to_wrap, as the code now uses _no_split_modules by default, which is available for most of the popular models. DeepSpeed Z3 init now works properly with Accelerate Launcher + Trainer.
Trainer class
The default optimizer in the Trainer class has been updated to be adamw_torch rather than our own adamw_hf, as the official Torch optimizer is more robust and fixes some issues.
In order to keep the old behavior, ensure that you pass "adamw_hf" as the optim value in your TrainingArguments.
adamw_hf to adamw_torch 🚨🚨🚨 by @muellerzr in #25109
There was an issue with the definition of the rescale of values with ViVit and EfficientNet. These have been fixed, but will result in different model outputs for both of these models. To understand the change and see what needs to be done to obtain previous results, please take a look at the following PR.
The EfficientNetForImageClassification model class did not follow conventions and added a softmax to the model logits. This was removed so that it respects the convention set by other models.
In order to obtain previous results, pass the model logits through a softmax.
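A minimal sketch of that post-processing step, using a plain Python list as a stand-in for the model's logits (the values are made up for illustration; with PyTorch tensors one would typically call torch.softmax(logits, dim=-1) instead):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Stand-in for the raw logits now returned by the model
logits = [2.0, 0.5, -1.0]

# Probabilities comparable to what the model used to return
probs = softmax(logits)
```

The softmax does not change which class scores highest, so predictions based on argmax are unaffected by this change.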
Some SPM models had issues with their management of added tokens. Namely, Llama and T5, among others, were behaving incorrectly. These have been updated in https://github.com/huggingface/transformers/pull/25224.
An option to obtain the previous behavior was added through the legacy flag, as explained in the PR linked above.
[SPM] Finish fix spm models 🚨🚨🚨 by @ArthurZucker in #25224
use_cache=True by @ydshieh in #24893
test_model_parallelism for FalconModel by @ydshieh in #24914
[Llama2] replace self.pretraining_tp with self.config.pretraining_tp by @younesbelkada in #24906
image_processing_vilt.py wrong default documented by @stas00 in #24931
main_input_name in src/transformers/keras_callbacks.py by @ydshieh in #24916
LogitsProcessor class by @shauray8 in #24848
[RWKV] Add Gradient Checkpointing support for RWKV by @younesbelkada in #24955
Parameter.ds_numel by @apoorvkh in #24942
[LlamaConfig] Nit: pad token should be None by default by @ArthurZucker in #24958
llama tokenization doctest by @ydshieh in #24990
[bnb] Add simple check for bnb import by @younesbelkada in #24995
[Llama] remove persistent inv_freq tensor by @ArthurZucker in #24998
[logging.py] set default stderr path if None by @ArthurZucker in #25033
TrainingArgs to wandb.config without sanitization. by @parambharat in #25035
[8bit] Fix 8bit corner case with Blip2 8bit by @younesbelkada in #25047
[RWKV] Add note in doc on RwkvStoppingCriteria by @ArthurZucker in #25055
TF32 flag for PyTorch cuDNN backend by @XuehaiPan in #25075
per_gpu_eval_batch_size with per_device_eval_batch_size in readme of multiple-choice task by @statelesshz in #25078
[generate] Only warn users if the generation_config's max_length is set to the default value by @ArthurZucker in #25030
[ForSequenceClassification] Support left padding by @ArthurZucker in #24979
[TF] Also apply patch to support left padding by @ArthurZucker in #25085
test_model_is_small by @connor-henderson in #25087
[PreTrainedTokenizerFast] Keep properties from fast tokenizer by @ArthurZucker in #25053
MusicgenForConditionalGeneration tests by @ydshieh in #25091
[T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification by @sjrl in #24726
PvtModelIntegrationTest::test_inference_fp16 by @ydshieh in #25106
use_auth_token -> token by @ydshieh in #25083
[T5/LlamaTokenizer] default legacy to None to not always warn by @ArthurZucker in #25131
[MptConfig] support from pretrained args by @ArthurZucker in #25116
token things by @ydshieh in #25146
.push_to_hub and cleanup get_full_repo_name usage by @Wauplin in #25120
use_auth_token -> token in example scripts by @ydshieh in #25167
[Mpt] Fix mpt slow test by @younesbelkada in #25170
[InstructBlip] Fix instructblip slow test by @younesbelkada in #25171
_prepare_output_docstrings by @ydshieh in #25202
[PreTrainedModel] Wrap cuda and to method correctly by @younesbelkada in #25206
all_model_classes in FlaxBloomGenerationTest by @ydshieh in #25211
[pipeline] revisit device check for pipeline by @younesbelkada in #25207
[Pix2Struct] Fix pix2struct cross attention by @younesbelkada in #25200
[Docs/quantization] Clearer explanation on how things works under the hood. + remove outdated info by @younesbelkada in #25216
[MPT] Add require_bitsandbytes on MPT integration tests by @younesbelkada in #25201
[Detr] Fix detr BatchNorm replacement issue by @younesbelkada in #25230
token arugment in example scripts by @ydshieh in #25172
pytest_options={"rA": None} in CI by @ydshieh in #25263
num_hidden_layers=2 by @ydshieh in #25266
pytest_num_workers=8 for torch/tf jobs by @ydshieh in #25274
report_to logging integrations in docstring by @tomaarsen in #25281
bark could have tiny model by @ydshieh in #25290
trust_remote_code in example scripts by @Jackmin801 in #25248
Repository to upload_folder by @sgugger in #25095
NoRepeatNGramLogitsProcessor Example for LogitsProcessor class by @Rishab26 in #25186
torch.compile() for vision models by @merveenoyan in #24748
test_model_parallelism by @ydshieh in #25359
token in example template by @ydshieh in #25351
torch_job worker(s) crashing by @ydshieh in #25374
token by @ydshieh in #25382
OneFormerModelTest.test_model_with_labels by @ydshieh in #25383
TopPLogitsWarper by @chiral-carbon in #25361
device_map is passed by @gante in #25413
torch.compile() docs by @merveenoyan in #25432
examples to tests to run when setup.py is modified by @ydshieh in #25437
main on PRs/branches if setup.py is not modified by @ydshieh in #25445
main on PRs/branches" by @ydshieh in #25466
auxiliary_head is None in UperNetPreTrainedModel by @mmurray in #25514
MaskFormerModelIntegrationTest OOM by @ydshieh in #25544
torch.fx tests on nightly CI by @ydshieh in #25549
test_onnx_runtime_optimize for now by @ydshieh in #25560
[Docs] Fix un-rendered images by @younesbelkada in #25561
TRANSFORMERS_TEST_DEVICE by @vvvm23 in #25506
test_beam_search_xla_generate_simple for T5 by @ydshieh in #25566
[resize_embedding] Introduce pad_to_multiple_of and guidance by @ArthurZucker in #25088
[SwitchTransformers] Remove unused module by @ArthurZucker in #25427
[NllbMoe] Update code to properly support loss computation by @ArthurZucker in #25429
[Tests] Fix failing 8bit test by @younesbelkada in #25564
test_contrastive_generate for TFXLNet by @ydshieh in #25574
[Docs / BetterTransformer] Added more details about flash attention + SDPA by @younesbelkada in #25265
.cuda with .to(torch_device) in tests by @vvvm23 in #25571
[split_special_tokens] Add support for split_special_tokens argument to encode by @ArthurZucker in #25081
[Llama] remove prompt and fix prefix finetuning by @ArthurZucker in #25565
[TokenizerFast] Fix setting prefix space in init by @ArthurZucker in #25563
resize_token_embeddings by @SunMarc in #25596
The following contributors have made significant changes to the library over the last release:
quicktour.md (#24664)
serialization.md (#24686)
testing.md to Korean (#24900)
[T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification (#24726)
trust_remote_code in example scripts (#25248)
add_new_model.md to Korean (#24957)
Llama 2 was proposed in Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron et al. It builds upon the Llama architecture, adding Grouped Query Attention for efficient inference.
The MusicGen model was proposed in the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
MusicGen is a single stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden-states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.
Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, thus eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or upsampling). Instead, it is able to generate all the codebooks in a single forward pass.
Bark is a transformer-based text-to-speech model proposed by Suno AI in suno-ai/bark.
The MMS model was proposed in Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli
The EnCodec neural codec model was proposed in High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
The UMT5 model was proposed in UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
[Umt5] Add google's umt5 to transformers by @ArthurZucker in #24477
The MRA model was proposed in Multi Resolution Analysis (MRA) for Approximate Self-Attention by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, and Vikas Singh.
The Vivit model was proposed in ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. The paper proposes one of the first successful sets of pure-transformer-based models for video understanding.
The last version to support Python 3.7 was 4.30.x, as Python 3.7 reached end-of-life on June 27, 2023 and is no longer supported by the Python Software Foundation.
The last version to support PyTorch 1.9 was 4.30.x. As it has been more than 2 years, and we're looking forward to using features available in PyTorch 1.10 and up, we do not support PyTorch 1.9 for v4.31 and up.
This PR adds RoPE scaling to the LLaMa and GPTNeoX families of models. It allows us to extrapolate and go beyond the original maximum sequence length (e.g. 2048 tokens on LLaMA), without fine-tuning. It offers two strategies:
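As an illustration of how rotary position scaling can extend the usable context, here is a minimal sketch of one commonly described approach, linear scaling, in which position indices are divided by a constant factor before the rotation angles are computed. This is an assumption-laden toy in pure Python (adjacent-pair rotation, hypothetical function names and defaults), not the code from the PR.

```python
import math

def rope_angles(position, head_dim, base=10000.0, scaling_factor=1.0):
    """Rotation angle for each pair of dimensions at a given position.

    With scaling_factor > 1, positions are compressed so that a longer
    sequence maps onto the position range seen during training.
    """
    scaled_position = position / scaling_factor
    return [scaled_position / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]

def apply_rope(vector, position, scaling_factor=1.0):
    """Rotate consecutive (x, y) pairs of `vector` by the RoPE angles."""
    angles = rope_angles(position, len(vector), scaling_factor=scaling_factor)
    out = []
    for i, theta in enumerate(angles):
        x, y = vector[2 * i], vector[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

# With a factor of 2, position 4096 is rotated exactly like position 2048
# would be without scaling, i.e. it stays inside the original range.
angles_scaled = rope_angles(4096, head_dim=8, scaling_factor=2.0)
angles_original = rope_angles(2048, head_dim=8)

query = [1.0, 0.0, 0.0, 1.0, 0.5, -0.5, 2.0, 1.0]
rotated = apply_rope(query, position=4096, scaling_factor=2.0)
```

Because the operation is a pure rotation, vector norms are unchanged; only the relative angles between positions are compressed.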
Tools now return a type that is specific to agents. This type can return a serialized version of itself (a string), that either points to a file on-disk or to the object's content. This should make interaction with text-based systems much simpler.
Models with potentially tied weights dropped some keys from the state dict even when the weights were not tied. This has now been fixed and, more generally, the whole experience of loading a model with a state dict that doesn't match exactly should be improved in this release.
This PR adds a method of predicting timestamps at the word (or even token) level, by analyzing the cross-attentions and applying dynamic time warping.
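The alignment step can be sketched with a small, self-contained dynamic time warping routine. The toy attention matrix, the `1 - attention` cost, and the 0.02 s frame duration below are illustrative assumptions rather than values taken from the PR.

```python
def dtw_path(cost):
    """Return the minimum-cost monotonic alignment path through `cost`.

    `cost[i][j]` is the cost of aligning token i with audio frame j.
    The path is a list of (token_index, frame_index) pairs.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = best accumulated cost ending at cell (i - 1, j - 1)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the last cell to the first one
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path

# Toy cross-attention weights: 3 tokens (rows) over 6 audio frames (columns)
attention = [
    [0.9, 0.8, 0.1, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.9, 0.8, 0.1, 0.1],
    [0.0, 0.1, 0.0, 0.1, 0.9, 0.9],
]
cost = [[1.0 - w for w in row] for row in attention]
path = dtw_path(cost)

# First frame aligned to each token, converted to seconds
FRAME_DURATION = 0.02  # assumed seconds per frame
start_frames = []
for token_index in range(len(attention)):
    frames = [f for t, f in path if t == token_index]
    start_frames.append(min(frames))
start_times = [frame * FRAME_DURATION for frame in start_frames]
```

Each token's start time is the first frame the path assigns to it; word-level timestamps would then follow by grouping the tokens that make up each word.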
A new auto model is added, AutoModelForTextEncoding. It is to be used when you want to extract the text encoder from an encoder-decoder architecture.
Transformers is growing a lot, and to ease the burden of maintenance on our side, we have decided to deprecate models that are not used much. Those models will never actually disappear from the library, but we will stop testing them or accepting PRs modifying them. The criterion for identifying models to deprecate was fewer than 1,000 unique downloads in the last 30 days for models that are at least one year old. The list of deprecated models is:
Fixes an issue with stripped spaces for the T5 family tokenizers. If this negatively impacts inference or training with your models, please let us know by opening an issue.
[T5Tokenize] Fix T5 family tokenizers ⚠️⚠️ by @ArthurZucker in #24565
add trust_remote_code option to CLI download cmd by @radames in #24097
Fix typo in Llama docstrings by @Kh4L in #24020
Avoid GPT-2 daily CI job OOM (in TF tests) by @ydshieh in #24106
[Lllama] Update tokenization code to ensure parsing of the special tokens [core] by @ArthurZucker in #24042
PLAM => PaLM by @xingener in #24129
[bnb] Fix bnb config json serialization by @younesbelkada in #24137
Correctly build models and import call_context for older TF versions by @Rocketknight1 in #24138
Generate: PT's top_p enforces min_tokens_to_keep when it is 1 by @gante in #24111
fix bugs with trainer by @pacman100 in #24134
Fix TF Rag OOM issue by @ydshieh in #24122
Fix SAM OOM issue on CI by @ydshieh in #24125
Fix XGLM OOM on CI by @ydshieh in #24123
[SAM] Fix sam slow test by @younesbelkada in #24140
[lamaTokenizerFast] Update documentation by @ArthurZucker in #24132
[BlenderBotSmall] Update doc example by @ArthurZucker in #24092
Fix Pipeline CI OOM issue by @ydshieh in #24124
[documentation] grammatical fixes in image_classification.mdx by @LiamSwayne in #24141
Fix typo in streamers.py by @freddiev4 in #24144
[tests] fix bitsandbytes import issue by @stas00 in #24151
Avoid OOM in doctest CI by @ydshieh in #24139
Fix Wav2Vec2 CI OOM by @ydshieh in #24190
Fix push to hub by @NielsRogge in #24187
Change ProgressCallback to use dynamic_ncols=True by @gmlwns2000 in #24101
[i18n]Translated "attention.mdx" to korean by @kihoon71 in #23878
Generate: force caching on the main model, in assisted generation by @gante in #24177
Fix device issue in OpenLlamaModelTest::test_model_parallelism by @ydshieh in #24195
Update GPTNeoXLanguageGenerationTest by @ydshieh in #24193
typo: fix typos in CONTRIBUTING.md and deepspeed.mdx by @zsj9509 in #24184
Generate: detect special architectures when loaded from PEFT by @gante in #24198
🌐 [i18n-KO] Translated tasks_summary.mdx to Korean by @kihoon71 in #23977
🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 by @muellerzr in #24028
Fix _load_pretrained_model by @SunMarc in #24200
Fix steps bugs in no trainer examples by @Ethan-yt in #24197
Skip RWKV test in past CI by @ydshieh in #24204
Remove unnecessary aten::to overhead in llama by @fxmarty in #24203
Update WhisperForAudioClassification doc example by @ydshieh in #24188
Finish dataloader integration by @muellerzr in #24201
Add the number of model test failures to slack CI report by @ydshieh in #24207
fix: TextIteratorStreamer cannot work with pipeline by @yuanwu2017 in #23641
Update (TF)SamModelIntegrationTest by @ydshieh in #24199
Improving error message when using use_safetensors=True. by @Narsil in #24232
Safely import pytest in testing_utils.py by @amyeroberts in #24241
fix overflow when training mDeberta in fp16 by @sjrl in #24116
deprecate use_mps_device by @pacman100 in #24239
Tied params cleanup by @sgugger in #24211
[Time Series] use mean scaler when scaling is a boolean True by @kashif in #24237
TF: standardize test_model_common_attributes for language models by @gante in #23457
Generate: GenerationConfig can overwrite attributes at from_pretrained time by @gante in #24238
Add torch >=1.12 requirement for Tapas by @ydshieh in #24251
Update urls in warnings for rich rendering by @IvanReznikov in #24136
Fix how we detect the TF package by @Rocketknight1 in #24255
Stop storing references to bound methods via tf.function by @Rocketknight1 in #24146
Skip GPT-J fx tests for torch < 1.12 by @ydshieh in #24256
docs wrt using accelerate launcher with trainer by @pacman100 in #24250
update FSDP save and load logic by @pacman100 in #24249
Fix URL in comment for contrastive loss function by @taepd in #24271
QA doc: import torch before it is used by @ByronHsu in #24228
Skip some TQAPipelineTests tests in past CI by @ydshieh in #24267
TF: CTRL with native embedding layers by @gante in #23456
Adapt Wav2Vec2 conversion for MMS lang identification by @patrickvonplaten in #24234
Update check of core deps by @sgugger in #24277
Pix2StructImageProcessor requires torch>=1.11.0 by @ydshieh in #24270
Fix Debertav2 embed_proj by @WissamAntoun in #24205
Clean up old Accelerate checks by @sgugger in #24279
Fix bug in slow tokenizer conversion, make it a lot faster by @stephantul in #24266
Fix check_config_attributes: check all configuration classes by @ydshieh in #24231
Fix LLaMa beam search when using parallelize by @FeiWang96 in #24224
remove unused is_decoder parameter in DetrAttention by @JayL0321 in #24226
Split common test from core tests by @sgugger in #24284
[fix] bug in BatchEncoding.getitem by @flybird1111 in #24293
Fix image segmentation tool bug by @amyeroberts in #23897
[Docs] Improve docs for MMS loading of other languages by @patrickvonplaten in #24292
Update README_zh-hans.md by @CooperFu in #24181
deepspeed init during eval fix by @pacman100 in #24298
[EnCodec] Changes for 32kHz ckpt by @sanchit-gandhi in #24296
[Docs] Fix the paper URL for MMS model by @hitchhicker in #24302
Update tokenizer_summary.mdx (grammar) by @belladoreai in #24286
Beam search type by @jprivera44 in #24288
Make can_generate as class method by @ydshieh in #24299
Update test versions on README.md by @sqali in #24307
[SwitchTransformers] Fix return values by @ArthurZucker in #24300
Fix functional TF Whisper and modernize tests by @Rocketknight1 in #24301
Big TF test cleanup by @Rocketknight1 in #24282
Fix ner average grouping with no groups by @Narsil in #24319
Fix ImageGPT doc example by @amyeroberts in #24317
Add test for proper TF input signatures by @Rocketknight1 in #24320
Adding ddp_broadcast_buffers argument to Trainer by @TevenLeScao in #24326
error bug on saving distributed optim state when using data parallel by @xshaun in #24108
π [i18n-KO] Fixed tutorial/preprocessing.mdx
by @sim-so in #24156
pin apex
to a speicifc commit (for DeepSpeed CI docker image) by @ydshieh in #24351
byebye Hub connection timeout by @ydshieh in #24350
Clean up disk sapce during docker image build for transformers-pytorch-gpu
by @ydshieh in #24346
Fix KerasMetricCallback
: pass generate_kwargs
even if use_xla_generation
is False by @Kripner in #24333
Fix device issue in SwitchTransformers
by @ydshieh in #24352
Update MMS integration docs by @vineelpratap in #24311
Make AutoFormer
work with previous torch version by @ydshieh in #24357
Fix ImageGPT doctest by @amyeroberts in #24353
Fix link to documentation in Install from Source by @SoyGema in #24336
docs: add BentoML to awesome-transformers by @aarnphm in #24344
[Doc Fix] Fix model name path in the transformers doc for AutoClasses by @riteshghorse in #24329
Fix the order in GPTNeo
's docstring by @qgallouedec in #24358
Respect explicitly set framework parameter in pipeline by @denis-ismailaj in #24322
Allow passing kwargs through to TFBertTokenizer by @Rocketknight1 in #24324
Fix resuming PeftModel checkpoints in Trainer by @llohann-speranca in #24274
TensorFlow CI fixes by @Rocketknight1 in #24360
Update tiny models for pipeline testing. by @ydshieh in #24364
[modelcard] add audio classification to task list by @sanchit-gandhi in #24363
[Whisper] Make tests faster by @sanchit-gandhi in #24105
Rename test to be more accurate by @sgugger in #24374
Add a check in ImageToTextPipeline._forward
by @ydshieh in #24373
[Tokenizer doc] Clarification about add_prefix_space
by @ArthurZucker in #24368
style: add BitsAndBytesConfig repr function by @aarnphm in #24331
Better test name and enable pipeline test for pix2struct
by @ydshieh in #24377
Skip a tapas (tokenization) test in past CI by @ydshieh in #24378
[Whisper Docs] Nits by @ArthurZucker in #24367
[GPTNeoX] Nit in config by @ArthurZucker in #24349
[Wav2Vec2 - MMS] Correct directly loading adapters weights by @patrickvonplaten in #24335
Migrate doc files to Markdown. by @sgugger in #24376
Update deprecated torch.ger by @kit1980 in #24387
[docs] Fix NLLB-MoE links by @stevhliu in #24388
Add ffmpeg
for doc_test_job
on CircleCI by @ydshieh in #24397
byebye Hub connection timeout - Recast by @ydshieh in #24399
fix type annotation for debug arg by @Bearnardd in #24033
[Trainer] Fix optimizer step on PyTorch TPU by @cowanmeg in #24389
Fix gradient checkpointing + fp16 autocast for most models by @younesbelkada in #24247
Clean up dist import by @muellerzr in #24402
Check auto mappings could be imported via from transformers
by @ydshieh in #24400
Remove redundant code from TrainingArgs by @muellerzr in #24401
Explicit arguments in from_pretrained
by @ydshieh in #24306
[ASR pipeline] Check for torchaudio by @sanchit-gandhi in #23953
TF safetensors reduced mem usage by @Rocketknight1 in #24404
Skip test_conditional_generation_pt_pix2struct
in Past CI (torch < 1.11) by @ydshieh in #24417
[bnb
]Β Fix bnb serialization issue with new release by @younesbelkada in #24416
Revert "Fix gradient checkpointing + fp16 autocast for most models" by @younesbelkada in #24420
Fix save_cache version in config.yml by @ydshieh in #24419
Update RayTune doc link for Hyperparameter tuning by @JoshuaEPSamuel in #24422
TF CI fix for Segformer by @Rocketknight1 in #24426
Refactor hyperparameter search backends by @alexmojaki in #24384
Clarify batch size displayed when using DataParallel by @sgugger in #24430
Save site-packages as cache in CircleCI job by @ydshieh in #24424
[llama] Fix comments in weights converter by @weimingzha0 in #24436
[Trainer] Fix .to call on 4bit models by @younesbelkada in #24444
fix the grad_acc issue at epoch boundaries by @pacman100 in #24415
Replace python random with torch.rand to enable dynamo.export by @BowenBao in #24434
Fix typo by @siryuon in #24440
Fix some TFWhisperModelIntegrationTests by @ydshieh in #24428
fixes issue when saving fsdp via accelerate's FSDP plugin by @pacman100 in #24446
Allow dict input for audio classification pipeline by @sanchit-gandhi in #23445
Update JukeboxConfig.from_pretrained by @ydshieh in #24443
Improved keras imports by @Rocketknight1 in #24448
add missing alignment_heads to Whisper integration test by @hollance in #24487
Fix tpu_metrics_debug by @cowanmeg in #24452
Update AlbertModel type annotation by @amyeroberts in #24450
[pipeline] Fix str device issue by @younesbelkada in #24396
when resume from peft checkpoint, the model should be trainable by @sywangyi in #24463
deepspeed z1/z2 state dict fix by @pacman100 in #24489
Update InstructBlipModelIntegrationTest by @ydshieh in #24490
Update token_classification.md by @condor-cp in #24484
Add support for for loops in python interpreter by @sgugger in #24429
[InstructBlip] Add accelerate support for instructblip by @younesbelkada in #24488
Compute dropout_probability only in training mode by @ydshieh in #24486
Fix 'local_rank' AttiributeError in Trainer class by @mocobeta in #24297
Compute dropout_probability only in training mode (SpeechT5) by @ydshieh in #24498
Fix link in utils by @SoyGema in #24501
🚨🚨 Fix group beam search by @hukuda222 in #24407
Generate: group_beam_search requires diversity_penalty>0.0 by @gante in #24456
Generate: min_tokens_to_keep has to be >= 1 by @gante in #24453
Fix TypeError: Object of type int64 is not JSON serializable by @xiaoli in #24340
Fix poor past ci by @ydshieh in #24485
🌐 [i18n-KO] Translated tflite.mdx to Korean by @0525hhgus in #24435
use accelerate autocast in jit eval path, since mix precision logic is… by @sywangyi in #24460
Update hyperparameter_search.py by @pacman100 in #24515
[T5] Add T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24481
set model to training mode before accelerate.prepare by @sywangyi in #24520
Update huggingface_hub commit sha by @ydshieh in #24527
Find module name in an OS-agnostic fashion by @sgugger in #24526
Fix LR scheduler based on bs from auto bs finder by @muellerzr in #24521
[Mask2Former] Remove SwinConfig by @NielsRogge in #24259
Allow backbones not in backbones_supported - Maskformer Mask2Former by @amyeroberts in #24532
Fix Typo by @tony9402 in #24530
Finishing tidying keys to ignore on load by @sgugger in #24535
Add bitsandbytes support for gpt2 models by @DarioSucic in #24504
⚠️ Time to say goodbye to py37 by @ydshieh in #24091
Unpin DeepSpeed and require DS >= 0.9.3 by @ydshieh in #24541
Allow for warn_only selection in enable_full_determinism by @Frank995 in #24496
Fix typing annotations for FSDP and DeepSpeed in TrainingArguments by @mryab in #24549
Update PT/TF weight conversion after #24030 by @ydshieh in #24547
Update EncodecIntegrationTest by @ydshieh in #24553
[gpt2-int8] Add gpt2-xl int8 test by @younesbelkada in #24543
Fix processor init bug if image processor undefined by @amyeroberts in #24554
[InstructBlip] Add instruct blip int8 test by @younesbelkada in #24555
Update PT/Flax weight conversion after #24030 by @ydshieh in #24556
Make PT/Flax tests could be run on GPU by @ydshieh in #24557
Update masked_language_modeling.md by @condor-cp in #24560
Fixed OwlViTModel inplace operations by @pasqualedem in #24529
Update old existing feature extractor references by @amyeroberts in #24552
Fix Typo by @tony9402 in #24559
Fix annotations by @tony9402 in #24571
Docs: 4 bit doc corrections by @gante in #24572
Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" by @sgugger in #24574
Update some torchscript tests after #24505 by @ydshieh in #24566
Removal of deprecated vision methods and specify deprecation versions by @amyeroberts in #24570
Fix ESM models buffers by @sgugger in #24576
Check all objects are equally in the main __init__ file by @ydshieh in #24573
Fix annotations by @tony9402 in #24582
fix peft ckpts not being pushed to hub by @pacman100 in #24578
Udate link to RunHouse hardware setup documentation. by @BioGeek in #24590
Show a warning for missing attention masks when pad_token_id is not None by @hackyon in #24510
Make (TF) CI faster (test only a subset of model classes) by @ydshieh in #24592
Speed up TF tests by reducing hidden layer counts by @Rocketknight1 in #24595
[several models] improve readability by @stas00 in #24585
Use protobuf 4 by @ydshieh in #24599
Limit Pydantic to V1 in dependencies by @lig in #24596
🌐 [i18n-KO] Translated perplexity.mdx to Korean by @HanNayeoniee in #23850
[Time-Series] Added blog-post to tips by @elisim in #24482
Pin Pillow for now by @ydshieh in #24633
Fix loading dataset docs link in run_translation.py example by @SoyGema in #24594
Generate: multi-device support for contrastive search by @gante in #24635
Generate: force cache with inputs_embeds forwarding by @gante in #24639
precompiled_charsmap checking before adding to the normalizers' list for XLNetTokenizerFast conversion. by @shahad-mahmud in #24618
Fix audio feature extractor deps by @sanchit-gandhi in #24636
llama fp16 torch.max bug fix by @prathikr in #24561
documentation_tests.txt - sort filenames alphabetically by @amyeroberts in #24647
Update warning messages reffering to post_process_object_detection by @rafaelpadilla in #24649
Add finetuned_from property in the autogenerated model card by @sgugger in #24528
Make warning disappear for remote code in pipelines by @sgugger in #24603
Fix EncodecModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #24663
Fix VisionTextDualEncoderIntegrationTest by @ydshieh in #24661
Add is_torch_mps_available function to utils by @NripeshN in #24660
Unpin huggingface_hub by @ydshieh in #24667
Fix model referenced and results in documentation. Model mentioned was inaccessible by @rafaelpadilla in #24609
Add Nucleotide Transformer notebooks and restructure notebook list by @Rocketknight1 in #24669
LlamaTokenizer should be picklable by @icyblade in #24681
Add dropouts to GPT-NeoX by @ZHAOTING in #24680
DeepSpeed/FSDP ckpt saving utils fixes and FSDP training args fixes by @pacman100 in #24591
Avoid import sentencepiece_model_pb2 in utils.__init__.py by @ydshieh in #24689
Fix integration with Accelerate and failing test by @muellerzr in #24691
[MT5] Fix CONFIG_MAPPING issue leading it to load umt5 class by @ArthurZucker in #24678
Fix flaky test_for_warning_if_padding_and_no_attention_mask by @ydshieh in #24706
Whisper: fix prompted max length by @gante in #24666
Enable conversational pipeline for GPTSw3Tokenizer by @saattrupdan in #24648
[T5] Adding model_parallel = False to T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24684
Docs: change some input_ids doc reference from BertTokenizer to AutoTokenizer by @gante in #24730
add link to accelerate doc by @SunMarc in #24601
[Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valide for beginning of words by @ArthurZucker in #24622
Fix typo in LocalAgent by @jamartin9 in #24736
fix: Text splitting in the BasicTokenizer by @connor-henderson in #22280
Docs: add kwargs type to fix formatting by @gante in #24733
add gradient checkpointing for distilbert by @jordane95 in #24719
Skip keys not in the state dict when finding mismatched weights by @sgugger in #24749
Fix non-deterministic Megatron-LM checkpoint name by @janEbert in #24674
[InstructBLIP] Fix bos token of LLaMa checkpoints by @NielsRogge in #24492
Skip some slow tests for doctesting in PRs (Circle)CI by @ydshieh in #24753
Fix lr scheduler not being reset on reruns by @muellerzr in #24758
:bug: Handle empty gen_kwargs for seq2seq trainer prediction_step function by @gkumbhat in #24759
Allow existing configs to be registered by @sgugger in #24760
Unpin protobuf in docker file (for daily CI) by @ydshieh in #24761
Fix eval_accumulation_steps leading to incorrect metrics by @muellerzr in #24756
Add MobileVitV2 to doctests by @amyeroberts in #24771
Docs: Update logit processors call docs by @gante in #24729
Replacement of 20 asserts with exceptions by @Baukebrenninkmeijer in #24757
Update default values of bos/eos token ids in CLIPTextConfig by @ydshieh in #24773
Fix pad across processes dim in trainer and not being able to set the timeout by @muellerzr in #24775
gpt-bigcode: avoid zero_ to support Core ML by @pcuenca in #24755
Remove WWT from README by @LysandreJik in #24672
Rm duplicate pad_across_processes by @muellerzr in #24780
Revert "Unpin protobuf in docker file (for daily CI)" by @ydshieh in #24800
Removing unnecessary device=device in modeling_llama.py by @Liyang90 in #24696
[fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" by @SeongBeomLEE in #24769
[DOC] Clarify relationshi load_best_model_at_end and save_total_limit by @BramVanroy in #24614
Upgrade jax/jaxlib/flax pin versions by @ydshieh in #24791
Fix MobileVitV2 doctest checkpoint by @amyeroberts in #24805
Skip torchscript tests for MusicgenForConditionalGeneration by @ydshieh in #24782
Generate: add SequenceBiasLogitsProcessor by @gante in #24334
Add accelerate version in transformers-cli env by @amyeroberts in #24806
Fix typo 'submosules' by @dymil in #24809
Remove Falcon docs for the release until TGI is ready by @Rocketknight1 in #24808
Update setup.py to be compatible with pipenv by @georgiemathews in #24789
Use _BaseAutoModelClass's register method by @fadynakhla in #24810
Run hub tests by @sgugger in #24807
Copy code when using local trust remote code by @sgugger in #24785
Fixing double use_auth_token.pop (preventing private models from being visible). by @Narsil in #24812
set correct model input names for gptsw3tokenizer by @DarioSucic in #24788
Check models used for common tests are small by @sgugger in #24824
[Docs] Fixed Incorrect Migration Link by @kadirnar in #24793
deprecate sharded_ddp training argument by @statelesshz in #24825
🌐 [i18n-KO] Translated custom_tools.mdx to Korean by @sim-so in #24580
Remove unused code in GPT-Neo by @namespace-Pt in #24826
Add Multimodal heading and Document question answering in task_summary.mdx by @y3sar in #23318
Fix is_vision_available by @ydshieh in #24853
Fix comments for _merge_heads by @bofenghuang in #24855
fix broken links in READMEs by @younesbelkada in #24861
Add TAPEX to the list of deprecated models by @sgugger in #24859
Fix token pass by @sgugger in #24862
The following contributors have made significant changes to the library over the last release:
tutorial/preprocessing.mdx (#24156)
custom_tools.mdx to Korean (#24580)
Transformers has just reached 100k stars on GitHub, and to celebrate we wanted to highlight 100 projects in the vicinity of transformers and we have decided to create an awesome-transformers page to do just that.
We accept PRs to add projects to the list!
By leveraging the bitsandbytes library by @TimDettmers, we add 4-bit support to transformers models!
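As a rough sketch of what 4-bit loading can look like, the snippet below passes a BitsAndBytesConfig to from_pretrained. The helper name load_in_4bit and the particular settings (NF4 quantization, double quantization, bfloat16 compute) are illustrative assumptions, not recommended defaults. It also assumes torch, bitsandbytes, accelerate and a CUDA GPU are available, which is why the heavy imports are deferred until the function is called.

```python
# Illustrative settings only; tune them for your model and hardware.
FOUR_BIT_SETTINGS = {
    "load_in_4bit": True,               # quantize linear layers to 4-bit on load
    "bnb_4bit_quant_type": "nf4",       # "nf4" or "fp4"
    "bnb_4bit_use_double_quant": True,  # also quantize the quantization constants
}


def load_in_4bit(model_id):
    """Load a causal LM with 4-bit weights (hypothetical helper).

    Imports are deferred so this module can be imported on machines
    without torch or bitsandbytes installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the matmuls
        **FOUR_BIT_SETTINGS,
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the layers
    )
```

Calling load_in_4bit with a checkpoint name of your choice downloads and quantizes that checkpoint, so it needs network access and a GPU.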
The Agents framework has been improved and continues to be stabilized. Among bug fixes, here are the important new features that were added:
transformers instead of relying on APIs.
AzureOpenAiAgent class to support Azure OpenAI agents.
The safetensors library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI).
It has now become a core dependency of transformers.
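To give some intuition for why the format is considered safe, here is a simplified sketch of a safetensors-style layout: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. Loading only parses JSON and slices bytes, so no code is executed, unlike pickle. The function names are hypothetical and this is not the library's implementation; real code should use the safetensors package itself.

```python
import json
import struct


def write_safetensors_like(tensors):
    """Serialize {name: (dtype_str, shape, raw_bytes)} into one bytes blob."""
    header, chunks, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        # Offsets are relative to the start of the data section.
        header[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
        chunks.append(raw)
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # Layout: u64 header length | JSON header | concatenated raw tensor bytes
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def read_safetensors_like(blob):
    """Parse the blob back into {name: (dtype_str, shape, raw_bytes)}."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    tensors = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        start, end = meta["data_offsets"]
        tensors[name] = (meta["dtype"], tuple(meta["shape"]), data[start:end])
    return tensors
```

Because the header records byte offsets, a reader can also locate a single tensor without touching the rest of the file.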
safetensors a core dependency. by @Narsil in #23254
The SwiftFormer paper introduces a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. A series of models called “SwiftFormer” is built based on this, which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed. Even their small variant achieves 78.5% top-1 ImageNet1K accuracy with only 0.8 ms latency on iPhone 14, which is more accurate and 2× faster compared to MobileViT-v2.
This model augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process.
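As a minimal sketch of the trend/seasonal split, and assuming the model described here is Autoformer, whose decomposition block uses a moving average, the function below estimates the trend with a centered moving average and treats the residual as the seasonal part. The function name, the edge padding by repeating boundary values, and the default kernel size are illustrative choices, not the library's code.

```python
def decompose(series, kernel_size=3):
    """Split a 1-D series into (trend, seasonal) with a moving average."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    if not series:
        return [], []
    half = (kernel_size - 1) // 2
    # Repeat the boundary values so the trend has the same length as the input.
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [
        sum(padded[i:i + kernel_size]) / kernel_size
        for i in range(len(series))
    ]
    # Whatever the smooth trend does not explain is kept as the seasonal part.
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal
```

Inside the model, a split of this kind is applied repeatedly between layers, which is what "progressively decompose" refers to.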
MobileViTV2 is the second version of MobileViT, constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.
PerSAM proposes a minimal modification to SAM to allow dreambooth-like personalization, enabling it to segment concepts in new images using just one example.
We add support for loading timm weights within the AutoBackbone API in transformers. timm models can be instantiated through the TimmBackbone class, and then used with any vision model that needs a backbone.
We add conditional text generation to the image-to-text pipeline, allowing the model to continue an initial text prompt according to an image.
A major rework of the internals of the Trainer is underway, leveraging accelerate instead of redefining them in transformers. This should unify both frameworks and lead to increased interoperability and more efficient development.
accelerator.prepare by @pacman100 in #23914
chore: allow protobuf 3.20.3 requirement by @jose-turintech in #22759
Fix link displayed for custom tools by @sgugger in #23274
Remove missplaced test file by @sgugger in #23275
Bring back the PR Refactor doctests + add CI to main by @ydshieh in #23271
[gpt] Gpt2 fix half precision causal mask by @younesbelkada in #23256
Temporary tolerance fix for flaky whipser PT-TF equiv. test by @amyeroberts in #23257
Add top_k argument to post-process of conditional/deformable-DETR by @CreatlV in #22787
transformers-cli -> huggingface-cli by @AlpinDale in #23276
Temporarily increase tol for PT-FLAX whisper tests by @amyeroberts in #23288
Added missing " in CHAT_PROMPT_TEMPLATE by @galatolofederico in #23287
Update custom_tools.mdx: fix link by @mishig25 in #23292
Update transformers_agents.mdx by @mishig25 in #23289
Convert numpy arrays to lists before saving the evaluation metrics as json by @harisankar95 in #23268
Fix doctest files fetch issue by @ydshieh in #23277
skip test_run_squad_no_trainer for now by @ydshieh in #23302
Better check for packages availability by @apbard in #23163
Add gradient_checkpointing parameter to FlaxWhisperEncoder by @raghavanone in #23300
Agents extras by @LysandreJik in #23301
Fix broken links in the agent docs by @sgugger in #23297
Fix typo in gradio-tools docs by @freddyaboulton in #23305
Fix image segmentation tool test by @sgugger in #23306
unpin tf prob by @ydshieh in #23293
Revert "search buffers for dtype" by @sgugger in #23308
Remove LanguageIdentificationTool in __init__.py as we don't have it yet by @ydshieh in #23326
Fix docker image (caused by tensorflow_text) by @ydshieh in #23321
Compute the mask in-place, with less memory reads, and on CUDA on XLNetLMHeadModel by @lezcano in #23332
Only add files with modification outside doc blocks by @ydshieh in #23327
[docs] Fix Agents and Tools docstring by @stevhliu in #23313
OR am I crazy? by @hwuebben in #23295
Handle padding warning in generation when using inputs_embeds by @zrthxn in #23131
replaced assert with raise ValueError for t5, switch_transformers, pix2struct, mt5, longt5, gptsan_japanese. by @susnato in #23273
Use cu118 with cudnn >= 8.6 in docker file by @ydshieh in #23339
Removing one of the twice defined position_embeddings in LongFormer by @GregorySenay in #23343
Fix issue introduced in PR #23163 by @ydshieh in #23363
Typo suggestion by @richardachen in #23360
Fix some is_xxx_available by @ydshieh in #23365
Fix BigBirdForMaskedLM doctest by @ydshieh in #23369
Fix OwlViTForObjectDetection.image_guided_detection doc example by @ydshieh in #23370
Revert "Only add files with modification outside doc blocks" by @ydshieh in #23371
[Bugfix] OPTDecoderLayer does not return attentions when gradient_checkpointing and training is enabled. by @gmlwns2000 in #23367
Skip failing AlignModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #23374
Fix test typos - audio feature extractors by @LWprogramming in #23310
Added type hints for Graphormer pytorch version by @dewasahu2003 in #23073
Replace NumPy Operations with JAX NumPy Equivalents for JIT Compilation Compatibility by @gojiteji in #23356
Use mkstemp to replace deprecated mktemp by @ready-research in #23372
Fix RwkvModel by @ydshieh in #23392
Update test_batched_inference_image_captioning_conditioned by @ydshieh in #23391
OPT/BioGPT: Improved attention mask shape exception by @gante in #23270
Fix chat prompt in HFAgent by @IvanSedykh in #23335
🌐 [i18n-KO] Translated asr.mdx to Korean by @sim-so in #23106
Minor fixes in transformers-tools by @Wauplin in #23364
[Pix2Struct] Add conditional generation on docstring example by @younesbelkada in #23399
Generate: faster can_generate check on TF and Flax by @gante in #23398
[AutoModel] fix torch_dtype=auto in from_pretrained by @stas00 in #23379
Docs: add link to assisted generation blog post by @gante in #23397
Build with non Python files by @sgugger in #23405
Generate: add test to check KV format by @gante in #23403
Replace appends with list comprehension. by @ttsugriy in #23359
Fix smdistributed check by @sgugger in #23414
Why crash the whole run when HFHub gives a 50x error? by @ropoctl in #23320
Run doctest (in PRs) only when some doc example(s) are modified by @ydshieh in #23387
Update ConvNextV2ModelIntegrationTest::test_inference_image_classification_head by @ydshieh in #23402
Fix a typo in HfAgent docstring. by @ttsugriy in #23420
Use dict.items to avoid unnecessary lookups. by @ttsugriy in #23415
Update 3 docker files to use cu118 by @ydshieh in #23406
[SAM] fix sam slow test by @younesbelkada in #23376
Return early once stop token is found. by @ttsugriy in #23421
[Reland] search model buffers for dtype as the last resort by @cyyever in #23319
Add Missing tokenization test [electra] by @IMvision12 in #22997
Small fixes and link in the README by @LysandreJik in #23428
TF: embeddings out of bounds check factored into function by @gante in #23427
Update Bigbird Pegasus tests by @ydshieh in #23431
Encoder-Decoder: add informative exception when the decoder is not compatible by @gante in #23426
Remove hardcoded prints in Trainer by @hugoabonizio in #23432
Fix device issue in SwiftFormerModelIntegrationTest::test_inference_image_classification_head by @ydshieh in #23435
Generate: skip left-padding tests on old models by @gante in #23437
remove unnecessary print in gpt neox sequence classifier by @cfhammill in #23433
🌐 [i18n-KO] Translated tasks/zero_shot_object_detection.mdx to Korean by @HanNayeoniee in #23430
Fix (skip) a pipeline test for RwkvModel by @ydshieh in #23444
Fix DecisionTransformerConfig doctring by @joaoareis in #23450
TF: GPT2 with native embedding layers by @gante in #23436
Make RwkvModel accept attention_mask but discard it internally by @ydshieh in #23442
Less flaky test_assisted_decoding_matches_greedy_search by @ydshieh in #23451
Update tiny models and pipeline tests by @ydshieh in #23446
Properly guard PyTorch stuff by @sgugger in #23452
Add an option to log result from the Agent by @sgugger in #23454
Clean up CUDA kernels by @sgugger in #23455
fix bug in group_texts function, that was inserting short batches by @BodaSadalla98 in #23429
feat: Whisper prompting by @connor-henderson in #22496
README: Fix affiliation for MEGA by @julien-c in #23394
Remove .data usages in optimizations.py by @alanwaketan in #23417
TF port of the Segment Anything Model (SAM) by @Rocketknight1 in #22970
[RWKV] Rwkv fix for 8bit inference by @younesbelkada in #23468
Use config to set name and description if not present by @sgugger in #23473
Fix transformers' DeepSpeed CI job by @ydshieh in #23463
Fix PretrainedConfig min_length docstring by @joaoareis in #23471
Fix: Change tensors to integers for torch.dynamo and torch.compile compatibility by @loevlie in #23475
[Blip] Remove redundant shift right by @younesbelkada in #23153
Fix DeepSpeed stuff in the nightly CI by @ydshieh in #23478
Fix confusing transformers installation in CI by @ydshieh in #23465
Fix tests/repo_utils/test_get_test_info.py by @ydshieh in #23485
Debug example code for MegaForCausalLM by @Tylersuard in #23382
Remove erroneous img closing tag by @xenova in #23646
Fix tensor device while attention_mask is not None by @zspo in #23538
Fix accelerate logger bug by @younesbelkada in #23650
Bugfix: LLaMA layer norm incorrectly changes input type and consumers lots of memory by @TimDettmers in #23535
Fix wav2vec2 is_batched check to include 2-D numpy arrays by @LWprogramming in #23223
changing the requirements to a cpu torch version that works by @sshahrokhi in #23483
Fix SAM tests and use smaller checkpoints by @Rocketknight1 in #23656
Update workflow files by @ydshieh in #23658
small fix to remove unused eos in processor when it's not used. by @Narsil in #23408
Fix typo in a parameter name for open llama model by @aaalexlit in #23637
Fix PyTorch SAM tests by @ydshieh in #23682
🌐 [i18n-KO] Translated tasks/monocular_depth_estimation.mdx to Korean by @HanNayeoniee in #23621
Fix a BridgeTower test by @ydshieh in #23694
[SAM] Fixes pipeline and adds a dummy pipeline test by @younesbelkada in #23684
TF version compatibility fixes by @Rocketknight1 in #23663
[Blip] Fix blip doctest by @younesbelkada in #23698
is_batched fix for remaining 2-D numpy arrays by @LWprogramming in #23309
Skip TFCvtModelTest::test_keras_fit_mixed_precision for now by @ydshieh in #23699
fix: load_best_model_at_end error when load_in_8bit is True by @dkqkxx in #23443
Fix some docs what layerdrop does by @zspo in #23691
add GPTJ/bloom/llama/opt into model list and enhance the jit support by @sywangyi in #23291
Paged Optimizer + Lion Optimizer for Trainer by @TimDettmers in #23217
Export to ONNX doc refocused on using optimum, added tflite by @MKhalusova in #23434
fix: use bool instead of uint8/byte in Deberta/DebertaV2/SEW-D to make it compatible with TensorRT by @uchuhimo in #23683
fix gptj could not jit.trace in GPU by @sywangyi in #23317
Better TF docstring types by @Rocketknight1 in #23477
Minor awesome-transformers.md fixes by @pagarsky in #23453
TF SAM memory reduction by @Rocketknight1 in #23732
fix: delete duplicate sentences in document_question_answering.mdx by @jungnerd in #23735
fix: Whisper generate, move text_prompt_ids trim up for max_new_tokens calculation by @connor-henderson in #23724
Overhaul TF serving signatures + dummy inputs by @Rocketknight1 in #23234
[Whisper] Reduce batch size in tests by @sanchit-gandhi in #23736
Fix the regex in get_imports to support multiline try blocks and excepts with specific exception types by @dakinggg in #23725
Remove the last few TF serving sigs by @Rocketknight1 in #23738
Fix pip install --upgrade accelerate command in modeling_utils.py by @tloen in #23747
Fix psuh_to_hub in Trainer when nothing needs pushing by @sgugger in #23751
Revamp test selection for the example tests by @sgugger in #23737
[LongFormer] code nits, removed unused parameters by @ArthurZucker in #23749
Fix is_ninja_available() by @niltok in #23752
[Nllb-Moe] Fix nllb moe accelerate issue by @younesbelkada in #23758
[OPT] Doc nit, using fast is fine by @ArthurZucker in #23789
Fix RWKV backward on GPU by @sgugger in #23774
Update trainer.mdx class_weights example by @amitportnoy in #23787
no_cuda does not take effect in non distributed environment by @sywangyi in #23795
Fix no such file or directory error by @RissyRan in #23783
Enable code-specific revision for code on the Hub by @sgugger in #23799
add type hint in pipeline model argument by @y3sar in #23740
TF SAM shape flexibility fixes by @Rocketknight1 in #23842
fix Whisper tests on GPU by @hollance in #23753
🌐 [i18n-KO] Translated fast_tokenizers.mdx to Korean by @KIHOON71 in #22956
[i18n-KO] Translated video_classification.mdx to Korean by @KIHOON71 in #23026
🌐 [i18n-KO] Translated troubleshooting.mdx to Korean by @0525hhgus in #23166
Adds a FlyteCallback by @peridotml in #23759
Update collating_graphormer.py by @clefourrier in #23862
[LlamaTokenizerFast] nit update post_processor on the fly by @ArthurZucker in #23855
#23388 Issue: Update RoBERTa configuration by @vijethmoudgalya in #23863
[from_pretrained] imporve the error message when _no_split_modules is not defined by @ArthurZucker in #23861
Editing issue with pickle def with lambda function by @Natyren in #23869
Adds AutoProcessor.from_pretrained support for MCTCTProcessor by @Ubadub in #23856
🌐 [i18n-KO] Translated pad_truncation.mdx to Korean by @sim-so in #23823
Fix bug leading to missing token in GPTSanJapaneseTokenizer by @passaglia in #23883
Fix last instances of kbit -> quantized by @sgugger in #23797
fix(configuration_llama): add keys_to_ignore_at_inference to LlamaConfig by @calico-1226 in #23891
Fix Trainer when model is loaded on a different GPU by @sgugger in #23792
Support shared tensors by @thomasw21 in #23871
ensure banned_mask and indices in same device by @cauyxy in #23901
Unpin numba by @sanchit-gandhi in #23162
[bnb] add warning when no linear by @younesbelkada in #23894
fix: Replace add_prefix_space in get_prompt_ids with manual space for FastTokenizer compatibility by @connor-henderson in #23796
[RWKV] Fix RWKV 4bit by @younesbelkada in #23910
add conditional statement for auxiliary loss calculation by @harisankar95 in #23899
Raise error if loss can't be calculated - ViT MIM by @amyeroberts in #23872
Empty circleci config by @sgugger in #23913
Bug fix - flip_channel_order for channels first images by @amyeroberts in #23701
Re-enable squad test by @sgugger in #23912
Update the update metadata job to use upload_folder by @sgugger in #23917
[PushToHub] Make it possible to upload folders by @NielsRogge in #23920
Skip device placement for past key values in decoder models by @sgugger in #23919
[Flax Whisper] Update decode docstring by @sanchit-gandhi in #23908
Effectively allow encoder_outputs input to be a tuple in pix2struct by @fxmarty in #23932
Fix doc string nits by @sheonhan in #23929
Pin rhoknp by @sgugger in #23937
rename DocumentQuestionAnsweringTool parameter input to match docstring by @Adam-D-Lewis in #23939
Update stale.yml to use HuggingFaceBot by @LysandreJik in #23941
Make TF ESM inv_freq non-trainable like PyTorch by @Rocketknight1 in #23940
Revert "Update stale.yml to use HuggingFaceBot" by @LysandreJik in #23943
#23675 Registering Malay language by @soongbren in #23689
Modify device_map behavior when loading a model using from_pretrained by @SunMarc in #23922
use _make_causal_mask in clip/vit models by @kashif in #23942
Fix ReduceLROnPlateau object has no attribute 'get_last_lr' by @wasupandceacar in #23944
[MMS] Scaling Speech Technology to 1,000+ Languages | Add attention adapter to Wav2Vec2 by @patrickvonplaten in #23813
add new mms functions to doc by @patrickvonplaten in #23954
🌐 [i18n-KO] Translated object_detection.mdx to Korean by @KIHOON71 in #23164
Trainer: fixed evaluate raising KeyError for ReduceLROnPlateau by @claudius-kienle in #23952
[Whisper Tokenizer] Skip special tokens when decoding with timestamps by @sanchit-gandhi in #23945
Add an option to reduce compile() console spam by @Rocketknight1 in #23938
Added time-series blogs to the models by @elisim in #23857
Fix typo in doc comment of BitsAndBytesConfig by @ledyba in #23978
Skip test_multi_gpu_data_parallel_forward for MobileViTV2ModelTest by @ydshieh in #24017
Update README.md by @ydshieh in #24022
Auto tokenizer registration by @Bearnardd in #23965
expose safe_serialization argument in the pipeline API by @yessenzhar in #23775
Pix2Struct: fix wrong broadcast axis of attention mask in visual encoder by @affjljoo3581 in #23976
TensorBoard callback no longer adds hparams by @bri25yu in #23999
🌐 [i18n-KO] Translated tasks_explained.mdx to Korean by @0525hhgus in #23844
Fix MobileViTV2 checkpoint name by @ydshieh in #24018
Pin deepspeed to 0.9.2 for now by @ydshieh in #24024
🌐 [i18n-KO] Translated language-modeling.mdx by @wonhyeongseo in #23969
🌐 [i18n-KO] Translated bertology.mdx to Korean by @wonhyeongseo in #23968
Add check for tied parameters by @SunMarc in #24029
Fixing single candidate_label return. by @Narsil in #24023
Use TruncatedNormal from Keras initializers by @hvaara in #24036
Prevent ZeroDivisionError on trainer.evaluate if model and dataset are tiny by @tomaarsen in #24049
Modification of one text example file should trigger said test by @sgugger in #24051
Tiny fix for check_self_hosted_runner.py by @ydshieh in #24052
Reduce memory usage in TF building by @Rocketknight1 in #24046
Move TF building to an actual build() method by @Rocketknight1 in #23760
Use new parametrization based weight norm if available by @ezyang in #24030
bring back filtered_test_list_cross_tests.txt by @ydshieh in #24055
Fix device placement for model-parallelism in generate for encoder/de… by @sgugger in #24025
Remote code improvements by @sgugger in #23959
Generate: increase left-padding test atol by @gante in #23448
[Wav2Vec2] Fix torch srcipt by @patrickvonplaten in #24062
Add support for non-rust implemented tokenization for __getitem__ method. by @jacklanda in #24039
Support PEFT models when saving the model using trainer by @younesbelkada in #24073
[Hub] Add safe_serialization in push_to_hub by @younesbelkada in #24074
Fix is_optimum_neuron_available by @michaelbenayoun in #23961
[bnb] Fix bnb skip modules by @younesbelkada in #24043
Be nice to TF by @ydshieh in #24076
Make the TF dummies even smaller by @Rocketknight1 in #24071
[doc build] Use secrets by @mishig25 in #24079
Fix expected value in tests of the test fetcher by @sgugger in #24077
Update delete_doc_comment_trigger.yml by @mishig25 in #24084
Do not prepare lr scheduler as it as the right number of steps by @sgugger in #24088
Fix a tiny typo in WhisperForConditionalGeneration::generate docstring by @sadra-barikbin in #24045
[Trainer] Correct behavior of _load_best_model for PEFT models by @younesbelkada in #24103
The following contributors have made significant changes to the library over the last release:
fast_tokenizers.mdx to Korean (#22956)
Fixes the package so non-Python files (like CUDA kernels) are properly included.