Text-To-Video-Finetuning Versions

Finetune ModelScope's Text To Video model using Diffusers 🧨

v3.01

5 months ago

First of all, a note from me. Thank you all for your support, your feedback, and for sharing in the journey of discovering the nascent potential of video diffusion models.

@damo-vilab (the creators of ModelScope, among other projects) has released an official repository for finetuning video diffusion models, and I recommend their implementations over this repository: https://github.com/damo-vilab/i2vgen-xl

https://github.com/ExponentialML/Text-To-Video-Finetuning/assets/59846140/55608f6a-333a-458f-b7d5-94461c5da8bb

This repository will no longer be updated; it will instead be archived for researchers & builders who wish to bootstrap their own projects. I will leave the issues, pull requests, and everything related up for posterity.

Thanks again!

v3.00

10 months ago

New Release with some exciting features and bug fixes!

Changes

  • Add an alternative to offset noise based on https://arxiv.org/abs/2305.08891; enable it with rescale_schedule in the config.

  • Use a default dropout of 0.1 on all temporal convolution layers.

  • Added support for training LoRA models for use with the text2video A1111 extension. Set lora_version: "stable_lora" in the config.

  • Add the ability to choose different Accelerate loggers (a sketch after this list shows this alongside the schedule-rescale option).

  • Downgrade the pinned Accelerate version to 0.19 to prevent model checkpoint saving issues.

  • Multiple contributions to inference.py for stability and ease of use. Thanks @bruefire, @JCBrouwer, and @bfasenfest!
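
As a rough illustration of the logger and schedule items above (the model id, project name, and the assumption that rescale_schedule maps to diffusers' zero-terminal-SNR rescaling are mine, not necessarily this repo's exact wiring):

```python
from accelerate import Accelerator
from diffusers import DDIMScheduler

# Pick the tracker backend ("tensorboard", "wandb", ...) via log_with.
accelerator = Accelerator(log_with="tensorboard", project_dir="./outputs")
accelerator.init_trackers("text_to_video_finetune")  # project name is illustrative

# One way to get the https://arxiv.org/abs/2305.08891 behaviour in diffusers:
# rescale the betas for zero terminal SNR (assumed here to be what
# rescale_schedule toggles in this repo's config).
noise_scheduler = DDIMScheduler.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",  # assumed base model
    subfolder="scheduler",
    rescale_betas_zero_snr=True,
    timestep_spacing="trailing",
)
```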

v2.2.0

1 year ago

What's New

  • LoRA training based on cloneofsimo's repository.
  • Add LoraInjectedConv3d module (see the sketch after this list). :movie_camera:
  • Add config for LoRA-only training.
  • Add option to save LoRA for the UNet & Text Encoder.
  • Fix checkpointing of model files during training.
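
For anyone bootstrapping their own code, here is a minimal sketch of what a LoRA-injected 3D convolution can look like, following cloneofsimo's LoraInjectedConv2d pattern. The argument names, rank default, and initialization are illustrative assumptions rather than this repo's exact module:

```python
import torch
import torch.nn as nn

class LoraInjectedConv3d(nn.Module):
    """Sketch of a LoRA adapter wrapped around a base Conv3d."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, rank=4, scale=1.0):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=padding)
        # Low-rank residual path: project down to `rank` channels, then back up.
        self.lora_down = nn.Conv3d(in_channels, rank, kernel_size,
                                   stride=stride, padding=padding, bias=False)
        self.lora_up = nn.Conv3d(rank, out_channels, kernel_size=1, bias=False)
        self.scale = scale
        nn.init.normal_(self.lora_down.weight, std=1.0 / rank)
        nn.init.zeros_(self.lora_up.weight)  # starts as an identity update

    def forward(self, x):
        return self.conv(x) + self.scale * self.lora_up(self.lora_down(x))
```

Starting the up projection at zero means the injected layer initially reproduces the base convolution's output, so training begins from the pretrained behaviour.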

v2.0.0

1 year ago

Changes and Updates

  • High quality VRAM config.
  • Add text encoder training.
  • Allow training on lower-VRAM systems.
  • Allow single image training.
  • Train with image captions.
  • Train with video captions stored in a folder.
  • Gradient checkpointing support.
  • Time agnostic training.
  • Add aspect ratio bucketing.
  • Add hybrid LoRA for training.
  • Add latent VAE caching (see the sketch after this list).
  • Add optimizer agnostic settings in config.
  • Overhaul the UNet finetuner for readability and efficiency.
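
To illustrate the latent VAE caching item, here is a sketch assuming a diffusers AutoencoderKL and a generic VAE checkpoint; the idea is to encode each clip's frames once, scale the latents, and store them so later epochs skip the VAE forward pass:

```python
import torch
from diffusers import AutoencoderKL

# Assumed VAE checkpoint; the repo's actual model may differ.
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda").eval()

@torch.no_grad()
def cache_latents(frames: torch.Tensor) -> torch.Tensor:
    """frames: (num_frames, 3, H, W) pixel values scaled to [-1, 1]."""
    latents = vae.encode(frames.to("cuda", dtype=torch.float16)).latent_dist.sample()
    # Scale and move to CPU so the cached latents can be written to disk per clip.
    return (latents * vae.config.scaling_factor).cpu()
```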