# Finetune ModelScope's Text To Video model using Diffusers 🧨
First of all, a note from me: thank you all for your support, feedback, and journey through discovering the nascent, innate potential of video diffusion models.
@damo-vilab (the creators of ModelScope, among others) has released an official repository for finetuning all things video diffusion models, and I recommend their implementation over this repository: https://github.com/damo-vilab/i2vgen-xl
This repository will no longer be updated; instead, it will be archived for researchers and builders who wish to bootstrap their projects. I will leave the issues, pull requests, and everything related in place for posterity.
Thanks again!
## Updates

- Added an alternative to offset noise from https://arxiv.org/abs/2305.08891, enabled via `rescale_schedule` in the config.
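For reference, the schedule rescaling proposed in that paper enforces zero terminal SNR by shifting and scaling the square roots of the cumulative alphas so the last one is exactly zero. A minimal sketch in plain Python, following Algorithm 1 of the paper (the in-repo implementation behind `rescale_schedule` may differ):

```python
import math

def enforce_zero_terminal_snr(betas):
    """Rescale a beta schedule so the final cumulative alpha is exactly zero
    (zero terminal SNR), per https://arxiv.org/abs/2305.08891, Algorithm 1."""
    alphas = [1.0 - b for b in betas]
    # Cumulative product of alphas
    alphas_bar = []
    running = 1.0
    for a in alphas:
        running *= a
        alphas_bar.append(running)
    sqrt_bar = [math.sqrt(ab) for ab in alphas_bar]
    s0, sT = sqrt_bar[0], sqrt_bar[-1]
    # Shift so the last value is zero, then scale so the first is unchanged
    sqrt_bar = [(s - sT) * s0 / (s0 - sT) for s in sqrt_bar]
    alphas_bar = [s * s for s in sqrt_bar]
    # Recover per-step alphas from the rescaled cumulative products
    new_alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in new_alphas]
```

The resulting schedule keeps the first beta (nearly) unchanged while forcing the final beta to 1, so the last timestep carries pure noise.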
- Use a default dropout of 0.1 on all temporal convolution layers.
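As an illustration, a temporal convolution block with that default dropout might look like the following sketch (the class name and layer layout here are hypothetical, not the repo's exact module):

```python
import torch
import torch.nn as nn

class TemporalConvBlock(nn.Module):
    """Hypothetical temporal convolution block with the default 0.1 dropout."""

    def __init__(self, channels: int, dropout: float = 0.1):
        super().__init__()
        # Convolve across the frame (time) dimension only: kernel (3, 1, 1)
        self.conv = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.act = nn.SiLU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, frames, height, width)
        return x + self.dropout(self.act(self.conv(x)))
```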
- Added support for training LoRA models for use with the text2video A1111 extension. Set `lora_version: "stable_lora"` in the config.
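For example, a training config might enable it like this (`lora_version` is the documented key; the other keys are illustrative, so check the repo's config files for exact names):

```yaml
# Train a Stable LoRA compatible with the A1111 text2video extension.
lora_version: "stable_lora"
lora_rank: 16  # hypothetical: low-rank dimension of the LoRA branch
```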
- Added the ability to choose different Accelerate loggers.
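In the training config this is selected with a logger key, for example (the key name here is assumed from typical configs, not confirmed):

```yaml
# Hypothetical config excerpt: pick the Accelerate tracking backend,
# e.g. "tensorboard" or "wandb".
logger_type: "tensorboard"
```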
- Downgraded the pinned Accelerate version to 0.19 to prevent model checkpoint saving issues.
- Multiple contributions to `inference.py` for stability and ease of use. Thanks @bruefire, @JCBrouwer, and @bfasenfest!
- Added the `LoraInjectedConv3d` module. :movie_camera:
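Conceptually, a LoRA-injected 3D convolution adds a trainable low-rank side branch next to a base `Conv3d`. A minimal sketch of the idea (the repo's actual `LoraInjectedConv3d` may differ in signature and details):

```python
import torch
import torch.nn as nn

class LoraInjectedConv3d(nn.Module):
    """Hypothetical sketch: Conv3d with a low-rank (LoRA) side branch."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 padding=0, rank=4, scale=1.0):
        super().__init__()
        # Base convolution (typically frozen during LoRA training)
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size, padding=padding)
        # Low-rank update: down-project to `rank` channels, then up-project.
        self.lora_down = nn.Conv3d(in_channels, rank, kernel_size,
                                   padding=padding, bias=False)
        self.lora_up = nn.Conv3d(rank, out_channels, 1, bias=False)
        # Zero-init the up-projection so the branch starts as a no-op
        nn.init.zeros_(self.lora_up.weight)
        self.scale = scale

    def forward(self, x):
        return self.conv(x) + self.scale * self.lora_up(self.lora_down(x))
```

Because the up-projection starts at zero, injecting the module leaves the network's output unchanged until the LoRA weights are trained.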