Lumiere - Pytorch

Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch

Yannic's paper review

Since this paper is mostly just a few key ideas on top of a text-to-image model, I will take it a step further and extend the new Karras U-net to video within this repository.

Appreciation

Install

$ pip install lumiere-pytorch

Usage

import torch
from lumiere_pytorch import MPLumiere

from denoising_diffusion_pytorch import KarrasUnet

karras_unet = KarrasUnet(
    image_size = 256,
    dim = 8,
    channels = 3,
    dim_max = 768,
)

lumiere = MPLumiere(
    karras_unet,
    image_size = 256,
    unet_time_kwarg = 'time',
    conv_module_names = [
        'downs.1',
        'ups.1',
        'downs.2',
        'ups.2',
    ],
    attn_module_names = [
        'mids.0'
    ],
    upsample_module_names = [
        'ups.2',
        'ups.1',
    ],
    downsample_module_names = [
        'downs.1',
        'downs.2'
    ]
)

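# video tensor of shape (batch, channels, frames, height, width)
noised_video = torch.randn(2, 3, 8, 256, 256)
# one diffusion time value per batch element
time = torch.ones(2,)

denoised_video = lumiere(noised_video, time = time)

# the wrapper returns a tensor with the same shape as its input
assert noised_video.shape == denoised_video.shape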
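The usage example feeds a 5-D video tensor through a wrapper around an image U-net whose layers expect 4-D image batches. One common way to reuse an image layer on video is to fold the frame axis into the batch axis, run the layer, and unfold again. The sketch below illustrates that idea only; `apply_image_layer_to_video` is a hypothetical helper written for this example, not a function from `lumiere_pytorch`, and the repository's internals may differ.

```python
import torch
from torch import nn

def apply_image_layer_to_video(layer: nn.Module, video: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: run a 2D image layer on every frame of a video."""
    b, c, t, h, w = video.shape
    # (b, c, t, h, w) -> (b, t, c, h, w) -> (b * t, c, h, w)
    frames = video.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
    out = layer(frames)
    _, c_out, h_out, w_out = out.shape
    # (b * t, c', h', w') -> (b, t, c', h', w') -> (b, c', t, h', w')
    return out.reshape(b, t, c_out, h_out, w_out).permute(0, 2, 1, 3, 4)

# a small stand-in for a pretrained image layer (assumption for illustration)
conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
video = torch.randn(2, 3, 8, 32, 32)

out = apply_image_layer_to_video(conv, video)
```

Each frame is processed independently here, which is why temporal layers are still needed on top to mix information across frames.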

Todo

  • add all temporal layers

    • researcher must pass in all layers for
      • conv inflation modules (stages)
      • attn inflation modules (middle)
      • temporal downsample
      • temporal upsamples
    • validate time dimension is 2 ** downsample layers
    • validate number of downsamples == upsamples
    • at init, do a dry run with a mock tensor and assert output is the same
  • expose only temporal parameters for learning, freeze everything else

  • figure out the best way to deal with the time conditioning after temporal downsampling - instead of pytree transform at the beginning, probably will need to hook into all the modules and inspect the batch sizes

  • handle middle modules that may have output shape as (batch, seq, dim)

  • following the conclusions of Tero Karras, improvise a variant of the 4 modules with magnitude preservation

  • test out on imagen-pytorch

  • look into multi-diffusion and see if it can be turned into some simple wrapper
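The two validation items in the list above (frame count versus the number of temporal downsamples, and matching downsample and upsample counts) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: `validate_temporal_config` is a hypothetical name, each temporal downsample is assumed to halve the number of frames, and the check is read as "the frame count must be divisible by 2 ** number of downsamples".

```python
def validate_temporal_config(num_frames, downsample_module_names, upsample_module_names):
    """Hypothetical check; returns the frame count at the bottleneck."""
    num_down = len(downsample_module_names)
    num_up = len(upsample_module_names)

    # every temporal downsample needs a matching temporal upsample
    if num_down != num_up:
        raise ValueError(
            f"number of downsamples ({num_down}) must equal number of upsamples ({num_up})"
        )

    # assumption: each temporal downsample halves the number of frames
    factor = 2 ** num_down
    if num_frames % factor != 0:
        raise ValueError(
            f"number of frames ({num_frames}) must be divisible by {factor}"
        )

    return num_frames // factor

# values taken from the usage example: 8 frames, two downsamples, two upsamples
bottleneck_frames = validate_temporal_config(
    8,
    ['downs.1', 'downs.2'],
    ['ups.2', 'ups.1'],
)
```

With the settings from the usage example, 8 frames and two halvings leave 2 frames at the bottleneck; a 6-frame video or mismatched module lists would raise a `ValueError` instead.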

Citations

@inproceedings{BarTal2024LumiereAS,
    title   = {Lumiere: A Space-Time Diffusion Model for Video Generation},
    author  = {Omer Bar-Tal and Hila Chefer and Omer Tov and Charles Herrmann and Roni Paiss and Shiran Zada and Ariel Ephrat and Junhwa Hur and Yuanzhen Li and Tomer Michaeli and Oliver Wang and Deqing Sun and Tali Dekel and Inbar Mosseri},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:267095113}
}

@article{Karras2023AnalyzingAI,
    title   = {Analyzing and Improving the Training Dynamics of Diffusion Models},
    author  = {Tero Karras and Miika Aittala and Jaakko Lehtinen and Janne Hellsten and Timo Aila and Samuli Laine},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2312.02696},
    url     = {https://api.semanticscholar.org/CorpusID:265659032}
}
README Source: lucidrains/lumiere-pytorch
License
MIT