Lumiere - Pytorch

Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch

Since this paper is mostly just a few key ideas on top of a text-to-image model, I will take it a step further and extend the new Karras U-net to video within this repository.

Appreciation

A16Z Open Source AI Grant Program and 🤗 Huggingface for the generous sponsorships, as well as my other sponsors, for affording me the independence to open source current artificial intelligence research

Install

$ pip install lumiere-pytorch

Usage

import torch
from lumiere_pytorch import MPLumiere

from denoising_diffusion_pytorch import KarrasUnet

karras_unet = KarrasUnet(
    image_size = 256,
    dim = 8,
    channels = 3,
    dim_max = 768,
)

lumiere = MPLumiere(
    karras_unet,
    image_size = 256,
    unet_time_kwarg = 'time',
    conv_module_names = [
        'downs.1',
        'ups.1',
        'downs.2',
        'ups.2',
    ],
    attn_module_names = [
        'mids.0'
    ],
    upsample_module_names = [
        'ups.2',
        'ups.1',
    ],
    downsample_module_names = [
        'downs.1',
        'downs.2'
    ]
)

noised_video = torch.randn(2, 3, 8, 256, 256)
time = torch.ones(2,)

denoised_video = lumiere(noised_video, time = time)

assert noised_video.shape == denoised_video.shape
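The dotted strings passed as `conv_module_names`, `attn_module_names`, and so on look like PyTorch submodule paths. A minimal sketch of how such names can be listed for a network, assuming they follow the `named_modules()` naming scheme: `ToyUnet` and `list_module_names` are hypothetical stand-ins for illustration and are not part of `lumiere-pytorch` or `denoising_diffusion_pytorch`.

```python
from torch import nn

class ToyUnet(nn.Module):
    # hypothetical stand-in for a U-net; only the attribute names matter here
    def __init__(self):
        super().__init__()
        self.downs = nn.ModuleList([nn.Conv2d(3, 8, 3, padding=1), nn.Conv2d(8, 8, 3, padding=1)])
        self.mids = nn.ModuleList([nn.Conv2d(8, 8, 1)])
        self.ups = nn.ModuleList([nn.Conv2d(8, 8, 3, padding=1), nn.Conv2d(8, 3, 3, padding=1)])

def list_module_names(module, prefixes=('downs', 'mids', 'ups')):
    # keep only children of the chosen containers, e.g. 'downs.1', not 'downs' itself
    return [
        name
        for name, _ in module.named_modules()
        if '.' in name and name.split('.')[0] in prefixes
    ]

names = list_module_names(ToyUnet())
print(names)
```

Printing the names of the real network in the same way is one way to decide which entries to pass to each of the four lists.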

Todo

- add all temporal layers
  - researcher must pass in all layers for
    - conv inflation modules (stages)
    - attn inflation modules (middle)
    - temporal downsample
    - temporal upsamples
  - validate time dimension is divisible by 2 ** (number of temporal downsample layers)
  - validate number of downsamples == upsamples
  - at init, do a dry run with a mock tensor and assert output is the same
- expose only temporal parameters for learning, freeze everything else
- figure out the best way to deal with the time conditioning after temporal downsampling - instead of pytree transform at the beginning, probably will need to hook into all the modules and inspect the batch sizes
- handle middle modules that may have output shape as (batch, seq, dim)
- following the conclusions of Tero Karras, improvise a variant of the 4 modules with magnitude preservation
- test out on imagen-pytorch
- look into multi-diffusion and see if it can be turned into some simple wrapper
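The two validation items in the list above can be sketched in plain Python. This is only an illustration under stated assumptions: `validate_temporal_config` is a hypothetical helper, not the repository's API, and it assumes each temporal downsample halves the frame count, so the number of frames needs to be divisible by 2 ** (number of downsample layers).

```python
def validate_temporal_config(num_frames, downsample_module_names, upsample_module_names):
    # hypothetical helper: checks the two validation items from the todo list
    num_down = len(downsample_module_names)
    num_up = len(upsample_module_names)

    # every temporal downsample needs a matching temporal upsample
    if num_down != num_up:
        raise ValueError(f'{num_down} downsamples but {num_up} upsamples')

    # assumption: each downsample halves the time dimension
    factor = 2 ** num_down
    if num_frames % factor != 0:
        raise ValueError(f'{num_frames} frames is not divisible by {factor}')

    return factor

# the configuration from the usage section: 8 frames, 2 downsamples, 2 upsamples
factor = validate_temporal_config(8, ['downs.1', 'downs.2'], ['ups.2', 'ups.1'])
print(factor)
```

With the usage example's settings, 8 frames and two downsample layers pass the check; 6 frames would raise a `ValueError` under the same assumption.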

Citations

@inproceedings{BarTal2024LumiereAS,
    title   = {Lumiere: A Space-Time Diffusion Model for Video Generation},
    author  = {Omer Bar-Tal and Hila Chefer and Omer Tov and Charles Herrmann and Roni Paiss and Shiran Zada and Ariel Ephrat and Junhwa Hur and Yuanzhen Li and Tomer Michaeli and Oliver Wang and Deqing Sun and Tali Dekel and Inbar Mosseri},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:267095113}
}

@article{Karras2023AnalyzingAI,
    title   = {Analyzing and Improving the Training Dynamics of Diffusion Models},
    author  = {Tero Karras and Miika Aittala and Jaakko Lehtinen and Janne Hellsten and Timo Aila and Samuli Laine},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2312.02696},
    url     = {https://api.semanticscholar.org/CorpusID:265659032}
}

Open Source Agenda is not affiliated with "Lumiere Pytorch" Project. README Source: lucidrains/lumiere-pytorch

Repository

lucidrains/lumiere-pytorch

License

MIT
