Diffusion Models Beat GANs on Image Synthesis, Dhariwal & Nichol 2021.
Proposes architecture improvements over the 2021 state of the art (DDPM and DDIM) that could give some insight when writing models from scratch. In addition, it introduces classifier guidance to improve conditional image synthesis: the gradient of a noise-aware classifier steers sampling towards a target class. This was later superseded by classifier-free guidance, but using a classifier looks like the natural thing to do for conditional generation.
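A minimal sketch of the classifier-guidance step, assuming a hypothetical `classifier(x, t)` that was trained on noised images (the interface is made up for illustration; the mean-shift itself follows the paper's sampling algorithm):

```python
import torch

def classifier_grad(classifier, x, t, y):
    """Gradient of log p(y | x_t) w.r.t. the noised sample x_t."""
    with torch.enable_grad():
        x_in = x.detach().requires_grad_(True)
        logits = classifier(x_in, t)
        log_probs = torch.log_softmax(logits, dim=-1)
        selected = log_probs[range(len(y)), y].sum()
        return torch.autograd.grad(selected, x_in)[0]

def guided_mean(mean, variance, grad, guidance_scale=1.0):
    # Shift the reverse-process mean in the direction that increases
    # the classifier's log-probability of the target class y.
    return mean + guidance_scale * variance * grad
```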
DEIS Scheduler (Diffusion Exponential Integrator Sampler). The authors claim excellent sampling results with as few as 12 steps. I haven't read it yet.
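For a quick try, `diffusers` ships a DEIS implementation that can be swapped into an existing pipeline; a sketch assuming a Stable Diffusion v1.5 checkpoint:

```python
import torch
from diffusers import StableDiffusionPipeline, DEISMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Swap the default scheduler for DEIS and sample in ~12 steps.
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=12).images[0]
```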
Some of these tricks could be effective / didactic.
"Text Inversion": create new text embeddings from a few sample images. This effectively introduces new terms in the vocabulary that can be used in phrases for text to image generation.
Similar goal to the Textual Inversion paper, but a different approach, I think (I haven't read it yet).
Prompt-to-Prompt Image Editing with Cross Attention Control, Hertz et al. 2022.
Manipulates the cross-attention layers to edit text-to-image generations by replacing words, introducing new terms, or re-weighting the importance of existing terms.
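A crude sketch of just the re-weighting part (the paper additionally injects attention maps from the source prompt's generation, which this omits); it scales one prompt token's attention probabilities and renormalizes:

```python
import torch

def reweight_cross_attention(attn_probs, token_index, weight):
    """Scale one text token's cross-attention and renormalize.

    attn_probs: (batch, heads, image_tokens, text_tokens) attention
    probabilities from a cross-attention layer."""
    attn = attn_probs.clone()
    attn[..., token_index] = attn[..., token_index] * weight
    return attn / attn.sum(dim=-1, keepdim=True)
```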
High-quality, temporally coherent artistic portrait videos with flexible style controls.
Stable Diffusion fine-tuning (for specific styles or domains).
Image Variations. Demo, with links to code. Uses the CLIP image embeddings as conditioning for the generation, instead of the text embeddings. This requires fine-tuning the model because, as far as I understand, the text and image embeddings are not aligned in CLIP's embedding space. CLOOB doesn't have this limitation, but I heard from Boris Dayma (relaying a conversation with Katherine Crowson) that training a diffusion model with CLOOB conditioning instead of CLIP produced less variety in the results.
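A usage sketch, assuming the `diffusers` library and Lambda Labs' fine-tuned checkpoint (the input filename is made up):

```python
import torch
from diffusers import StableDiffusionImageVariationPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImageVariationPipeline.from_pretrained(
    "lambdalabs/sd-image-variations-diffusers", torch_dtype=torch.float16
).to("cuda")
init_image = load_image("input.jpg")
# CLIP image embeddings of `init_image` replace the usual text embeddings.
variations = pipe(init_image, guidance_scale=3.0).images
```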
Image-to-image generation. Demo: sketch -> image.
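A sketch of how the demo presumably works under the hood, assuming `diffusers` and a Stable Diffusion v1.5 checkpoint (the prompt and filename are made up):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
sketch = load_image("sketch.png").resize((512, 512))
# `strength` controls how much noise is added to the init image:
# lower values stay closer to the sketch's structure.
image = pipe("a detailed oil painting of a castle",
             image=sketch, strength=0.75).images[0]
```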