Fastdiffusion

Notes and plans for fastdiffusion course


Useful resources

Additional papers

  • Diffusion Models Beat GANs on Image Synthesis, Dhariwal & Nichol 2021.

    Proposes architecture improvements over the 2021 state of the art (DDPM and DDIM) that could give us some insight when writing models from scratch. It also introduces classifier guidance to improve conditional image synthesis. Classifier guidance was later replaced by classifier-free guidance, but using a classifier looks like the natural thing to do for conditional generation.

  • Fast Sampling of Diffusion Models with Exponential Integrator.

    DEIS Scheduler. Authors claim excellent sampling results with as few as 12 steps. I haven't read it yet.
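The classifier-free guidance mentioned above is simple to express in code: evaluate the denoiser twice, with and without the conditioning, and extrapolate from the unconditional prediction towards the conditional one. A minimal numpy sketch; the function name and default scale are illustrative, not taken from any particular paper or library.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale=7.5):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction towards the conditional one. A scale of 1
    recovers the plain conditional prediction; larger values trade
    sample diversity for adherence to the conditioning."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

At sampling time, `eps_uncond` and `eps_cond` would come from the same model evaluated with an empty prompt and the real prompt, respectively.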

Application-oriented papers

Some of these tricks could be effective / didactic.

Improvements on simple diffusion

  • Better denoising autoencoder (diffusion model)
    • Unet
    • Attention
    • P2 weighting
    • EMA
    • Self-conditioning
  • Predict noise / gradient (Score based diffusion)
  • Latent diffusion (the latent-space model need not be a unet)
    • Attention
  • Better loss functions
    • Perceptual + MSE + GAN (in the VAE)
  • Preconditioning/scaling inputs and outputs
  • Other crappifiers
  • Data augmentation
  • Better samplers / optimisers
  • Initialisers such as pixelshuffle
  • Learnable blur
  • Blur noise
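One of the cheaper wins in the list above is EMA: sample from an exponential moving average of the weights rather than the raw training weights. A minimal sketch over a flat list of Python floats (a real version would walk the model's parameter tensors); the class name and decay value are illustrative.

```python
class EMA:
    """Exponential moving average of a flat list of parameter values."""

    def __init__(self, values, decay=0.999):
        self.decay = decay
        self.shadow = list(values)  # the averaged copy, used for sampling

    def update(self, values):
        # shadow <- decay * shadow + (1 - decay) * current weights
        d = self.decay
        self.shadow = [d * s + (1 - d) * v for s, v in zip(self.shadow, values)]
```

Calling `update` once per training step keeps `shadow` trailing the live weights, which tends to smooth out the noise in the final checkpoints.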


  • Style transfer
  • Super-res
  • Colorisation
  • Remove JPEG artifacts
  • Remove watermarks
  • Deblur
  • CycleGAN / Pix2Pix -> change subject/location/weather/etc.
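Most of the tasks above reduce to training a denoiser against a "crappifier" that degrades clean images. A hypothetical numpy crappifier combining two of the degradations listed (downscaling and noise); the function name and defaults are mine, not from the course code.

```python
import numpy as np

def crappify(img, scale=4, noise_std=0.1, rng=None):
    """Degrade a clean image: box-downsample by `scale`, upsample back
    with nearest-neighbour repeats, then add Gaussian noise.
    `img` is a float array in [0, 1] with shape (H, W), where H and W
    are divisible by `scale`."""
    rng = rng or np.random.default_rng(0)
    h, w = img.shape
    assert h % scale == 0 and w % scale == 0
    small = img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    coarse = np.repeat(np.repeat(small, scale, axis=0), scale, axis=1)
    return np.clip(coarse + rng.normal(0.0, noise_std, img.shape), 0.0, 1.0)
```

Swapping in other degradations (blur, JPEG re-encoding, watermarks) gives training pairs for the corresponding restoration tasks.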

Diffusion Applications and Demos

  • Stable Diffusion fine-tuning (for specific styles or domains).

    • Pokemon fine-tuning.

    • Japanese Stable Diffusion code demo. They had to fine-tune the text embeddings too because the tokenizer was different.

  • Stable Diffusion morphing / videos. Code by @nateraw based on a gist by @karpathy.

  • Image Variations. Demo, with links to code. Uses the CLIP image embeddings as conditioning for the generation, instead of the text embeddings. This requires fine-tuning the model because, as far as I understand it, the text and image embeddings are not aligned in the embedding space. CLOOB doesn't have this limitation, but I heard (source: Boris Dayma, from a conversation with Katherine Crowson) that attempting to train a diffusion model with CLOOB conditioning instead of CLIP produced less variety in the results.

  • Image to image generation. Demo sketch -> image.
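The morphing videos above interpolate between Gaussian latents; as far as I recall, the gist that code is based on uses spherical interpolation (slerp) rather than a straight lerp, so intermediate latents keep norms typical of a Gaussian sample. A minimal numpy sketch (my own implementation, not the gist's code):

```python
import numpy as np

def slerp(t, v0, v1):
    """Spherical linear interpolation between two latent vectors,
    t in [0, 1]. Falls back to lerp when the vectors are (nearly)
    parallel and the spherical formula would divide by ~zero."""
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = np.clip(v0n @ v1n, -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two latents
    if np.isclose(theta, 0.0):
        return (1 - t) * v0 + t * v1
    return (np.sin((1 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)
```

Sweeping `t` from 0 to 1 and decoding each interpolated latent gives the frames of the morphing video.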

Style Transfer

Other model ideas

  • Latent space models
    • ImageNet
    • CLIP
    • Noisy CLIP