Min Dalle Versions

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch

0.4

1 year ago
  • Fixed a critical CUDA runtime error that occurred when a generated token ID fell outside the VQGAN's vocabulary
  • Added generate_images_stream and generate_images to generate individual images. These are in active use in the Discord bot.
  • Faster inference: a 9x9 grid can be generated in 38 seconds on an A100
  • Added temperature, top_k, and supercondition_factor parameters
  • Added a simple Tkinter UI (thanks to @20kdc)
  • Added an option to tile images in token space instead of pixel space. This creates a seamless effect in which the borders between images are blended.
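The temperature, top_k, and supercondition_factor parameters, along with the vocabulary fix above, all act on a single token-sampling step. The sketch below shows one common way such a step can work. It assumes two things: super-conditioning takes the usual classifier-free-guidance form (the unconditioned logits plus supercondition_factor times the conditioned-minus-unconditioned difference), and out-of-range IDs are avoided by sampling only from the first image_vocab_size logits. sample_image_token is a hypothetical name, and this is not min(DALL·E)'s actual implementation.

```python
from __future__ import annotations

import math
import random


def sample_image_token(
    logits_cond: list[float],
    logits_uncond: list[float],
    image_vocab_size: int,
    *,
    temperature: float,
    top_k: int,
    supercondition_factor: float,
    rng: random.Random | None = None,
) -> int:
    """Sample one image token ID (illustrative sketch, not the library's code)."""
    if rng is None:
        rng = random.Random()
    # Assumption: only the first image_vocab_size logits map to VQGAN codebook
    # entries, so any sampled ID is guaranteed to be in range.
    cond = logits_cond[:image_vocab_size]
    uncond = logits_uncond[:image_vocab_size]
    # Super-conditioning: move away from the unconditioned prediction and
    # toward the prompt-conditioned one, scaled by supercondition_factor.
    guided = [u + supercondition_factor * (c - u) for c, u in zip(cond, uncond)]
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in guided]
    # Top-k: keep only the k highest-scoring token IDs.
    k = min(top_k, len(scaled))
    top_ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax weights over the survivors (max subtracted for numerical stability).
    best = max(scaled[i] for i in top_ids)
    weights = [math.exp(scaled[i] - best) for i in top_ids]
    return rng.choices(top_ids, weights=weights, k=1)[0]
```

For example, calling it with logits_cond=[2.0, 1.0, 0.5, 9.0], logits_uncond=[1.0, 1.0, 1.0, 1.0], image_vocab_size=3, temperature=1.0, top_k=1, and supercondition_factor=16.0 returns 0. The large logit at index 3 is never considered because it lies outside the image vocabulary.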

0.3

1 year ago
  • Added is_reusable parameter. Turning it off saves memory (e.g. for the command line script); keeping it on makes repeated calls to generate_image faster
  • Added log2_k parameter to control top-k image token sampling
  • Added log2_supercondition_factor parameter to control the amount of super-conditioning
  • Added log2_mid_count and generate_image_stream to stream intermediate outputs. Incomplete token sequences are detokenized into images several times during decoding, which adds very little to the overall run time
  • Added dtype parameter to autocast operations to float32, float16, or bfloat16
  • An 8x8 grid now generates in 35 seconds on an A100
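The log2_ prefix suggests these parameters are exponents: for instance, log2_mid_count = 2 would give 4 intermediate images, and log2_k = 8 would give k = 256. That reading is inferred from the names and is not stated in these notes. Under that assumption, streaming intermediate outputs can be sketched as a generator that detokenizes the incomplete sequence at evenly spaced decoding steps. stream_decode, next_token, and detokenize are hypothetical stand-ins for the model's decoder and VQGAN detokenizer. The default of 256 tokens assumes a 16x16 token grid per image.

```python
from __future__ import annotations

from typing import Callable, Iterator


def stream_decode(
    next_token: Callable[[list[int]], int],
    detokenize: Callable[[list[int]], object],
    token_count: int = 256,
    log2_mid_count: int = 2,
) -> Iterator[object]:
    """Yield 2**log2_mid_count intermediate images while decoding (sketch)."""
    if log2_mid_count < 0 or token_count % (2 ** log2_mid_count) != 0:
        raise ValueError("token_count must be a multiple of 2**log2_mid_count")
    mid_count = 2 ** log2_mid_count
    interval = token_count // mid_count  # decoding steps between snapshots
    tokens: list[int] = []
    for step in range(token_count):
        tokens.append(next_token(tokens))
        # Detokenize the still-incomplete sequence at evenly spaced steps;
        # the final snapshot is the finished image.
        if (step + 1) % interval == 0:
            yield detokenize(tokens)
```

Take a dummy next_token that returns the current length and a detokenize that copies the list. With token_count=8 and log2_mid_count=1, the generator yields two snapshots: the first four tokens, then all eight. Only a handful of snapshots are detokenized, so the extra work stays small relative to the full decode.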

0.2.0

1 year ago
  • Added to PyPI, so the entire setup process is now pip install min-dalle
  • Pre-converted PyTorch weights are downloaded from the Hugging Face Hub when needed, so there is no more converting from Flax
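Downloading weights only when needed amounts to a check-then-fetch cache. The sketch below assumes one local file path per checkpoint. ensure_weights is a hypothetical name, and the download callable is a placeholder because these notes do not give the actual Hub URLs or file names. One could pass, for example, a function wrapping urllib.request.urlretrieve with the appropriate URL.

```python
from __future__ import annotations

import os
from typing import Callable


def ensure_weights(path: str, download: Callable[[str], None]) -> str:
    """Return path, downloading the file first only if it is missing (sketch)."""
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        partial = path + ".part"
        # Placeholder: download writes the checkpoint to the given path,
        # e.g. by fetching it from the Hugging Face Hub.
        download(partial)
        # Rename only after the download finishes, so an interrupted run
        # never leaves a truncated file at the final path.
        os.replace(partial, path)
    return path
```

Calling ensure_weights a second time with the same path skips the download entirely. This is what makes "when needed" cheap after the first run.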

Breaking Changes

  • MinDalleTorch is now MinDalle
  • MinDalleFlax and flax-to-torch conversion code have been moved to a different repository

0.1.1

1 year ago

Important Bug Fixes

  • Image tokens were mistakenly computed twice in the command line script when using PyTorch
  • The tokenizer previously did not work correctly on some machines (e.g. Windows). Files are now read with UTF-8 encoding.
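The tokenizer fix comes down to passing an explicit encoding when opening files. Without it, Python's open() falls back to the locale's preferred encoding, which on Windows is often a legacy code page rather than UTF-8. The sketch below assumes the vocabulary is stored as a JSON object mapping tokens to IDs; that file format is an assumption, and load_vocab is a hypothetical name.

```python
from __future__ import annotations

import json


def load_vocab(path: str) -> dict[str, int]:
    """Read a token-to-ID vocabulary file as UTF-8 (sketch)."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which can mis-decode non-ASCII tokens on some systems.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```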

New Features

  • The is_expendable argument reduces memory usage for the command line script by loading the encoder, decoder, and detokenizer only when needed and unloading them after use
  • A simpler 4D attention_state replaces the 5D keys_values_state, and inference is faster
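The load-then-unload behavior of is_expendable can be sketched as a small wrapper. It loads a sub-model on first use and, if expendable, drops it afterwards so its memory can be reclaimed. LazyModule and use are hypothetical names. In min(DALL·E), the wrapped objects would be the encoder, decoder, and detokenizer.

```python
from __future__ import annotations

from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class LazyModule(Generic[T]):
    """Load a sub-model on demand; optionally release it after each use (sketch)."""

    def __init__(self, load: Callable[[], T], is_expendable: bool) -> None:
        self._load = load
        self._is_expendable = is_expendable
        self._module: T | None = None

    @contextmanager
    def use(self) -> Iterator[T]:
        if self._module is None:
            self._module = self._load()  # load only when actually needed
        try:
            yield self._module
        finally:
            if self._is_expendable:
                # Drop the reference so the memory can be reclaimed; with
                # PyTorch on a GPU, torch.cuda.empty_cache() is often called too.
                self._module = None
```

The trade-off matches the note above. An expendable module pays the loading cost on every use in exchange for lower peak memory, which suits a one-shot command line run.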

0.1

1 year ago

MinDalleTorch and MinDalleFlax classes, which initialize the model once so it can be run multiple times