Min Dalle Versions

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch

1 year ago

Fixed a critical CUDA runtime error that occurred when a generated token fell outside the VQGAN's vocabulary
Added generate_images_stream and generate_images to generate individual images. Both are in active use in the Discord bot.
Faster inference, can generate a 9x9 grid in 38 seconds on an A100
Added temperature, top_k, and supercondition_factor parameters
Added a simple Tkinter UI (thanks to @20kdc)
Added an option to tile images in token space instead of pixel space. This creates a seamless effect where the borders between images are blended.
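
The sampling parameters above can be illustrated with a small standalone sketch. This is not the library's code: sample_token and vocab_size are hypothetical names, and the blending formula is an assumption based on a common way to write super conditioning (extrapolating from unconditioned logits toward text-conditioned ones). Truncating the logits to the vocabulary size is shown only as one plausible way to keep out-of-range tokens from being sampled.

```python
import math
import random


def sample_token(cond_logits, uncond_logits, temperature=1.0, top_k=256,
                 supercondition_factor=16.0, vocab_size=None, rng=random):
    """Pick one token id from a pair of logit lists (illustrative only)."""
    a = supercondition_factor
    # Super conditioning (assumed form): move from the unconditioned logits
    # toward, and past, the text-conditioned logits.
    logits = [u * (1 - a) + c * a for c, u in zip(cond_logits, uncond_logits)]
    # Drop ids beyond the image vocabulary so they can never be sampled.
    if vocab_size is not None:
        logits = logits[:vocab_size]
    # Top-k: only logits at or above the k-th largest value stay eligible.
    k = min(top_k, len(logits))
    threshold = sorted(logits, reverse=True)[k - 1]
    top = max(logits)
    # Temperature-scaled softmax weights; subtracting the max keeps exp() stable.
    weights = [
        math.exp((x - top) / temperature) if x >= threshold else 0.0
        for x in logits
    ]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

A lower temperature or a smaller top_k makes the choice more deterministic. With top_k=1 the function always returns the index of the largest blended logit.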

1 year ago

added is_reusable parameter. Turning it off saves memory (e.g. for the command line script), while keeping it on makes repeated calls to generate_image faster
added log2_k parameter to control top-k image token sampling
added log2_supercondition_factor parameter to control the super conditioning amount
added log2_mid_count and generate_image_stream to stream intermediate outputs. Incomplete tokens are detokenized to an image multiple times during the decoding process. This adds very little time to the overall run time
added dtype parameter to autocast operations to float32, float16, or bfloat16
a grid size of 8x8 now generates in 35 seconds on an A100
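
As a rough illustration of the log2_* parameters and streaming described above, each log2 value can be read as an exponent (for example, top_k = 2 ** log2_k), and intermediate images can be produced by detokenizing the partial token list at regular intervals. The sketch below is an assumption about the general pattern, not the library's implementation. generate_stream, sample_next, and detokenize are hypothetical names, and the default of 256 tokens per image is assumed.

```python
def generate_stream(sample_next, detokenize, total_tokens=256, log2_mid_count=3):
    """Yield intermediate results while tokens are still being decoded.

    sample_next(tokens) returns the next token given the tokens so far.
    detokenize(tokens) turns a (possibly incomplete) token list into an image.
    Both are stand-ins supplied by the caller.
    """
    mid_count = 2 ** log2_mid_count  # number of outputs to stream
    interval = max(1, total_tokens // mid_count)
    tokens = []
    for i in range(total_tokens):
        tokens.append(sample_next(tokens))
        # Detokenize at every interval, and always once at the very end.
        if (i + 1) % interval == 0 or i + 1 == total_tokens:
            yield detokenize(list(tokens))
```

Because detokenizing runs only mid_count times rather than once per token, the extra cost stays small relative to the decoding loop itself.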

1 year ago

Added to PyPI so now the entire setup process is pip install min-dalle
Pre-converted PyTorch weights are downloaded when needed from the Hugging Face Hub, so there is no more converting from Flax

Breaking Changes

MinDalleTorch is now MinDalle
MinDalleFlax and flax-to-torch conversion code have been moved to a different repository

1 year ago

Important Bug Fixes

Image tokens were mistakenly being computed twice in the command line script when using torch
Tokenizer was not working correctly on some machines (e.g. Windows). Files are now read with UTF-8 encoding.
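
The encoding fix above reflects a general Python pitfall: in text mode, open() falls back to a locale-dependent encoding unless one is given, so the same file can decode differently on different machines. A minimal sketch of reading a tokenizer file with an explicit encoding follows. load_json_utf8 and roundtrip_demo are hypothetical helpers, and the file name and contents are made up for illustration.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Read a JSON file as UTF-8 regardless of the platform's default encoding."""
    with open(path, "r", encoding="utf8") as f:
        return json.load(f)


def roundtrip_demo():
    """Write a vocabulary containing non-ASCII keys and read it back."""
    vocab = {"café": 0, "Ġdalle": 1}
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "vocab.json")
        with open(path, "w", encoding="utf8") as f:
            json.dump(vocab, f, ensure_ascii=False)
        return load_json_utf8(path) == vocab
```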

New Features

is_expendable argument reduces memory usage for the command line script by loading and then unloading the encoder, decoder, and detokenizer as each is needed
simpler 4D attention_state replacing 5D keys_values_state and faster inference time
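
The load-then-unload idea behind is_expendable can be sketched generically. The class below is a hypothetical illustration, not the library's code. StagedPipeline and loaders are made-up names, and each loader stands in for whatever builds a real encoder, decoder, or detokenizer.

```python
class StagedPipeline:
    """Run stages in order, optionally keeping only one in memory at a time."""

    def __init__(self, loaders, is_expendable=False):
        # loaders maps a stage name to a zero-argument function that builds it.
        self.loaders = loaders
        self.is_expendable = is_expendable
        self.stages = {}
        if not is_expendable:
            # Load everything up front so repeated runs are fast.
            for name, load in loaders.items():
                self.stages[name] = load()

    def run(self, value):
        for name, load in self.loaders.items():
            if self.is_expendable:
                stage = load()        # load just before use
                value = stage(value)
                del stage             # drop the reference so memory can be reclaimed
            else:
                value = self.stages[name](value)
        return value
```

The trade-off is that the expendable mode pays the loading cost on every run, which suits a one-shot script, while the default mode suits a long-lived process that generates many images.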

1 year ago

Added MinDalleTorch and MinDalleFlax classes that initialize the model once and can then be run multiple times