Anything2Image Save Abandoned

Generate image from anything with ImageBind and Stable Diffusion

Project README

Anything To Image

PyPI

Generate image from anything with ImageBind's unified latent space and stable-diffusion-2-1-unclip.

We need at least 22 Gb GPU memory for the demo. Therefore gradio and colab online demo might need pro account to obtain more GPU/memory to run them.

Support Tasks

Update

[2023/5/19]:

  • Anything2Image has been integrated into InternGPT.
  • [v1.1.4]: Support fusing audio and text in ImageBind latent space and UI improvements.

[2023/5/18]

  • [v1.1.3]: Support thermal to image.
  • [v1.1.0]: Gradio GUI - add options for controling image size, and noise scheduler.
  • [v1.0.8]: Gradio GUI - add options for controling noise level, audio-image embedding arithmetic strength, and number of inference steps.

https://github.com/Zeqiang-Lai/Anything2Image/assets/26198430/eac4a947-c6b1-4553-91c3-4aec5625908b

Getting Started

Requirements

Ensure you have PyTorch installed.

  • Python >= 3.8
  • PyTorch >= 1.13

Then install the anything2image.

# from pypi
pip install anything2image
# or locally install via git clone
git clone [email protected]:Zeqiang-Lai/Anything2Image.git
cd Anything2Image
pip install .

Usage

# lanuch gradio demo
python -m anything2image.app
# command line demo, see also the tasks examples below.
python -m anything2image.cli --audio assets/wav/cat.wav

Audio to Image

bird_audio.wav dog_audio.wav cattle.wav cat.wav
fire_engine.wav train.wav motorcycle.wav plane.wav
python -m anything2image.cli --audio assets/wav/cat.wav

See also audio2img.py.

import anything2image.imagebind as ib
import torch
from diffusers import StableUnCLIPImg2ImgPipeline

# construct models
device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to(device)
model = ib.imagebind_huge(pretrained=True).eval().to(device)

# generate image
with torch.no_grad():
    audio_paths=["assets/wav/bird_audio.wav"]
    embeddings = model.forward({
        ib.ModalityType.AUDIO: ib.load_and_transform_audio_data(audio_paths, device),
    })
    embeddings = embeddings[ib.ModalityType.AUDIO]
    images = pipe(image_embeds=embeddings.half()).images
    images[0].save("audio2img.png")

Audio+Text to Image

cat.wav cat.wav bird_audio.wav bird_audio.wav
A painting A photo A painting A photo
python -m anything2image.cli --audio assets/wav/cat.wav --prompt "a painting"

See also audiotext2img.py.

with torch.no_grad():
    audio_paths=["assets/wav/bird_audio.wav"]
    embeddings = model.forward({
        ib.ModalityType.AUDIO: ib.load_and_transform_audio_data(audio_paths, device),
    })
    embeddings = embeddings[ib.ModalityType.AUDIO]
    images = pipe(prompt='a painting', image_embeds=embeddings.half()).images
    images[0].save("audiotext2img.png")

Audio+Image to Image

Audio & Image Output Audio & Image Output
wave.wav wave.wav
python -m anything2image.cli --audio assets/wav/wave.wav --image "assets/image/bird.png"
with torch.no_grad():
    embeddings = model.forward({
        ib.ModalityType.VISION: ib.load_and_transform_vision_data(["assets/image/bird.png"], device),
    })
    img_embeddings = embeddings[ib.ModalityType.VISION]
    embeddings = model.forward({
        ib.ModalityType.AUDIO: ib.load_and_transform_audio_data(["assets/wav/wave.wav"], device),
    }, normalize=False)
    audio_embeddings = embeddings[ib.ModalityType.AUDIO]
    embeddings = (img_embeddings + audio_embeddings)/2
    images = pipe(image_embeds=embeddings.half()).images
    images[0].save("audioimg2img.png")

Image to Image

Top: Input Images. Bottom: Generated Images.

python -m anything2image.cli --image "assets/image/bird.png"

See also img2img.py.

with torch.no_grad():
    paths=["assets/image/room.png"]
    embeddings = model.forward({
        ib.ModalityType.VISION: ib.load_and_transform_vision_data(paths, device),
    }, normalize=False)
    embeddings = embeddings[ib.ModalityType.VISION]
    images = pipe(image_embeds=embeddings.half()).images
    images[0].save("img2img.png")

Text to Image

A photo of a car. A sunset over the ocean. A bird's-eye view of a cityscape. A close-up of a flower.

It is not necessary to use ImageBind for text to image. Nervertheless, we show the alignment of ImageBind's text latent space and its image spaces.

python -m anything2image.cli --text "A sunset over the ocean."

See also text2img.py.

with torch.no_grad():
    embeddings = model.forward({
        ib.ModalityType.TEXT: ib.load_and_transform_text(['A photo of a car.'], device),
    }, normalize=False)
    embeddings = embeddings[ib.ModalityType.TEXT]
    images = pipe(image_embeds=embeddings.half()).images
    images[0].save("text2img.png")

Thermal to Image

Input Output Input Output

Top: Input Images. Bottom: Generated Images.

python -m anything2image.cli --thermal "assets/thermal/030419.jpg"

See also thermal2img.py.

with torch.no_grad():
    thermal_paths =['assets/thermal/030419.jpg']
    embeddings = model.forward({
        ib.ModalityType.THERMAL: ib.load_and_transform_thermal_data(thermal_paths, device),
    }, normalize=True)
    embeddings = embeddings[ib.ModalityType.THERMAL]
    images = pipe(image_embeds=embeddings.half()).images
    images[0].save("thermal2img.png")

Citation

Latent Diffusion

@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}

ImageBind

@inproceedings{girdhar2023imagebind,
  title={ImageBind: One Embedding Space To Bind Them All},
  author={Girdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang
and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
  booktitle={CVPR},
  year={2023}
}
Open Source Agenda is not affiliated with "Anything2Image" Project. README Source: Zeqiang-Lai/Anything2Image
Stars
125
Open Issues
2
Last Commit
10 months ago

Open Source Agenda Badge

Open Source Agenda Rating