Embed arbitrary modalities (images, audio, documents, etc.) into large language models.