LAVIS - A One-stop Library for Language-Vision Intelligence
CMU MultimodalSDK is a machine learning platform for development of adva...
500,000 multimodal short videos and baseline models...
Compose multimodal datasets 🎹
This repository provides a comprehensive collection of research papers f...
Real-world photo sequence question answering system (MemexQA). CVPR'18 a...