Awesome Grounding: a curated list of research papers in visual grounding
The Cradle framework is a first attempt at General Computer Control (GCC...
CLIPort: What and Where Pathways for Robotic Manipulation
Grounded Multimodal Large Language Model with Localized Visual Tokenization
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-H...
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
We perform functional grounding of LLMs' knowledge in BabyAI-Text
Official implementation of the ICCV19 oral paper Zero-Shot Grounding of Obje...
[CVPR20] Video Object Grounding using Semantic Roles in Language Descrip...
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://...
Code for CVPR'18 "Grounding Referring Expressions in Images by Variation...