A collection of papers and datasets on 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering, and 3D Dense Captioning).
A curated list of research papers in 3D visual grounding. (Contact: jhj20 at mails.tsinghua.edu.cn)
[2022/04/15]: Create this repository.
[2022/05/25]: Expand the scope to 3D Vision and Language, e.g., 3D Visual Grounding, 3D Dense Captioning, and 3D Question Answering.
Yang, Zhengyuan, et al. SAT: 2D Semantics Assisted Training for 3D Visual Grounding. ICCV 2021, Oral. [Paper] [Code]
Personal Notes:
Yuan, Zhihao, et al. InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring. ICCV 2021. [Paper] [Code]
Zhao, Lichen, et al. 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds. ICCV 2021. [Paper] [Code]
Personal Notes:
Huang, Shijia, et al. Multi-View Transformer for 3D Visual Grounding. CVPR 2022. [Paper] [Code]
Personal Notes:
Luo, Junyu, et al. 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection. CVPR 2022, Oral. [Paper] [Code]
Personal Notes:
Cai, Daigang, et al. 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022.
ReferIt3D(Nr3D, Sr3D/Sr3D+): Achlioptas, Panos, et al. ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes. ECCV 2020, Oral. [Paper] [Code] [Website] [Leaderboard]
Dataset Statistics:
ScanRefer: Chen, Dave Zhenyu, et al. ScanRefer: 3D Object Localization in RGB-D Scans Using Natural Language. ECCV 2020. [Paper] [Code] [Website] [Leaderboard]
Dataset Statistics:
ScanQA: Azuma, Daichi, et al. ScanQA: 3D Question Answering for Spatial Scene Understanding. CVPR 2022. [Paper] [Data Preparation]
SQA3D: Ma, Xiaojian and Yong, Silong, et al. SQA3D: Situated Question Answering in 3D Scenes. ICLR 2023. [Paper] [Data & Code]
Pending...