A collection of papers and datasets on 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering, and 3D Dense Captioning).
A curated list of research papers in 3D visual grounding. (Contact: jhj20 at mails.tsinghua.edu.cn)
[2022/04/15]: Create this repository.
[2022/05/25]: Expand the scope to 3D Vision and Language, e.g., 3D Visual Grounding, 3D Dense Captioning, and 3D Question Answering.
Yang, Zhengyuan, et al. SAT: 2D Semantics Assisted Training for 3D Visual Grounding. ICCV 2021, Oral. [Paper] [Code]
Personal Notes:
Yuan, Zhihao, et al. InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring. ICCV 2021. [Paper] [Code]
Zhao, Lichen, et al. 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds. ICCV 2021. [Paper] [Code]
Personal Notes:
Huang, Shijia, et al. Multi-View Transformer for 3D Visual Grounding. CVPR 2022. [Paper] [Code]
Personal Notes:
Luo, Junyu, et al. 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection. CVPR 2022, Oral. [Paper] [Code]
Personal Notes:
Cai, Daigang, et al. 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022.
ReferIt3D(Nr3D, Sr3D/Sr3D+): Achlioptas, Panos, et al. ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes. ECCV 2020, Oral. [Paper] [Code] [Website] [Leaderboard]
Dataset Statistics:
ScanRefer: Chen, Dave Zhenyu, et al. ScanRefer: 3D Object Localization in RGB-D Scans Using Natural Language. ECCV 2020. [Paper] [Code] [Website] [Leaderboard]
Dataset Statistics:
ScanQA: Azuma, Daichi, et al. ScanQA: 3D Question Answering for Spatial Scene Understanding. CVPR 2022. [Paper] [Data Preparation]
SQA3D: Ma, Xiaojian and Yong, Silong, et al. SQA3D: Situated Question Answering in 3D Scenes. ICLR 2023. [Paper] [Data & Code]
Pending...