Recent LLM (Large Language Models)-based CV and multi-modal works. Welcome to comment/contribute!
(arXiv 2023.3) Can Large Language Models Design a Robot?, [Paper]
(arXiv 2023.3) Learning video embedding space with Natural Language Supervision, [Paper]
(arXiv 2023.3) Audio Visual Language Maps for Robot Navigation, [Paper], [Project]
(arXiv 2023.3) ViperGPT: Visual Inference via Python Execution for Reasoning, [Paper]
(arXiv 2023.3) ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions, [Paper], [Code]
(arXiv 2023.3) Can an Embodied Agent Find Your “Cat-shaped Mug”? LLM-Based Zero-Shot Object Navigation, [Paper], [Project]
(arXiv 2023.3) Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, [Paper], [Code]
(arXiv 2023.3) PaLM-E: An Embodied Multimodal Language Model, [Paper], [Project]
(arXiv 2023.3) Language Is Not All You Need: Aligning Perception with Language Models, [Paper], [Code]