
LLM-in-Vision

Recent LLM (Large Language Models)-based CV and multi-modal works. Welcome to comment/contribute!

2023.3

  • (arXiv 2023.3) Can Large Language Models Design a Robot? [Paper]

  • (arXiv 2023.3) Learning video embedding space with Natural Language Supervision, [Paper]

  • (arXiv 2023.3) Audio Visual Language Maps for Robot Navigation, [Paper], [Project]

  • (arXiv 2023.3) ViperGPT: Visual Inference via Python Execution for Reasoning, [Paper]

  • (arXiv 2023.3) ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions, [Paper], [Code]

  • (arXiv 2023.3) Can an Embodied Agent Find Your “Cat-shaped Mug”? LLM-Based Zero-Shot Object Navigation, [Paper], [Project]

  • (arXiv 2023.3) Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, [Paper], [Code]

  • (arXiv 2023.3) PaLM-E: An Embodied Multimodal Language Model, [Paper], [Project]

  • (arXiv 2023.3) Language Is Not All You Need: Aligning Perception with Language Models, [Paper], [Code]

2022.7

  • (arXiv 2022.7) Language Models are General-Purpose Interfaces, [Paper], [Code]

  • (arXiv 2022.7) LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, [Paper], [Project]

README Source: DirtyHarryLYL/LLM-in-Vision