AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal...
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Per...
Vision utilities for web interaction agents 👀
Lightweight GPT-4 Vision processing over the webcam
Draw your projects to life
Convert different model APIs into the OpenAI API format out of the box.
Implementation of MambaByte in "MambaByte: Token-free Selective State Sp...
Prompts for GPT-4V & DALL-E3 to fully utilize their multi-modal abilities. GP...
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and ...
Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal ...