LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to ...
AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs)...
This repo contains evaluation code for the paper "MMMU: A Massive Multi-...
A Framework of Small-scale Large Multimodal Models
Embed arbitrary modalities (images, audio, documents, etc.) into large la...
This repo contains evaluation code for the paper "Are We on the Right Wa...
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal...
Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal ...