Evaluation of Multimodal Large Language Models (MLLMs)
This repository presents a detailed survey of benchmarks for Multimodal Large Language Models (MLLMs), focusing on their performance across applications such as visual question answering, visual perception, understanding, and reasoning. The survey reviews over 200 benchmarks and evaluations, organized into the key areas below.
Key Features:
- Comprehensive Evaluation: In-depth analysis of MLLMs from multiple perspectives, including perception, cognition, and reasoning.
- Diverse Applications: Covers domain-specific applications such as healthcare and autonomous driving.
- Future Directions: Discusses limitations of current evaluation methods and explores promising future research directions.
Benefits:
- Research Collaboration: Encourages collaboration on academic research and paper writing.
- Active Maintenance: The repository is regularly updated with new research findings.
Highlights:
- Focus on key capabilities such as conversational ability, hallucination, and trustworthiness.
- Exploration of modalities beyond images, including video, audio, and 3D point clouds.
For more information, visit the GitHub repository.