MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

🔔 News

🔥[2024-10-14]: Paper released on arXiv. Data and evaluation code will be released soon.

Introduction

We present MEGA-Bench, an evaluation suite that scales multimodal evaluation to over 500 real-world tasks, to address the highly heterogeneous daily use cases of end users. Our objective is to optimize for a set of high-quality data samples that cover a highly diverse and rich set of multimodal tasks, while enabling cost-effective and accurate model evaluation.

Key features of MEGA-Bench:

505 realistic tasks encompassing over 8,000 samples from 16 expert annotators
Wide range of output formats including numbers, phrases, code, LaTeX, coordinates, JSON, free-form, etc.
Over 40 metrics developed to evaluate these diverse tasks
Fine-grained capability report across multiple dimensions (e.g., application, input type, output format, skill)
Interactive visualization of model capabilities

Unlike existing benchmarks that unify problems into standard multi-choice questions, MEGA-Bench embraces the diversity of real-world tasks and their output formats. This allows for a more comprehensive evaluation of vision-language models across various dimensions.

Dataset

The MEGA-Bench dataset is now available on the Hugging Face. You can access it at:

MEGA-Bench

Evaluation

[Evaluation code and instructions will be added once released]

Contact

For any questions or concerns, please contact:

Jiacheng Chen: [email protected]
Wenhu Chen: [email protected]

Citation

If you find this work useful for your research, please consider citing our paper:

@article{chen2024mega-bench,
  title={MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks},
  author={Chen, Jiacheng and Liang, Tianhao and Siu, Sherman and Wang, Zhengqing and Wang, Kai and Wang, Yubo and Ni, Yuansheng and Zhu, Wang and Jiang, Ziyan and Lyu, Bohan and Jiang, Dongfu and He, Xuan and Liu, Yuan and Hu, Hexiang and Yue, Xiang and Chen, Wenhu},
  journal={arXiv preprint arXiv:2410.10563},
  year={2024},
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
resources		resources
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

🔔 News

Introduction

Dataset

Evaluation

Contact

Citation

About

Releases

Packages

Contributors 4

License

TIGER-AI-Lab/MEGA-Bench

Folders and files

Latest commit

History

Repository files navigation

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

🔔 News

Introduction

Dataset

Evaluation

Contact

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Packages