---
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
thumbnail: https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/lmms-eval-header.png
authors:
- user: luodian
  guest: true
- user: PY007
  guest: true
- user: kcz358
  guest: true
- user: pufanyi
  guest: true
- user: JvThunder
  guest: true
- user: dododododo
  guest: true
- user: THUdyh
  guest: true
- user: liuhaotian
  guest: true
- user: ZhangYuanhan
  guest: true
- user: zhangysk
  guest: true
- user: Chunyuan24
  guest: true
- user: liuziwei7
  guest: true
---
# Unified multimodal large model evaluation, accelerating multimodal intelligence emergence
GitHub repo: https://github.com/EvolvingLMMs-Lab/lmms-eval

Official website: https://lmms-lab.github.io/

As artificial intelligence research continues to deepen, multimodal large models such as GPT-4V and LLaVA have become hot topics in both academia and industry. However, these advanced models need an effective evaluation framework to measure their performance accurately, and building one is not easy. On the one hand, the diverse prompts and post-processing methods adopted by different models can lead to significant differences in evaluation results: as Hugging Face's blog post on the "1001 flavors of MMLU" illustrates, different implementations of the same evaluation dataset can produce markedly different scores, even changing a model's ranking on leaderboards.

On the other hand, data acquisition and processing during evaluation pose their own challenge, especially for older datasets that are not widely available; researchers often need to spend considerable time and effort manually searching for, downloading, and processing them.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, an evaluation framework designed specifically for multimodal large models. Built on lm-evaluation-harness, the framework has been extended to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope this framework can help shorten the iteration cycle of multimodal models and promote their broader application in academia and industry, and we look forward to witnessing more breakthroughs and innovations in multimodal AI.

<img src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/teaser.png" alt="Pipeline"/>
## Overview of the main features

**One-click evaluation**: lmms-eval lets users evaluate their models on multiple datasets with a single command, with no manual dataset preparation required. One line of code yields comprehensive evaluation results within minutes, including detailed logs and per-sample analysis covering model parameters, inputs and outputs, correct answers, and more. This is also useful in scenarios where an advanced model such as GPT-4 is used for scoring.
```
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme,mmbench_en \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs
```
**Parallel acceleration and task merging**: Built on Hugging Face's Accelerate library, lmms-eval supports multi-GPU evaluation, model parallelism, and multi-batch processing, significantly improving evaluation efficiency. This is especially useful when testing multiple datasets at once, as it greatly reduces the total evaluation time.
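For example, a sketch of such a combined run (reusing only flags from the command above; the task list mirrors the datasets benchmarked below) sweeps five benchmarks across 4 GPUs in a single launch:

```
# one process per GPU; comma-separated tasks are merged into one evaluation run
accelerate launch --num_processes=4 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme,gqa,scienceqa_img,ai2d,coco2017_cap_val \
    --batch_size 1 \
    --output_path ./logs
```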
Here are the total runtimes on different datasets using 4 x A100 40G GPUs:

| Dataset (# samples) | LLaVA-v1.5-7b | LLaVA-v1.5-13b |
| :--- | :--- | :--- |
| mme (2374) | 2 mins 43 seconds | 3 mins 27 seconds |
| gqa (12578) | 10 mins 43 seconds | 14 mins 23 seconds |
| scienceqa_img (2017) | 1 min 58 seconds | 2 mins 52 seconds |
| ai2d (3088) | 3 mins 17 seconds | 4 mins 12 seconds |
| coco2017_cap_val (5000) | 14 mins 13 seconds | 19 mins 58 seconds |
Additionally, the 0.1.1.dev update adds support for tensor parallelism, making it possible to run larger models such as LLaVA-v1.6-34B on 4 x 3090 GPUs for efficient inference.
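A minimal sketch of such a launch, assuming the model loader accepts a Hugging Face-style `device_map` entry in `--model_args` to shard weights across GPUs; both that argument and the checkpoint id are assumptions, so check the repository README for the exact supported form:

```
# single process; model weights are sharded across all visible GPUs
# device_map="auto" and the checkpoint id are assumptions, not confirmed options
accelerate launch --num_processes=1 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.6-34b",device_map="auto" \
    --tasks mme \
    --batch_size 1 \
    --output_path ./logs
```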
**Comprehensive dataset support**: The lmms-eval team hosts over 40 diverse datasets (with the number continually increasing) under the lmms-lab organization on the Hugging Face Hub, covering tasks ranging from COCO Captions to MMMU. All datasets have been converted into a unified format for archiving and are available directly from the team's official lmms-lab Hub page, where users can view the details of each evaluation dataset and download and use it with one click.
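As an illustration, a hosted dataset can also be pulled locally with the Hugging Face CLI. The dataset id `lmms-lab/MME` is an assumption inferred from the task names above, so browse the organization page for the exact ids:

```
# download one of the hosted evaluation datasets from the lmms-lab organization
# (the dataset id is illustrative; see https://huggingface.co/lmms-lab for the full list)
huggingface-cli download lmms-lab/MME --repo-type dataset --local-dir ./data/mme
```

In normal use this step is unnecessary: lmms-eval fetches the data for the selected tasks automatically.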
<img src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/org_dataset.png" alt="dataset on organization"/>

<img src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/viewer.png" alt="viewer" />
**Easy to Extend**: Through a unified interface definition, lmms-eval not only simplifies the integration of different models and datasets but also makes it straightforward to introduce new ones. Users can add a new dataset through a simple YAML file configuration and customize evaluation settings as needed by modifying that file, as in the sketch below.
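A hypothetical task configuration, loosely following the YAML conventions of lm-evaluation-harness on which lmms-eval is built; every field name and value here is illustrative rather than the framework's confirmed schema:

```
# my_task.yaml -- hypothetical task definition (all field names are illustrative)
task: "my_vqa_task"
dataset_path: lmms-lab/MyDataset   # placeholder id for a Hub dataset in the unified format
test_split: test
output_type: generate_until        # free-form generation, scored afterwards
doc_to_visual: !function utils.doc_to_visual   # assumed hook returning a record's image(s)
doc_to_text: !function utils.doc_to_text       # assumed hook building the text prompt
doc_to_target: "answer"            # field holding the ground-truth answer
metric_list:
  - metric: exact_match
```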
**Comparability**: We provide an environment in which authors can reproduce the scores reported in the original LLaVA-1.5 paper. We also offer complete experimental results for the LLaVA series models on all evaluation datasets, along with the environment parameters, for reference (see the README on GitHub).
**Synchronized Online Logging**: We provide detailed logging tools to help you understand the evaluation process and results. Logs include model parameters, generation parameters, input questions, model responses, and ground-truth answers. Every detail can be recorded and visualized in Weights & Biases runs, so users can access results in real time from anywhere, conveniently and efficiently.

<img src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/wandb_table.jpg" alt="wandb_table" />
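If the integration mirrors lm-evaluation-harness, a run can be pointed at a Weights & Biases project via a `--wandb_args` flag. This is an assumption about the CLI rather than a documented option, so verify it against the repository README:

```
# hypothetical: forward per-sample logs to a W&B project
# (--wandb_args is assumed from lm-evaluation-harness; confirm before relying on it)
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs \
    --wandb_args project=lmms-eval,name=llava_v1.5_mme
```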
## Conclusion
In summary, lmms-eval not only provides a new tool for multimodal model evaluation but also paves the way for future research and development, including video multimodal evaluation, few-shot evaluation modes, and batch inference acceleration. Its launch marks a new stage for multimodal evaluation, opening up new paths for AI research and applications.