[New blog post] Unified multimodal large model evaluation, accelerating multimodal intelligence emergence #1987
base: main
Conversation
Thank you very much for this blog post! I left a few minor suggestions and a pointer to include the details in `_blog.yml`.
lmms_eval.md
Outdated
@@ -0,0 +1,85 @@
---
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
thumbnail: https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/lmms-eval-header.png
I believe this should live in the blog repo directly to render on hf.co/blog. See here for an example: https://github.com/huggingface/blog/pull/2021/files#diff-a332b83464cf2b650715bacb6e3f07b994af0790acc88a4ea353883ba2ae751eR3853
Note you also need to add the blog details to _blog.yml
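For reference, an entry in `_blog.yml` generally looks something like the sketch below. The field values here are assumptions inferred from this PR (the `local` slug, thumbnail path, date, and tags are placeholders, not final):

```yaml
# Hypothetical _blog.yml entry for this post; adjust local/thumbnail/date/tags before release
- local: lmms_eval
  title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
  author: kcz358
  guest: true
  thumbnail: /blog/assets/lmms_eval/thumbnail.png
  date: April 20, 2024
  tags:
    - multimodal
    - evaluation
    - community
```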
Thank you! I have also noticed that in `_blog.yml` we can only have one author on the list?
lmms_eval.md
Outdated
**One-click evaluation**: lmms-eval allows users to easily evaluate their model's performance on multiple datasets with a single command, without manual dataset preparation. With just one line of code, users can obtain comprehensive evaluation results within minutes, including detailed logs and sample analysis covering model parameters, inputs and outputs, correct answers, etc. This also suits scenarios where advanced models such as GPT-4 are needed for scoring.

```
accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs
```
Suggested change:

```diff
-accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs
+# pip install git+https://github.com/huggingface/lmms-eval.git
+accelerate launch --multi_gpu --num_processes=8 -m lmms_eval \
+    --model llava \
+    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
+    --tasks mme,mmbench_en \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix llava_v1.5_mme_mmbenchen \
+    --output_path ./logs
```
I think I will change the link to our current repo, since the hf forked repo is somewhat behind, and I will also add `pip install git+https://github.com/haotian-liu/LLaVA.git`.
lmms_eval.md
Outdated
Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
Suggested change ("multimodal models (LMMs)" → "large multimodal models (LMMs)"):

```diff
-To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
+To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
```
Co-authored-by: lewtun <[email protected]>
Hi @lewtun, thank you for your feedback. I have uploaded the thumbnail picture and fixed several problems in the blog. Could you help us check if there are any more problems to fix in this article? Once we finalize the English version, we will also translate everything into Chinese and put it in as well. Thank you!
Very interesting!
also cc @merveenoyan for info.
_blog.yml
Outdated
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
author: kcz358
thumbnail: /blog/assets/lmms_eval/thumbnail.png
date: April 20, 2024
Reminder to update date before release :)
(Also I'd move the entry to the end of the file, just in case)
lmms_eval.md
Outdated
**Synchronized Online Logging**: We provide detailed logging tools to help you understand the evaluation process and results. Logs include model parameters, generation parameters, input questions, model responses, and ground truth answers. You can record every detail and visualize it in Weights & Biases runs. Users can access results in real-time from anywhere, making it convenient and efficient.

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/wandb_table.jpg" alt="wandb_table" />
I don't think these links will be embedded correctly as images (they are references to the github tree)
Hi, I tried to change the src to a link on a Hugging Face dataset repo, but I can't see the rendered image on GitHub. May I ask what the proper way is to put an image link in the blog?
I have uploaded all the images here but have been unable to find a way to let GitHub markdown render them.
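For what it's worth, a GitHub `blob` URL points at an HTML page rather than the image bytes, which is why it won't embed; a `raw.githubusercontent.com` URL usually renders. A sketch, assuming the lmms-eval-blog repo paths quoted above:

```html
<!-- blob URL: serves an HTML page, will not embed as an image -->
<!-- https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/wandb_table.jpg -->

<!-- raw URL: serves the image bytes, renders in markdown and in <img> tags -->
<img src="https://raw.githubusercontent.com/lmms-lab/lmms-eval-blog/master/assets/img/wandb_table.jpg" alt="wandb_table" />
```

For posts merged into the huggingface/blog repo, relative paths under `/blog/assets/` (as used for the thumbnail) should also work once the files are committed there.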
lmms_eval.md
Outdated
<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/org_dataset.png" alt="dataset on organization"/>

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/viewer.png" alt="viewer" />
Same comment about the image link.
thanks a lot for the blog post! I'll give this a spin 😊
mostly nits 😊
lmms_eval.md
Outdated
- user: liuziwei7
  guest: true
---

# Unified multimodal large model evaluation, accelerating multimodal intelligence emergence
we can make it uppercase for h1 IMO
lmms_eval.md
Outdated
Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced `lmms-eval`, which is an evaluation framework designed specifically for multimodal large models. Building upon EleutherAI's [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) and [🤗 Accelerate](https://github.com/huggingface/accelerate), this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
would be nice to directly give a link to lmms-eval instead of putting it in code formatting
lmms_eval.md
Outdated
<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/teaser.png" alt="Pipeline"/>

## Overview of the main features
again, maybe uppercase "main" and "features"
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Merve Noyan <[email protected]>
Hi @pcuenca @merveenoyan, thank you for your kind feedback. I have tried to fix most of the issues raised in the comments, as well as the image source issue. May I kindly ask for a review of this version? I will update the date before release.
Thanks for iterating @kcz358! Would you mind resolving the merge conflicts? Then we should be pretty good to go!
Hi @lewtun, I have merged the main branch and added the Chinese version of the blog. I have also updated the date.
Hi @lewtun, sorry for pinging you again. Do you think we are able to merge the current version?
_blog.yml
Outdated
- local: sc2-instruct
  title: "StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation"
  thumbnail: /blog/assets/sc2-instruct/sc2-instruct-banner.png
  author: yuxiang630
  guest: true
  date: Apr 29, 2024
  tags:
    - nlp
    - community
    - research
    - LLM

- local: evaluation-structured-outputs
  title: "Improving Prompt Consistency with Structured Generations"
  author: willkurt
  guest: true
  thumbnail: /blog/assets/evaluating-mmlu-leaderboard/thumbnail.png
  date: Apr 30, 2024
  tags:
    - evaluation
    - collaboration
    - research
    - leaderboard

- local: asr-diarization
  title: "Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints"
  author: sergeipetrov
  thumbnail: /blog/assets/asr-diarization/thumbnail.png
  date: May 1, 2024
  tags:
    - audio
    - asr
    - inference
hmmm, these entries shouldn't be here. Can you try to merge main again and ensure there are no duplicates?
Thank you for spotting the issue! I have merged main again and deleted the duplicates.
Co-authored-by: Pedro Cuenca <[email protected]>
@pcuenca I resolved the merge conflicts - ok if we merge this? (Feel free to do so if you agree)
Hi @lewtun, this is our blog for lmms-eval. Could you help us check the article and see whether there is anything that could be added, for example user experience or how to add a new model? Also, you might want to add your names to the author list.
Thank you!
This blog introduces a new evaluation pipeline for large vision-language models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry.