[New blog post] Unified multimodal large model evaluation, accelerating multimodal intelligence emergence #1987

Open: wants to merge 54 commits into base: main

kcz358 commented Apr 15, 2024

Hi @lewtun, this is our blog post for lmms-eval. Could you help us review the article and let us know whether anything should be added, for example about the user experience or how to add a new model? You may also want to add your names to the author list.

Thank you!


This blog post introduces a new evaluation pipeline for large vision-language models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively accelerate the iteration cycle of multimodal models and promote their broader adoption in academia and industry.

Member

lewtun left a comment

Thank you very much for this blog post! I left a few minor suggestions and a pointer to include the details in `_blog.yml`.

lmms_eval.md Outdated
@@ -0,0 +1,85 @@
---
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
thumbnail: https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/lmms-eval-header.png
Member

I believe the thumbnail should live directly in the blog repo so that it renders on hf.co/blog. See here for an example: https://github.com/huggingface/blog/pull/2021/files#diff-a332b83464cf2b650715bacb6e3f07b994af0790acc88a4ea353883ba2ae751eR3853

Note that you also need to add the blog details to `_blog.yml`.

Author

Thank you! I also noticed that each entry in `_blog.yml` seems to allow only one author. Is that right?

Member

Yes, that's just for the thumbnail, but the blog post itself will show all authors:

[Screenshot 2024-04-24 at 16.27.21]
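For reference, here is a minimal sketch of how the front matter of `lmms_eval.md` might list several authors. It assumes the hf.co/blog convention of an `authors:` list with a per-user `guest:` flag, matching the `- user: liuziwei7` / `guest: true` lines in the draft. The `authors:` key name, the ordering, and marking kcz358 as a guest are assumptions, and the remaining authors are omitted.

```yaml
---
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
# Assumed local asset path, matching the thumbnail used in the _blog.yml entry
thumbnail: /blog/assets/lmms_eval/thumbnail.png
# Every author listed here is shown on the rendered post
authors:
- user: kcz358        # assumption: external contributor, so marked as guest
  guest: true
- user: liuziwei7     # taken from the draft front matter
  guest: true
# ...remaining authors omitted
---
```

The single `author:` field in `_blog.yml` only drives the thumbnail card. The full list above is what readers see on the post.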

lmms_eval.md (outdated, resolved)
lmms_eval.md (resolved)
lmms_eval.md Outdated
**One-click evaluation**: lmms-eval lets users evaluate a model on multiple datasets with a single command, without preparing the datasets manually. Within minutes, users obtain comprehensive evaluation results, including detailed logs and sample-level analysis covering model parameters, inputs and outputs, correct answers, and more. This also covers benchmarks that rely on advanced models such as GPT-4 for scoring.

```
accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs
```
Member

Suggested change, replacing the command above with:

    # pip install git+https://github.com/huggingface/lmms-eval.git
    accelerate launch --multi_gpu --num_processes=8 -m lmms_eval \
        --model llava \
        --model_args pretrained="liuhaotian/llava-v1.5-7b" \
        --tasks mme,mmbench_en \
        --batch_size 1 \
        --log_samples \
        --log_samples_suffix llava_v1.5_mme_mmbenchen \
        --output_path ./logs

Author

I think I will change the link to our current repo, since the Hugging Face fork is somewhat behind. I will also add `pip install git+https://github.com/haotian-liu/LLaVA.git`.

lmms_eval.md (outdated, resolved)
lmms_eval.md (outdated, resolved)
lmms_eval.md Outdated

Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
Member

Suggested change: replace "evaluating multimodal models (LMMs)" with "evaluating large multimodal models (LMMs)".

lmms_eval.md (outdated, resolved)

Author

kcz358 commented Apr 20, 2024

Hi @lewtun, thank you for your feedback.

I have uploaded the thumbnail image and fixed several issues in the post. Could you check whether anything else in the article needs fixing?

Once the English version of the article is finalized, we will also translate it into Chinese and add it to `/blog/zh`.

Thank you!

kcz358 requested a review from lewtun on April 24, 2024, 04:19

Member

lewtun left a comment

Thanks for iterating, @kcz358! This all looks good to me; gently pinging @pcuenca for final approval.

Context: this is a blog post about an open-source library for evaluating multimodal models that the TRL team contributed to, and it is what we recommend in the TRL examples.

Member

pcuenca left a comment

Very interesting!

also cc @merveenoyan for info.

_blog.yml Outdated
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
author: kcz358
thumbnail: /blog/assets/lmms_eval/thumbnail.png
date: April 20, 2024
Member

Reminder to update date before release :)

Member

(Also I'd move the entry to the end of the file, just in case)
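The following is a sketch of what the complete `_blog.yml` entry might look like. It is modeled on the neighboring entries quoted later in this thread (`local`, `title`, `author`, `guest`, `thumbnail`, `date`, `tags`) and placed at the end of the file, following the suggestion above. The `local` value is assumed to match the file name `lmms_eval.md`, the tags are hypothetical, and the date is the draft's placeholder, which still needs updating before release.

```yaml
# Appended at the end of _blog.yml
- local: lmms_eval            # assumed: the post file name without ".md"
  title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
  author: kcz358              # a single author; only used for the thumbnail card
  guest: true                 # assumption: kcz358 is not a Hugging Face team member
  thumbnail: /blog/assets/lmms_eval/thumbnail.png
  date: April 20, 2024        # draft placeholder; update before release
  tags:                       # hypothetical tags, borrowed from other entries
  - evaluation
  - research
  - community
```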

lmms_eval.md: 6 resolved threads
lmms_eval.md Outdated

**Synchronized Online Logging**: We provide detailed logging tools to help you understand the evaluation process and results. Logs include model parameters, generation parameters, input questions, model responses, and ground truth answers. You can record every detail and visualize it in Weights & Biases runs. Users can access results in real-time from anywhere, making it convenient and efficient.

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/wandb_table.jpg" alt="wandb_table" />
Member

I don't think these links will be embedded correctly as images: they point to the GitHub file view (`/blob/` URLs) rather than to the raw image files.

Author

Hi, I tried changing the `src` to a link on the Hugging Face dataset repo, but I can't see the rendered image on GitHub. What is the proper way to link images in the blog?

I have uploaded all the images here, but I couldn't find a way to make GitHub Markdown render them.

lmms_eval.md Outdated

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/org_dataset.png" alt="dataset on organization"/>

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/viewer.png" alt="viewer" />
Member

Same comment about the image link.

lmms_eval.md (resolved)

Contributor

merveenoyan commented

thanks a lot for the blog post! I'll give this a spin 😊

Contributor

merveenoyan left a comment

mostly nits 😊

lmms_eval.md Outdated
- user: liuziwei7
guest: true
---
# Unified multimodal large model evaluation, accelerating multimodal intelligence emergence
Contributor

IMO we can capitalize the h1 (title case).

lmms_eval.md Outdated

Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced `lmms-eval`, which is an evaluation framework designed specifically for multimodal large models. Building upon EleutherAI's [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) and [🤗 Accelerate](https://github.com/huggingface/accelerate), this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
Contributor

It would be nice to link directly to lmms-eval instead of putting it in code formatting.

lmms_eval.md Outdated

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/teaser.png" alt="Pipeline"/>

## Overview of the main features
Contributor

Again, maybe capitalize "Main" and "Features".

lmms_eval.md (resolved)
lmms_eval.md (resolved)
kcz358 and others added 3 commits April 25, 2024 10:23
Co-authored-by: Pedro Cuenca
Co-authored-by: Merve Noyan
Co-authored-by: Pedro Cuenca
Author

kcz358 commented Apr 25, 2024

Hi @pcuenca @merveenoyan, thank you for your kind feedback.

I have addressed most of the issues raised in the comments, including the image source problem. Could you please review this version? I will update the date in `_blog.yml` before release.

kcz358 requested reviews from pcuenca and merveenoyan on April 29, 2024, 04:32

Member

lewtun commented May 1, 2024

Thanks for iterating, @kcz358! Would you mind resolving the merge conflicts? Then we should be pretty much good to go!

Author

kcz358 commented May 2, 2024

Hi @lewtun, I have merged the main branch and added the Chinese version of the blog post. I have also updated the date in `_blog.yml`.

Author

kcz358 commented May 16, 2024

Hi @lewtun, sorry for pinging you again. Do you think we can merge the current version?

_blog.yml Outdated
Comment on lines 3915 to 3948
- local: sc2-instruct
title: "StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation"
thumbnail: /blog/assets/sc2-instruct/sc2-instruct-banner.png
author: yuxiang630
guest: true
date: Apr 29, 2024
tags:
- nlp
- community
- research
- LLM

- local: evaluation-structured-outputs
title: "Improving Prompt Consistency with Structured Generations"
author: willkurt
guest: true
thumbnail: /blog/assets/evaluating-mmlu-leaderboard/thumbnail.png
date: Apr 30, 2024
tags:
- evaluation
- collaboration
- research
- leaderboard

- local: asr-diarization
title: "Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints"
author: sergeipetrov
thumbnail: /blog/assets/asr-diarization/thumbnail.png
date: May 1, 2024
tags:
- audio
- asr
- inference

Member

Hmm, these entries shouldn't be here. Can you try merging main again and make sure there are no duplicates?

Author

Thank you for spotting the issue! I have merged main again and deleted the duplicates.

Member

lewtun commented May 30, 2024

@pcuenca I resolved the merge conflicts. OK if we merge this? (Feel free to do so if you agree.)

6 participants