
Request: Evaluation Code for SmolLM-1 #34

Open
bg51717 opened this issue Dec 14, 2024 · 8 comments
Comments

@bg51717

bg51717 commented Dec 14, 2024

Hi,
I am very interested in reproducing the evaluation results for SmolLM-1 as mentioned in your blog. I noticed that while the model weights are available, I couldn't find the specific evaluation scripts and code used for benchmarking SmolLM-1.
Could you please share the evaluation scripts/code used for testing SmolLM-1?
Thank you for your time and consideration!

@HallerPatrick

The evaluation is here: https://github.com/huggingface/smollm/blob/main/evaluation/README.md

I think it is still missing the mmlu-cloze task

@bg51717
Author

bg51717 commented Dec 14, 2024

> The evaluation is here: https://github.com/huggingface/smollm/blob/main/evaluation/README.md
>
> I think it is still missing the mmlu-cloze task

@HallerPatrick Thanks very much for your reply.
I want to evaluate SmolLM-1, but I could only find the scripts for evaluating SmolLM-2. When I used that script to evaluate SmolLM-135M, I got results different from https://huggingface.co/blog/smollm#evaluation.
Do you have any idea about this? Looking forward to your reply.

@HallerPatrick

@bg51717 Sorry, I missed that you're looking for version 1.
In the blog post they reference the following scripts: https://github.com/huggingface/cosmopedia/tree/main/evaluation. Did they also yield different results?

@bg51717
Author

bg51717 commented Dec 14, 2024

> The evaluation is here: https://github.com/huggingface/smollm/blob/main/evaluation/README.md
>
> I think it is still missing the mmlu-cloze task

I don't know which subtasks MMLU is composed of.

@bg51717
Author

bg51717 commented Dec 16, 2024

The evaluation script I used is:

lighteval accelerate \
    --model_args "pretrained=HuggingFaceTB/SmolLM-135M,revision=main,dtype=bfloat16,vllm,gpu_memory_utilisation=0.8,max_model_length=2048,data_parallel_size=4" \
    --custom_tasks "tasks.py" \
    --tasks "custom|mmlu_pro|0|1" \
    --output_dir "../eval_results"

The model is https://huggingface.co/HuggingFaceTB/SmolLM-135M, at revision https://huggingface.co/HuggingFaceTB/SmolLM-135M/tree/1d461723eec654e65efdc40cf49301c89c0c92f4.

The tasks.py is from https://github.com/huggingface/smollm/blob/main/evaluation/tasks.py, at commit https://github.com/huggingface/smollm/tree/58c5cc63d2d993c441ae5e4ee5ed40cfc567332d.

The MMLU score I got is 11.15, while the blog https://huggingface.co/blog/smollm reports 30.23.

@loubnabnl
Collaborator

Hi, you're using MMLU-Pro, which is harder than MMLU. @anton-l let's also add the MMLU/SmolLM1 evals to this repo?

You can reproduce the results in the blog post using this code https://github.com/huggingface/cosmopedia/tree/main/evaluation with

accelerate launch --num_processes=1 --main_process_port=29600 "lighteval/run_evals_accelerate.py" --model_args="pretrained=$MODEL" \
      --custom_tasks "lighteval_tasks.py" --output_dir $OUTPUT_DIR --override_batch_size 16 \
      --tasks "custom|mmlu_cloze:abstract_algebra|0|1,custom|mmlu_cloze:anatomy|0|1,custom|mmlu_cloze:astronomy|0|1,custom|mmlu_cloze:business_ethics|0|1,custom|mmlu_cloze:clinical_knowledge|0|1,custom|mmlu_cloze:college_biology|0|1,custom|mmlu_cloze:college_chemistry|0|1,custom|mmlu_cloze:college_computer_science|0|1,custom|mmlu_cloze:college_mathematics|0|1,custom|mmlu_cloze:college_medicine|0|1,custom|mmlu_cloze:college_physics|0|1,custom|mmlu_cloze:computer_security|0|1,custom|mmlu_cloze:conceptual_physics|0|1,custom|mmlu_cloze:econometrics|0|1,custom|mmlu_cloze:electrical_engineering|0|1,custom|mmlu_cloze:elementary_mathematics|0|1,custom|mmlu_cloze:formal_logic|0|1,custom|mmlu_cloze:global_facts|0|1,custom|mmlu_cloze:high_school_biology|0|1,custom|mmlu_cloze:high_school_chemistry|0|1,custom|mmlu_cloze:high_school_computer_science|0|1,custom|mmlu_cloze:high_school_european_history|0|1,custom|mmlu_cloze:high_school_geography|0|1,custom|mmlu_cloze:high_school_government_and_politics|0|1,custom|mmlu_cloze:high_school_macroeconomics|0|1,custom|mmlu_cloze:high_school_mathematics|0|1,custom|mmlu_cloze:high_school_microeconomics|0|1,custom|mmlu_cloze:high_school_physics|0|1,custom|mmlu_cloze:high_school_psychology|0|1,custom|mmlu_cloze:high_school_statistics|0|1,custom|mmlu_cloze:high_school_us_history|0|1,custom|mmlu_cloze:high_school_world_history|0|1,custom|mmlu_cloze:human_aging|0|1,custom|mmlu_cloze:human_sexuality|0|1,custom|mmlu_cloze:international_law|0|1,custom|mmlu_cloze:jurisprudence|0|1,custom|mmlu_cloze:logical_fallacies|0|1,custom|mmlu_cloze:machine_learning|0|1,custom|mmlu_cloze:management|0|1,custom|mmlu_cloze:marketing|0|1,custom|mmlu_cloze:medical_genetics|0|1,custom|mmlu_cloze:miscellaneous|0|1,custom|mmlu_cloze:moral_disputes|0|1,custom|mmlu_cloze:moral_scenarios|0|1,custom|mmlu_cloze:nutrition|0|1,custom|mmlu_cloze:philosophy|0|1,custom|mmlu_cloze:prehistory|0|1,custom|mmlu_cloze:professional_accounting|0|1,custom|mmlu_cloze:professional_law|0|1,custom|mmlu_cloze:professional_medicine|0|1,custom|mmlu_cloze:professional_psychology|0|1,custom|mmlu_cloze:public_relations|0|1,custom|mmlu_cloze:security_studies|0|1,custom|mmlu_cloze:sociology|0|1,custom|mmlu_cloze:us_foreign_policy|0|1,custom|mmlu_cloze:virology|0|1,custom|mmlu_cloze:world_religions|0|1"
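Maintaining that long --tasks string by hand is error-prone; it can be generated from the subject list instead. A minimal sketch (the list below is only an excerpt of the 57 MMLU subjects, not the full set):

```python
# Build the lighteval --tasks string for the MMLU cloze subtasks from a
# subject list instead of editing one very long string by hand.
# Excerpt only; the real run enumerates all 57 MMLU subjects.
subjects = [
    "abstract_algebra",
    "anatomy",
    "astronomy",
    "world_religions",
]

tasks = ",".join(f"custom|mmlu_cloze:{s}|0|1" for s in subjects)
print(tasks)
```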

@anton-l
Member

anton-l commented Dec 23, 2024

The MMLU task is now also available in the new suite with --tasks "custom|mmlu|0|1". Let me know if you have any issues!
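For anyone landing here later: combining the new task name with the lighteval command shared earlier in this thread would presumably look like the sketch below (model args and output path are copied from the earlier mmlu_pro run and may need adjusting for your setup):

```shell
lighteval accelerate \
    --model_args "pretrained=HuggingFaceTB/SmolLM-135M,revision=main,dtype=bfloat16,vllm,gpu_memory_utilisation=0.8,max_model_length=2048,data_parallel_size=4" \
    --custom_tasks "tasks.py" \
    --tasks "custom|mmlu|0|1" \
    --output_dir "../eval_results"
```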

@bg51717
Author

bg51717 commented Dec 24, 2024

> The MMLU task is now also available in the new suite with --tasks "custom|mmlu|0|1". Let me know if you have any issues!

@anton-l @loubnabnl
Thanks for your reply. I can now get the same results.
But I still have one issue.
I found that the checkpoint you released at https://huggingface.co/HuggingFaceTB/SmolLM-135M is not a nanotron checkpoint, so I want to know how to convert an HF checkpoint to a nanotron checkpoint. I tried https://github.com/huggingface/nanotron/blob/main/examples/llama/convert_hf_to_nanotron.py and then continued pretraining. The initial loss was around 7, and after training it dropped to 2.5, so I don't think I converted the checkpoint successfully.

Looking forward to your reply.
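One way to narrow down a suspected conversion failure like the one above, before spending GPU time on continued pretraining, is to diff the parameter names and shapes of the two checkpoints. This is only a sketch with toy dictionaries (the parameter names and shapes are made up); with real checkpoints you would build each map as {name: tuple(tensor.shape)} from the model's state dict:

```python
# Checkpoint-conversion sanity check: report parameters that were lost,
# invented, or reshaped by the conversion. Any non-empty field would
# explain a high initial loss after resuming training.
def diff_state_dicts(src: dict, dst: dict) -> dict:
    common = set(src) & set(dst)
    return {
        "missing": sorted(set(src) - set(dst)),   # params lost in conversion
        "extra": sorted(set(dst) - set(src)),     # params not in the source
        "shape_mismatch": sorted(n for n in common if src[n] != dst[n]),
    }

# Toy stand-ins for the HF and converted nanotron state dicts.
hf = {"embed.weight": (49152, 576), "layers.0.attn.q_proj.weight": (576, 576)}
nt = {"embed.weight": (49152, 576), "layers.0.attn.q_proj.weight": (576, 192)}

report = diff_state_dicts(hf, nt)
print(report)
```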
