OLMO + RL #424

vwxyzjn · 2024-11-08T17:35:13Z

The code is in this PR. To reproduce my work, pip install ai2_olmo and then run:

for beta in 0.05
do
for lr in 3e-7
do
python mason.py \
    --cluster ai2/augusta-google-1 --pure_docker_mode \
    --workspace ai2/tulu-3-dev \
    --priority high \
    --preemptible \
    --num_nodes 1 \
    --image costah/open_instruct_ppo_ray_olmo \
    --budget ai2/allennlp \
    --gpus 8 --  pip install --upgrade transformers \&\& python open_instruct/ppo_vllm_thread_ray_gtrl_olmo.py \
    --exp_name "ppo_olmo_rm_init_one_epoch_beta_${beta}_lr_${lr}" \
    --beta $beta \
    --learning_rate $lr \
    --dataset_mixer "{\"ai2-adapt-dev/gsm8k_ground_truth\": 1.0}" \
    --dataset_train_splits train \
    --dataset_eval_mixer "{\"ai2-adapt-dev/gsm8k_math_ground_truth\": 1.0}" \
    --dataset_eval_splits test \
    --max_token_length 2048 \
    --max_prompt_token_length 2048 \
    --response_length 1024 \
    --model_name_or_path allenai/open_instruct_dev \
    --model_revision olmo_7b_soup_anneal_v3.9_4_DPO___model__42__1730863426 \
    --reward_model_path allenai/open_instruct_dev \
    --reward_model_revision reward_modeling__1__1730930663 \
    --non_stop_penalty \
    --stop_token eos \
    --temperature 1.0 \
    --ground_truths_key ground_truth \
    --chat_template tulu \
    --sft_messages_key messages \
    --total_episodes 200000 \
    --penalty_reward_value -10.0 \
    --deepspeed_stage 3 \
    --per_device_train_batch_size 4 \
    --local_rollout_forward_batch_size 8 \
    --local_mini_batch_size 32 \
    --local_rollout_batch_size 32 \
    --actor_num_gpus_per_node 7 \
    --vllm_tensor_parallel_size 1 \
    --num_epochs 1 \
    --apply_verifiable_reward true \
    --output_dir /output \
    --seed 3 \
    --num_evals 3 \
    --reward_model_multiplier 0.0 \
    --no_try_launch_beaker_eval_jobs \
    --gradient_checkpointing \
    --with_tracking
done
done
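For a rough sense of scale, the batch-related flags in the command above can be turned into step counts. The sketch below is back-of-the-envelope arithmetic, not code from this PR. It assumes that global sizes are the per-GPU ("local") sizes multiplied by the number of actor GPUs, and that one training step consumes one global rollout batch. The variable names are illustrative and the script's own accounting may differ.

```python
# Back-of-the-envelope batch accounting for the launch command above.
# ASSUMPTION: global size = local (per-GPU) size * number of actor GPUs,
# and one training step consumes one global rollout batch. The training
# script itself may count episodes differently.

actor_gpus = 7                    # --actor_num_gpus_per_node 7, on 1 node
local_rollout_batch_size = 32     # --local_rollout_batch_size 32
local_mini_batch_size = 32        # --local_mini_batch_size 32
per_device_train_batch_size = 4   # --per_device_train_batch_size 4
total_episodes = 200_000          # --total_episodes 200000

# Rollouts collected across all actor GPUs per training step.
rollout_batch_size = local_rollout_batch_size * actor_gpus

# Number of training steps needed to consume the episode budget.
num_training_steps = total_episodes // rollout_batch_size

# Micro-batches accumulated per optimizer step on each GPU.
grad_accum_steps = local_mini_batch_size // per_device_train_batch_size

print(f"rollouts per step:      {rollout_batch_size}")
print(f"training steps:         {num_training_steps}")
print(f"grad accumulation steps: {grad_accum_steps}")
```

Under these assumptions the run collects 224 rollouts per step for roughly 892 steps, with 8 gradient-accumulation micro-batches per GPU. With --gpus 8 and 7 actor GPUs, the remaining GPU is presumably the one used for vLLM generation (--vllm_tensor_parallel_size 1).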

* Reorganize the data preparation scripts for tulu v1 and v2. * Minor improvement * Remove open_platypus_commercial subset from Daring-Anteater * Use hard-coded examples repo. * Fix some bugs. * Add OpenMathInstruct. * Add a few more v3.5.x SFT mix ablations for the cleaner datasets. * More experiments on mixes. * help merge * prep for merge * reapply changes * fix naming --------- Co-authored-by: Nathan Lambert <[email protected]>

* first pass * fix spelling, ground truth stuff * fix misspelling * count verifieds and intermediate saving * save intermediate steps * small fix to logging * fix bug for forward rollout batching * support gsm8k and math, more flexibility in future * add costas plo thing * add numina math * remove plo, add value model rand init, first stab at rephrase model loading * math strict verify * ifeval code * ifeval debug * incorporate val fixes * data fixed, remove skips * Prototype ppo + ray (#390) * Prototype ppo + ray * reduce gradient * push * push changes * quick push * cache changes; this actually works with 6 nodes * push changes * psuh changes * push the latest change * push changes * Fix uploading * Make style * style and quality * update docs * update mason.py * log wandb tables * update docs * make style quality * make sure to save the right thing * push changes * push * push * push changes * push * push * fix * remove preemption code * fix * push changes * push * quick fix * quick push * add weka save override * add multinode ray file * lint and fix * first stab at flan * eval on intermediate checkpoints (#414) * quick change (#418) * works with the weka eval and nfs eval * fixes to ground truth * push ppo ray * add warm up * revert changes * merge * quick change --------- Co-authored-by: Hamish Ivison <[email protected]> Co-authored-by: Hamish Ivison <[email protected]> Co-authored-by: Costa Huang <[email protected]>

* Add .vscode to .gitignore * Add dependencies for synth pref pipeline * Initialize directory * Port majority of the scripts to open-instruct * Add documentation * Add public birr as a submodule * Minor fixes and update README * Add instructions for creating annotation mix * Run isort on source * Update README * Remove birr submodule * Change birr from submodule to git-checkout * Add examples * Update README with better examples

hamishivi and others added 30 commits October 1, 2024 09:31

first pass

b444e80

fix spelling, ground truth stuff

f0569a3

fix misspelling

8e0f517

count verifieds and intermediate saving

6eebf7f

save intermediate steps

f9a0b3c

small fix to logging

bad1933

fix bug for forward rollout batching

028315d

support gsm8k and math, more flexibility in future

faa7dc0

add costas plo thing

5970243

add numina math

f8fb8eb

remove plo, add value model rand init, first stab at rephrase model loading

eda4849

math strict verify

b709f37

Merge branch 'main' into verifiable-rewards

a328946

ifeval code

51f0b2a

ifeval debug

79ec960

incorporate val fixes

b1b47bf

data fixed, remove skips

d61038e

Merge branch 'main' into verifiable-rewards

b9de634

add weka save override

f6a2b75

add multinode ray file

b59659a

lint and fix

36a2ed4

first stab at flan

527c51f

eval on intermediate checkpoints (#414)

f037460

Merge branch 'main' into verifiable-rewards

ed18615

Fix dataset mixing logic (#415)

bdc3fa6

quick change

63a4449

Add ability to use alternate image for safety eval (#422)

2bc1772

* add ability to use alternate image * rever

Adding final nc configs for v3.9 (#416)

de33290

* final configs * update

Merge branch 'olmo_again' into rlolmo

3cfc9e2

natolambert and others added 29 commits November 10, 2024 20:42

update data dist plots (#410)

863b808

* update data dist plots * nit * smooth operation * updates * nits * clean git * cleaning for final SFT version

push changes

3422229

Ability to set oe-eval priority (#423)

1b44d61

* oe eval priority * up --------- Co-authored-by: Nathan Lambert <[email protected]>

Last fix for unseen evals. (#426)

8de53e6

* fix and add script * update * fix copilot typo

Use vllm for MMLU Pro (#428)

dd16008

* Use vllm for all evaluations * Do not use VLLM only for MMLU and TruthfulQA

Olmo1124ForCausalLM config. (#432)

b17443e

* Quick change * weight converter

Support weka evaluation oe eval (#435)

fe2817d

* Support weka eval * quick fix

Olmo1124converter (#434)

7fcbcfa

* Quick change * weight converter * Add olmo1124 converter

mmlu cot added (#429)

db4c0a1

Support for olmo1124 eval (#436)

27a9b9d

Mount oe-training weka bucket (#437)

d8bc8dc

Add files via upload (#438)

90b821c

push changes

7905e63

Merge branch 'main' into rlolmo

918b701

update mmlu and deepmind_math configs (#439)

c059659

Olmo1124 config (#440)

a9f964e

* add all the config * quick change

Add tulu3.md

30d11b4

remove change

4703ecc

Push preview (#443)

3dd7a07

* Push preview * prepare for release * quick push * refactor * fix dataset * update 70Bconfig * push

Fix broken image paths (#444)

2c60e7b

Fix oe-eval gpu count (#448)

bf57749

* fix oe-eval gpu count * no dry run

Allow eval to different repo (#447)

b089976

* Allow eval to different repo * remove this due to beaker job limit * fix upload

Add Acknowledgements (#451)

c80c404

* Add Acknowledgements * further edit

Merge branch 'main' into rlolmo

4d5c77a

quick change

e857e36

push

0bebd50

vwxyzjn mentioned this pull request Dec 1, 2024

Will you support fine-tuning from olmo2? #467

Open
