OLMO + RL #424

Draft
wants to merge 62 commits into
base: olmo_again
vwxyzjn (Collaborator) commented Nov 8, 2024

I put the code here. To reproduce my work, run pip install ai2_olmo and then launch the following:

for beta in 0.05
do
for lr in 3e-7
do
python mason.py \
    --cluster ai2/augusta-google-1 --pure_docker_mode \
    --workspace ai2/tulu-3-dev \
    --priority high \
    --preemptible \
    --num_nodes 1 \
    --image costah/open_instruct_ppo_ray_olmo \
    --budget ai2/allennlp \
    --gpus 8 -- pip install --upgrade transformers \&\& python open_instruct/ppo_vllm_thread_ray_gtrl_olmo.py \
    --exp_name "ppo_olmo_rm_init_one_epoch_beta_${beta}_lr_${lr}" \
    --beta $beta \
    --learning_rate $lr \
    --dataset_mixer "{\"ai2-adapt-dev/gsm8k_ground_truth\": 1.0}" \
    --dataset_train_splits train \
    --dataset_eval_mixer "{\"ai2-adapt-dev/gsm8k_math_ground_truth\": 1.0}" \
    --dataset_eval_splits test \
    --max_token_length 2048 \
    --max_prompt_token_length 2048 \
    --response_length 1024 \
    --model_name_or_path allenai/open_instruct_dev \
    --model_revision olmo_7b_soup_anneal_v3.9_4_DPO___model__42__1730863426 \
    --reward_model_path allenai/open_instruct_dev \
    --reward_model_revision reward_modeling__1__1730930663 \
    --non_stop_penalty \
    --stop_token eos \
    --temperature 1.0 \
    --ground_truths_key ground_truth \
    --chat_template tulu \
    --sft_messages_key messages \
    --total_episodes 200000 \
    --penalty_reward_value -10.0 \
    --deepspeed_stage 3 \
    --per_device_train_batch_size 4 \
    --local_rollout_forward_batch_size 8 \
    --local_mini_batch_size 32 \
    --local_rollout_batch_size 32 \
    --actor_num_gpus_per_node 7 \
    --vllm_tensor_parallel_size 1 \
    --num_epochs 1 \
    --apply_verifiable_reward true \
    --output_dir /output \
    --seed 3 \
    --num_evals 3 \
    --reward_model_multiplier 0.0 \
    --no_try_launch_beaker_eval_jobs \
    --gradient_checkpointing \
    --with_tracking
done
done
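
For anyone sanity-checking the batch-size flags in the command above, here is a minimal sketch of the arithmetic they appear to imply. It assumes the `local_*` values are per learner GPU and get multiplied by the number of actor GPUs, and that one training step consumes one global rollout batch. `SweepFlags` and `derive_batch_sizes` are hypothetical names used only for illustration; they are not part of open_instruct, and the training script may derive these quantities differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepFlags:
    """Flag values copied from the launch command above."""

    per_device_train_batch_size: int = 4
    local_mini_batch_size: int = 32
    local_rollout_batch_size: int = 32
    actor_num_gpus_per_node: int = 7
    num_nodes: int = 1
    total_episodes: int = 200_000


def derive_batch_sizes(flags: SweepFlags) -> dict:
    """Derive global sizes from per-GPU flags (assumed convention, see lead-in)."""
    # Assumption: the learner world size is actor GPUs per node times nodes.
    world_size = flags.actor_num_gpus_per_node * flags.num_nodes

    # Assumption: a local mini-batch is split into micro-batches of
    # per_device_train_batch_size, so the ratio is the accumulation count.
    if flags.local_mini_batch_size % flags.per_device_train_batch_size != 0:
        raise ValueError(
            "local_mini_batch_size must be divisible by per_device_train_batch_size"
        )
    grad_accum = flags.local_mini_batch_size // flags.per_device_train_batch_size

    # Assumption: global sizes are the per-GPU sizes times the world size.
    rollout_batch_size = flags.local_rollout_batch_size * world_size
    mini_batch_size = flags.local_mini_batch_size * world_size

    return {
        "world_size": world_size,
        "gradient_accumulation_steps": grad_accum,
        "rollout_batch_size": rollout_batch_size,
        "mini_batch_size": mini_batch_size,
        # Assumption: each training step uses one global rollout batch.
        "num_training_steps": flags.total_episodes // rollout_batch_size,
    }


if __name__ == "__main__":
    for name, value in derive_batch_sizes(SweepFlags()).items():
        print(f"{name}: {value}")
```

Under those assumptions, the flags work out to 7 learner GPUs (the eighth GPU presumably serving vLLM, given `--vllm_tensor_parallel_size 1`), 8 gradient-accumulation steps, 224 episodes per rollout, and 892 training steps for 200000 total episodes.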

hamishivi and others added 30 commits October 1, 2024 09:31
* Prototype ppo + ray

* reduce gradient

* push

* push changes

* quick push

* cache changes; this actually works with 6 nodes

* push changes

* psuh changes

* push the latest change

* push changes

* Fix uploading

* Make style

* style and quality

* update docs

* update mason.py

* log wandb tables

* update docs

* make style quality

* make sure to save the right thing

* push changes

* push

* push

* push changes

* push

* push

* fix

* remove preemption code

* fix

* push changes

* push

* quick fix

* quick push
* add ability to use alternate image

* rever
natolambert and others added 29 commits November 10, 2024 20:42
* update data dist plots

* nit

* smooth operation

* updates

* nits

* clean git

* cleaning for final SFT version
* oe eval priority

* up

---------

Co-authored-by: Nathan Lambert <[email protected]>
* fix and add script

* update

* fix copilot typo
* Reorganize the data preparation scripts for tulu v1 and v2.

* Minor improvement

* Remove open_platypus_commercial subset from Daring-Anteater

* Use hard-coded examples repo.

* Fix some bugs.

* Add OpenMathInstruct.

* Add a few more v3.5.x SFT mix ablations for the cleaner datasets.

* More experiments on mixes.

* help merge

* prep for merge

* reapply changes

* fix naming

---------

Co-authored-by: Nathan Lambert <[email protected]>
* Use vllm for all evaluations

* Do not use VLLM only for MMLU and TruthfulQA
* Quick change

* weight converter
* Support weka eval

* quick fix
* Quick change

* weight converter

* Add olmo1124 converter
* add all the config

* quick change
* first pass

* fix spelling, ground truth stuff

* fix misspelling

* count verifieds and intermediate saving

* save intermediate steps

* small fix to logging

* fix bug for forward rollout batching

* support gsm8k and math, more flexibility in future

* add costas plo thing

* add numina math

* remove plo, add value model rand init, first stab at rephrase model loading

* math strict verify

* ifeval code

* ifeval debug

* incorporate val fixes

* data fixed, remove skips

* Prototype ppo + ray (#390)

* Prototype ppo + ray

* reduce gradient

* push

* push changes

* quick push

* cache changes; this actually works with 6 nodes

* push changes

* psuh changes

* push the latest change

* push changes

* Fix uploading

* Make style

* style and quality

* update docs

* update mason.py

* log wandb tables

* update docs

* make style quality

* make sure to save the right thing

* push changes

* push

* push

* push changes

* push

* push

* fix

* remove preemption code

* fix

* push changes

* push

* quick fix

* quick push

* add weka save override

* add multinode ray file

* lint and fix

* first stab at flan

* eval on intermediate checkpoints (#414)

* quick change (#418)

* works with the weka eval and nfs eval

* fixes to ground truth

* push ppo ray

* add warm up

* revert changes

* merge

* quick change

---------

Co-authored-by: Hamish Ivison <[email protected]>
Co-authored-by: Hamish Ivison <[email protected]>
Co-authored-by: Costa Huang <[email protected]>
* Add .vscode to .gitignore

* Add dependencies for synth pref pipeline

* Initialize directory

* Port majority of the scripts to open-instruct

* Add documentation

* Add public birr as a submodule

* Minor fixes and update README

* Add instructions for creating annotation mix

* Run isort on source

* Update README

* Remove birr submodule

* Change birr from submodule to git-checkout

* Add examples

* Update README with better examples
* Push preview

* prepare for release

* quick push

* refactor

* fix dataset

* update 70Bconfig

* push
* fix oe-eval gpu count

* no dry run
* Allow eval to different repo

* remove this due to beaker job limit

* fix upload
* Add Acknowledgements

* further edit
9 participants