Hi, trl community. In the PPO trainer, trl has the following code, which I think might be problematic: the reward is added one position after the last EOS token, not on the EOS token as it used to be. The `actual_end` below doesn't seem to be right?

What does `padding_mask_p1` mean? The link attached above does not seem to give a detailed explanation; it contains just one graph. Thank you for your help.
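To make the question concrete, here is a minimal toy sketch in plain Python. It is not the trl source: the token ids, the helper names `last_non_pad_index` and `reward_positions`, and the rule used for `actual_end` are assumptions that mirror the behaviour described above (the reward landing one slot after EOS when such a slot exists).

```python
# Toy illustration only: plain Python, not the trl implementation.
# EOS and PAD ids are hypothetical; variable names mirror those in the question.
EOS, PAD = 2, 0


def last_non_pad_index(response):
    """Index of the last non-padding token (assumed here to be the EOS token)."""
    idx = -1
    for i, tok in enumerate(response):
        if tok != PAD:
            idx = i
    return idx


def reward_positions(response):
    """Return (sequence_length, actual_end, padding_mask, padding_mask_p1)."""
    n = len(response)
    sequence_length = last_non_pad_index(response)
    sequence_length_p1 = sequence_length + 1
    # Assumed rule: use the slot after EOS when it exists, else fall back to EOS.
    actual_end = sequence_length_p1 if sequence_length_p1 < n else sequence_length
    # Positions strictly after EOS.
    padding_mask = [i > sequence_length for i in range(n)]
    # Positions strictly after the slot that follows EOS.
    padding_mask_p1 = [i > sequence_length_p1 for i in range(n)]
    return sequence_length, actual_end, padding_mask, padding_mask_p1


response = [5, 7, EOS, PAD, PAD]
seq_len, actual_end, mask, mask_p1 = reward_positions(response)
print(seq_len, actual_end)  # EOS index, then the index that receives the score
print(mask)
print(mask_p1)
```

Under these assumptions the score would land at index 3, one slot past the EOS at index 2, and `padding_mask_p1` would simply be the padding mask shifted right by one so that this extra slot stays unmasked. Whether that matches what trl intends is exactly what I would like to confirm.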