Clamping actions between a range #2630
Hi, apologies if this has been addressed before. I am setting up a custom environment and have used the following in its `__init__` method. However, the actor (in my case, for PPO) picks values beyond the low and high values that I set there. What would be the correct way to clamp the actions to the same range as the one set for the action key? Let me know if additional context is required.

Edit: I passed `safe=True` when creating the `ProbabilisticActor` object and it seems to work, but I am not sure whether that is the intended way.
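For reference, a hypothetical sketch of the kind of setup described above; the bounds, shapes, key names and the Gaussian policy head are illustrative assumptions, not the code from the original post:

```python
# Hypothetical reconstruction of the setup described above: bounds, shapes,
# key names and the Gaussian head are illustrative, not from the original post.
import torch
from torch import nn
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.data import Bounded  # named BoundedTensorSpec in older TorchRL releases
from torchrl.modules import IndependentNormal, ProbabilisticActor

obs_dim, act_dim = 4, 2

# Bounded action spec, roughly what would be defined in the environment's __init__.
action_spec = Bounded(low=-1.0, high=1.0, shape=(act_dim,), dtype=torch.float32)

# Policy network that emits the location and scale of a Gaussian.
policy_module = TensorDictModule(
    nn.Sequential(nn.Linear(obs_dim, 2 * act_dim), NormalParamExtractor()),
    in_keys=["observation"],
    out_keys=["loc", "scale"],
)

actor = ProbabilisticActor(
    module=policy_module,
    in_keys=["loc", "scale"],
    spec=action_spec,
    distribution_class=IndependentNormal,  # unbounded, so samples can leave [-1, 1]
    return_log_prob=True,
    safe=True,  # the workaround from the edit: out-of-spec actions are projected back into the bounds
)
```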
Replies: 1 comment 1 reply
Yes, `safe` is the way to go. That will clamp your actions.

I know it's something people do, but I would advise against clamping actions, especially in a policy-optimization setting: the assumption is that when you compute your importance weight, the two log-probabilities are the log-prob under the new distribution minus the log-prob under the original one. But since your distribution is effectively truncated, the real log-prob isn't the one you're computing (integrating the probability over the space of actions that can be sampled will be < 1 when it should be = 1).
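In standard PPO notation, the importance weight being estimated is

$$
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}
= \exp\bigl(\log \pi_\theta(a_t \mid s_t) - \log \pi_{\theta_\text{old}}(a_t \mid s_t)\bigr),
$$

and if sampled actions are clamped afterwards, the data was really drawn from a clamped (truncated) version of $\pi_{\theta_\text{old}}$, so the log-probabilities in that ratio no longer describe the distribution the actions actually came from.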
In practice I usually advise people to use a `TanhNormal` or a `TruncatedNormal` distribution (you'll find both in `torchrl.modules`). https://pytorch.or…

That being said, I also acknowledge that a lot of what is done in RL (and ML in general) is done "because it works" rather than because it's theoretically motivated :)
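A minimal sketch of the `TanhNormal` option, assuming a policy that emits `loc`/`scale` and actions bounded in [-1, 1]; the network and key names are illustrative:

```python
# Minimal sketch of the TanhNormal suggestion above; the network, key names and
# the [-1, 1] bounds are illustrative assumptions.
import torch
from torch import nn
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.modules import ProbabilisticActor, TanhNormal

obs_dim, act_dim = 4, 2

policy_module = TensorDictModule(
    nn.Sequential(nn.Linear(obs_dim, 2 * act_dim), NormalParamExtractor()),
    in_keys=["observation"],
    out_keys=["loc", "scale"],
)

actor = ProbabilisticActor(
    module=policy_module,
    in_keys=["loc", "scale"],
    distribution_class=TanhNormal,
    distribution_kwargs={"low": -1.0, "high": 1.0},  # "min"/"max" in older TorchRL releases
    return_log_prob=True,  # the log-prob accounts for the tanh squashing, so PPO ratios stay valid
)
```

Since every sample already lands inside the bounds, no clamping (and no `safe=True`) is needed with this setup.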