
Clamping actions between a range #2630

Answered by vmoens
TRPrasanna asked this question in Q&A


Yes, `safe` is the way to go. That will clamp your actions.
I know it's something people do, but I would advise against clamping actions, especially in a policy-optimization setting: the assumption behind the importance weight is that the two log-probabilities you subtract are the log-prob under the new distribution minus the log-prob under the original one. But since your distribution is effectively truncated by the clamp, the log-prob you're computing isn't the real one (integrating the density over the space of actions that can actually be sampled gives a total mass < 1 when it should be exactly 1).
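A quick numeric sketch of that point (the numbers are illustrative, not from the thread): if the policy's Normal puts much of its mass outside the clamp range, the density you'd plug into the importance ratio no longer integrates to 1 over the actions you can observe.

```python
import torch
from torch.distributions import Normal

# Untruncated Gaussian policy head; actions are clamped to [-1, 1] afterwards.
policy = Normal(loc=0.0, scale=2.0)

# Probability mass the Normal assigns to the only actions you can ever see.
mass_inside = policy.cdf(torch.tensor(1.0)) - policy.cdf(torch.tensor(-1.0))
print(mass_inside.item())  # ~0.38, far from the 1.0 a valid density requires
```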

In practice I usually advise people to use a TanhNormal or a TruncatedNormal distribution (you'll find both in torchrl.modules).
https://pytorch.or…
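A minimal sketch of the suggested alternative, assuming a recent torchrl release where the bounds are passed as `low`/`high` (older versions used `min`/`max`):

```python
import torch
from torchrl.modules import TanhNormal

loc = torch.zeros(4)
scale = torch.ones(4)

# Squashes a Normal through tanh and rescales it into [low, high]:
# samples always land in range, and log_prob matches the samples.
dist = TanhNormal(loc, scale, low=-2.0, high=2.0)

action = dist.rsample()           # guaranteed inside (-2, 2)
log_prob = dist.log_prob(action)  # consistent with the squashed density
```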

Answer selected by TRPrasanna