-
just for my own interest: Did you achieve a significant decrease in loss after halving the learning rate? From your graphs I can't help but think the halving helps keep the loss from increasing, but brings no further decrease.
-
After my last training run it became pretty obvious that a dynamic learning rate can be implemented.
Here's the evidence:
In this graph you can see the loss during training as well as the learning rate, with a few rollbacks.
The learning rate scheduler would work as follows: whenever it detects an increase of X in the mean loss, it interrupts training, rolls back to the checkpoint saved Y saves earlier, and resumes from that point with the learning rate multiplied by Z (a rough code sketch follows the example values below).
In my example:
Sensitivity X of 0.05
Rollback depth Y of 2 (I was saving a checkpoint every 200 steps)
LR multiplier Z of 0.5, so the learning rate is halved every time this happens.
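
To make the loop concrete, here is a minimal sketch of that rollback scheduler in plain Python. Everything in it is a stand-in rather than the actual training code: the dict `model` and the random `train_step` stub replace the real model and optimizer step, and the names `sensitivity`, `rollback_depth`, `lr_multiplier`, and `save_every` are just labels for X, Y, Z and the save interval above.

```python
import copy
import random


def train_step(model, lr):
    """Stand-in for one real optimization step; returns a fake loss."""
    model["w"] -= lr * random.uniform(-1.0, 1.0)
    return abs(model["w"]) + random.uniform(0.0, 0.1)


def train_with_rollback(total_steps=10_000, lr=1e-3,
                        sensitivity=0.05,     # X: tolerated rise in mean loss
                        rollback_depth=2,     # Y: checkpoints to step back
                        lr_multiplier=0.5,    # Z: LR factor after a rollback
                        save_every=200):
    model = {"w": 1.0}
    checkpoints = []              # (step, deep copy of model, lr at save time)
    prev_mean = float("inf")
    losses = []
    step = 0

    while step < total_steps:
        losses.append(train_step(model, lr))
        step += 1

        if step % save_every == 0:
            mean_loss = sum(losses[-save_every:]) / save_every
            if mean_loss > prev_mean + sensitivity and len(checkpoints) >= rollback_depth:
                # Mean loss rose by more than X since the last save: restore the
                # checkpoint saved Y saves ago, drop the newer checkpoints, and
                # resume from there with the learning rate multiplied by Z.
                step, saved_model, lr = checkpoints[-rollback_depth]
                model = copy.deepcopy(saved_model)
                checkpoints = checkpoints[:len(checkpoints) - rollback_depth + 1]
                losses = losses[:step]
                lr *= lr_multiplier
                prev_mean = float("inf")
            else:
                checkpoints.append((step, copy.deepcopy(model), lr))
                prev_mean = mean_loss
    return model


if __name__ == "__main__":
    train_with_rollback(total_steps=2_000)
```

Resetting the comparison baseline right after a rollback keeps the very next loss window from immediately triggering another rollback.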
Mind you, this training was done with Swish and dropout enabled. It shouldn't technically go bad, since Swish uses a sigmoid and the sigmoid factor is bounded, but I assume the activation is only applied in the hidden layers, so the outer layers could still break by being linear.
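
For reference, Swish is just x · sigmoid(x): the sigmoid factor is bounded in (0, 1), but the activation itself still grows roughly linearly for large positive inputs. A minimal, framework-free definition (in plain Python, independent of whatever the training above actually used):

```python
import math


def swish(x: float) -> float:
    # Swish: x * sigmoid(x). The sigmoid factor stays in (0, 1), so the
    # output is bounded below (minimum around -0.278) but unbounded above,
    # growing roughly like x for large positive inputs.
    return x / (1.0 + math.exp(-x))
```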