-
just for my own interest: Did you achieve a significant decrease in loss after halving the learning rate? From your graphs I can't help but think the halving helps keep the loss from increasing, but brings no further decrease.
-
After my last training run it became pretty obvious that a dynamic learning rate can be implemented.
Here's the evidence:
In this graph you can see the loss during training as well as the learning rate, with a few rollbacks.
The learning rate scheduler would work as follows: whenever it detects an increase of X in the mean loss, it interrupts training, rolls back to the checkpoint saved Y saves earlier, and resumes from that point with the learning rate multiplied by Z (a rough code sketch follows the example values below).
In my example:
Sensitivity X of 0.05
Rollback depth Y of 2 (I was saving a checkpoint every 200 steps)
LR multiplier Z of 0.5, so the learning rate is halved every time this happens.
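
To make the loop concrete, here is a minimal sketch of that rollback scheduler in plain Python. Everything in it is a stand-in rather than the actual training code: the dict `model` and the random `train_step` stub replace the real model and optimizer step, and the names `sensitivity`, `rollback_depth`, `lr_multiplier`, and `save_every` are just labels for X, Y, Z and the save interval above.

```python
import copy
import random


def train_step(model, lr):
    """Stand-in for one real optimization step; returns a fake loss."""
    model["w"] -= lr * random.uniform(-1.0, 1.0)
    return abs(model["w"]) + random.uniform(0.0, 0.1)


def train_with_rollback(total_steps=10_000, lr=1e-3,
                        sensitivity=0.05,     # X: tolerated rise in mean loss
                        rollback_depth=2,     # Y: checkpoints to step back
                        lr_multiplier=0.5,    # Z: LR factor after a rollback
                        save_every=200):
    model = {"w": 1.0}
    checkpoints = []              # (step, deep copy of model, lr at save time)
    prev_mean = float("inf")
    losses = []
    step = 0

    while step < total_steps:
        losses.append(train_step(model, lr))
        step += 1

        if step % save_every == 0:
            mean_loss = sum(losses[-save_every:]) / save_every
            if mean_loss > prev_mean + sensitivity and len(checkpoints) >= rollback_depth:
                # Mean loss rose by more than X since the last save: restore the
                # checkpoint saved Y saves ago, drop the newer checkpoints, and
                # resume from there with the learning rate multiplied by Z.
                step, saved_model, lr = checkpoints[-rollback_depth]
                model = copy.deepcopy(saved_model)
                checkpoints = checkpoints[:len(checkpoints) - rollback_depth + 1]
                losses = losses[:step]
                lr *= lr_multiplier
                prev_mean = float("inf")
            else:
                checkpoints.append((step, copy.deepcopy(model), lr))
                prev_mean = mean_loss
    return model


if __name__ == "__main__":
    train_with_rollback(total_steps=2_000)
```

Resetting the comparison baseline right after a rollback keeps the very next loss window from immediately triggering another rollback.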
Mind you, this training was done with Swish and dropout enabled. It shouldn't technically go bad, since Swish uses a sigmoid and the sigmoid factor is bounded, but I assume the activation is only applied in the hidden layers, so the outer layers could still break by being linear.
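
For reference, Swish is just x · sigmoid(x): the sigmoid factor is bounded in (0, 1), but the activation itself still grows roughly linearly for large positive inputs. A minimal, framework-free definition (in plain Python, independent of whatever the training above actually used):

```python
import math


def swish(x: float) -> float:
    # Swish: x * sigmoid(x). The sigmoid factor stays in (0, 1), so the
    # output is bounded below (minimum around -0.278) but unbounded above,
    # growing roughly like x for large positive inputs.
    return x / (1.0 + math.exp(-x))
```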