First of all, I wanted to thank the author (@oguiza) for this amazing library. It's very solid, easy to use, and incorporates tons of functionality. I haven't found any other library out there that handles transformers for time-series data so easily. I would also like to ask some basic questions that have come up while experimenting with the library:

- I'll start with an easy question: is it normal for the transformers to overfit drastically? I have found that the models I have trained with the library overfit very quickly, even with low learning rates and dropout. With LSTMs on the same dataset, I rarely saw any overfitting.

Further on that: the learning curves look nothing like those of regular NNs (using SGD), where I usually get an exponential curve. It sort of looks crazy to me. The learning rate finder seems to work well, but a learning rate scheduler is probably needed to adjust for these swings.
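On the scheduler point, here is a minimal sketch in plain Python of what a one-cycle learning rate schedule looks like: a cosine warm-up to a peak followed by a cosine decay. The function name `one_cycle_lr` is hypothetical and only for illustration, and the default values of `pct_start`, `div` and `div_final` are assumptions chosen to resemble fastai's `fit_one_cycle`; this is not the library's own implementation.

```python
import math


def one_cycle_lr(step, total_steps, lr_max, pct_start=0.25, div=25.0, div_final=1e5):
    """Learning rate at `step` (0-based) for a cosine one-cycle schedule.

    Hypothetical helper for illustration; defaults are assumptions.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    warmup_steps = max(1, int(pct_start * total_steps))
    lr_start = lr_max / div        # learning rate at the first step
    lr_end = lr_max / div_final    # learning rate at the last step

    def cos_interp(start, end, frac):
        # Cosine interpolation: returns `start` at frac=0 and `end` at frac=1.
        return end + (start - end) * (1 + math.cos(math.pi * frac)) / 2

    if step < warmup_steps:
        # Warm-up phase: rise from lr_start towards lr_max.
        return cos_interp(lr_start, lr_max, step / warmup_steps)
    # Decay phase: fall from lr_max to lr_end at the final step.
    decay_steps = max(1, total_steps - 1 - warmup_steps)
    return cos_interp(lr_max, lr_end, (step - warmup_steps) / decay_steps)


# Example: schedule for 100 optimizer steps with a peak of 1e-3.
schedule = [one_cycle_lr(s, 100, 1e-3) for s in range(100)]
```

Since a tsai learner is a fastai `Learner`, calling `learn.fit_one_cycle(n_epoch, lr_max=...)` with the value suggested by the learning rate finder should apply a schedule of this general shape without any custom code.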
I don't know if it's normal, but I've definitely seen it too in tasks where I had few sequences (~200). Making the model smaller in terms of the number of layers, heads and the …
As long as your y has the correct shape, i.e., …
Hi @strakehyr,
I'd suggest that you start with the default settings. After some initial tests, you may then need to modify a few of them.
You can start with something like this:
from tsai.all import *
# X, y and splits are assumed to be defined already
tfms = [None, TSRegression()]
batch_tfms = TSStandardize(by_var=True)
dls = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms)
learn = ts_learner(dls, TSTPlus, metrics=[mae, mse], cbs=ShowGraph())
I don't know anything about the data, so it's a bit difficult to know where the nan values are coming from.
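One way to narrow this down is to inspect the arrays before building the dataloaders. The following is a minimal sketch using NumPy only; `nan_report` is a hypothetical helper (not part of tsai), and it assumes `X` has shape (samples, variables, steps) and `y` is a numeric array.

```python
import numpy as np


def nan_report(X, y):
    """Summarize where NaN and infinite values sit in X and y.

    Hypothetical helper; assumes X has shape (samples, variables, steps).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    nan_mask = np.isnan(X)
    return {
        # Total counts of problematic values in inputs and targets.
        "n_nan_X": int(nan_mask.sum()),
        "n_inf_X": int(np.isinf(X).sum()),
        "n_nan_y": int(np.isnan(y).sum()),
        "n_inf_y": int(np.isinf(y).sum()),
        # Indices of samples and variables containing at least one NaN.
        "samples_with_nan": np.flatnonzero(nan_mask.any(axis=(1, 2))).tolist(),
        "variables_with_nan": np.flatnonzero(nan_mask.any(axis=(0, 2))).tolist(),
    }


# Toy example: 4 samples, 2 variables, 5 time steps.
X_demo = np.zeros((4, 2, 5))
X_demo[1, 0, 3] = np.nan
X_demo[3, 0, 0] = np.nan
y_demo = np.array([1.0, np.nan, 2.0, 3.0])
report = nan_report(X_demo, y_demo)
```

If all the counts are zero, the nan values are probably not in the raw data and are more likely to appear during preprocessing or training.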
Something I always do is to try to achieve the desired performance on the training set. This will help you determine the number of epochs required. It may be 5 or 500. You'll only know when you try to over…