TSTPlus init adds extraneous layers with MVP Training? #120

xanderdunn · 2021-05-20T20:09:17Z

xanderdunn
May 20, 2021

The TSTPlus init creates its head like this:

    __init__:
        if custom_head: head = custom_head(self.head_nf, c_out, self.seq_len) # custom head passed as a partial func with all its kwargs
        else: head = self.create_head(self.head_nf, c_out, self.seq_len, flatten=flatten, concat_pool=concat_pool,                                           fc_dropout=fc_dropout, bn=bn, y_range=y_range)

    def create_head(self, nf, c_out, seq_len, flatten=True, concat_pool=False, fc_dropout=0., bn=False, y_range=None):
        if flatten:
            nf *= seq_len
            layers = [Flatten()]
        else:
            if concat_pool: nf *= 2
            layers = [GACP1d(1) if concat_pool else GAP1d(1)]
        layers += [LinBnDrop(nf, c_out, bn=bn, p=fc_dropout)]
        if y_range: layers += [SigmoidRange(*y_range)]
        return nn.Sequential(*layers)

However, when using it for MVP training, the MVP before_fit replaces the model's .head:

def before_fit():
# change head with conv layer (equivalent to linear layer applied to dim=1)
        assert hasattr(self.learn.model, "head"), "model must have a head attribute to be trained with MVP"
        self.learn.model.head = nn.Sequential(nn.Dropout(self.dropout),
                                              nn.Conv1d(self.learn.model.head_nf, self.learn.dls.vars, 1)
                                             ).to(self.learn.dls.device)

With the default code, my TSTPlus model:

Model has 436,161 total parameters and 436,161 trainable parameters.

When I remove the TSTPlus code that creates a head so that only MVP is creating the head, I see:

Model has 420,824 total parameters and 420,824 trainable parameters

Where the number of parameters in the graph is calculated with this:

def num_model_parameters(model) -> Tuple[int, int]:
    """
    Returns (total parameters, trainable parameters)
    """
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return (total_params, trainable_params)

(total, trainable) = num_model_parameters(learn.model)

It appears even though MVP replaced the TSTPlus head with its own Sequential(Dropout, Conv1d) head, the original TSTPlus head is still lingering in the torch graph. This is undesirable because it's still using VRAM and it may be computed on every iteration even though it shouldn't be affecting the outcome.

Am I misunderstanding how the TSTPlus __init__ head creation and the MVP head creation interact? Thanks!

oguiza · 2021-05-29T16:50:48Z

oguiza
May 29, 2021
Maintainer

Hi @xanderdunn,

Thanks for keeping a close eye on MVP :)
I've reviewed the code and didn't find any issues with it.
Here's the test I did. I believe it's working well:

dsid = 'MoteStrain'
X, y, splits = get_UCR_data(dsid, split_data=False)
tfms  = [None, [Categorize()]]
batch_tfms = [TSStandardize(by_var=True)]
unlabeled_dls = get_ts_dls(X, splits=splits, tfms=tfms, batch_tfms=batch_tfms)
learn = ts_learner(unlabeled_dls, TSTPlus, cbs=[MVP(fname=f'{dsid}')])
print(f'original model: {count_parameters(learn.model)}')
print(f'original backbone: {count_parameters(learn.model.backbone)}') # Original TSTPlus model without head
print(f'original head: {count_parameters(learn.model.head)}')  # Original TSTPlus head
print(learn.model.head)
learn('before_fit')  # this updates the model's head
print(f'modified model: {count_parameters(learn.model)}')
print(f'modified backbone: {count_parameters(learn.model.backbone)}')  # Modified TSTPlus model without head
print(f'modified head: {count_parameters(learn.model.head)}') # Modified TSTPlus head
print(learn.model.head)

# # output
# original model: 406640
# original backbone: 406384
# original head: 256
# Sequential(
#   (0): GAP1d(
#     (gap): AdaptiveAvgPool1d(output_size=1)
#     (flatten): Flatten(full=False)
#   )
#   (1): LinBnDrop(
#     (0): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (1): Linear(in_features=128, out_features=0, bias=False)
#   )
# )
# modified model: 406513
# modified backbone: 406384
# modified head: 129
# Sequential(
#   (0): Dropout(p=0.1, inplace=False)
#   (1): Conv1d(128, 1, kernel_size=(1,), stride=(1,))
# )

I'm not sure why there's such a big change in your numbers. It's logical that there's some difference, but not that big. Are you using any custom_head in TST that could explain this large drop?
Thanks!

PS: I've continued to use MVP with both InceptionTimePlus and TSTPlus and have had very good results with a good speed. Did you raise this issue for any performance issue?

1 reply

xanderdunn Jun 19, 2021
Author

Thanks @oguiza I will try to take a look at the differences between your parameter counts and mine soon.

No, I'm not having performance or accuracy issues. I re-implemented MVP TSTPlus in C++ using libtorch. The performance I see on benchmark datasets such as appliances energy is identical across the Python and C++ implementations, but I couldn't figure out why they had different parameter counts. I discovered the difference was the head initialization I mentioned above. I will revisit this to see if I can figure out why your numbers are different from mine.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TSTPlus init adds extraneous layers with MVP Training? #120

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

Select a reply

TSTPlus __init__ adds extraneous layers with MVP Training? #120

xanderdunn May 20, 2021

Replies: 1 comment · 1 reply

oguiza May 29, 2021 Maintainer

xanderdunn Jun 19, 2021 Author

TSTPlus init adds extraneous layers with MVP Training? #120

xanderdunn
May 20, 2021

Replies: 1 comment 1 reply

oguiza
May 29, 2021
Maintainer

xanderdunn Jun 19, 2021
Author