Retro-fitting a pretrained model #26
Comments
I'm interested in this as well but I haven't had time to work on it. The original paper "retrofitted" T5 by adding additional cross-attentions between the pretrained model and the KB retrieval/chunk system. They claimed it only took a small number of training steps to teach the revised model to use the new cross-attentions. I'm assuming this involved training all model weights on a masking task, the same way it was done in the original pretraining. It shouldn't be too difficult to hack up the Hugging Face model code to add the cross-attentions and then use the information retrieval components from here. I'll probably try this sometime in the next few months. I'm more interested in the BART model, so I was planning to work on that, not T5. Let me know if you or someone else gets to it first.
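As a rough illustration of the idea in the comment above, here is a minimal PyTorch sketch of wrapping a pretrained decoder layer with an extra cross-attention over encoded retrieved chunks. It is not the repo's API or the paper's exact recipe; the class name, the `retrieved` tensor shape, and the assumption that the wrapped layer returns a tuple (as Hugging Face BART/T5 layers do) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class RetrofitDecoderLayer(nn.Module):
    """Hypothetical wrapper: a pretrained decoder layer plus a new
    RETRO-style cross-attention over encoded retrieved neighbours."""

    def __init__(self, pretrained_layer, dim, heads=8):
        super().__init__()
        self.pretrained_layer = pretrained_layer  # e.g. a BART/T5 decoder layer (assumed)
        self.norm = nn.LayerNorm(dim)
        self.cca = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hidden_states, retrieved, **layer_kwargs):
        # Run the original layer unchanged (assumes it returns a tuple,
        # as Hugging Face decoder layers do).
        hidden_states = self.pretrained_layer(hidden_states, **layer_kwargs)[0]
        # New cross-attention: queries come from the decoder hidden states,
        # keys/values from the encoded retrieved neighbours,
        # assumed shape (batch, num_neighbours * neighbour_len, dim).
        attn_out, _ = self.cca(self.norm(hidden_states), retrieved, retrieved)
        return hidden_states + attn_out
```

One would then swap this wrapper into each (or every few) decoder layers of the pretrained model and continue training on a masking/denoising objective, using the retrieval components from this repo to produce the `retrieved` tensor.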
Thank you for your implementation! I'm interested in how you would add CCA to BART: in the encoder or the decoder? If in the encoder, CCA is causal; how would you recommend handling that? If in the decoder, retrieval needs at least 64 tokens, so if the generated text is shorter than 64 tokens, retrieval would not be used. Thanks!
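For context on the 64-token concern: in the original RETRO design the chunked cross-attention sits in the decoder, and causality is preserved by letting a token attend only to neighbours retrieved for an already-completed earlier chunk. The toy function below illustrates that constraint under a simplifying assumption (it ignores the one-token shift the paper uses); the practical consequence is the same as noted above, that sequences shorter than one chunk get no retrieved context.

```python
def retrieval_chunk_for_position(pos: int, chunk_size: int = 64):
    """Toy illustration (not library code): index of the chunk whose retrieved
    neighbours the token at `pos` may attend to, or None if no completed
    chunk precedes it (simplified; the paper shifts this by one token)."""
    chunk_idx = pos // chunk_size
    return chunk_idx - 1 if chunk_idx > 0 else None

assert retrieval_chunk_for_position(10) is None   # inside the first chunk: no retrieval
assert retrieval_chunk_for_position(64) == 0      # first token of chunk 1 sees chunk 0's neighbours
assert retrieval_chunk_for_position(200) == 2     # 200 // 64 = 3, so neighbours of chunk 2
```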
Have any of you worked on the retrofitting part yet?
I haven't had the time, and although I'm still somewhat interested, realistically I probably won't get to this. It might be worth emailing the authors of the original paper to see if they'd be willing to post that code or provide additional information on the retrofitting process. As I recall, there was only a paragraph or so on it, and there seem to be a number of details it would be helpful for them to provide.
Yup, let me email them and hopefully they respond. |
I just got no response from them.
Hey there, has anyone had any time to work on this?
Hey,
Thank you for your implementation!
Is it possible to use your library to "retro-fit" a pretrained model?
I guess it would mean freezing the model during training and only fine-tuning the retrieval and cross-attention?
How would you recommend doing that?
Thanks!
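For the freezing part of the question above, a minimal sketch of the "train only the new parts" setup is below. It assumes a model into which new chunked cross-attention modules have already been inserted and that their parameter names contain "cca"; both the variable `model` and that naming convention are assumptions, not something this repo provides.

```python
import torch

# `model` is assumed to be a pretrained seq2seq model that already contains
# newly added chunked cross-attention modules named "cca" (hypothetical).
for name, param in model.named_parameters():
    # Freeze the pretrained weights; train only the newly added parts.
    param.requires_grad = "cca" in name

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

Whether to keep the rest of the model frozen or to unfreeze everything after a warm-up is a design choice the thread leaves open; the optimizer filter above makes either easy to switch.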