Dear authors,
Thanks for your great work. The maximum text context length of the CLIP text encoder is 77 tokens, yet several captions in Quilt-1M tokenize to more than 77 tokens. How can we use the CLIP text encoder to extract features for these captions?
For your needs, you can try the PMB version of QuiltNet here: https://huggingface.co/wisdomik/QuiltNet-B-16-PMB. "PMB" refers to PubMedBERT, a BERT model with a 256-token context length that was pre-trained on PMC-15M and fine-tuned alongside the image tower on Quilt-1M.
Thank you very much for your quick reply. Regarding the ViT-B-32|GPT-77 version of QuiltNet, how do you handle captions that exceed 77 tokens? Did you apply truncation?
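For context, truncation is the standard way to handle this in CLIP pipelines: OpenAI's `clip.tokenize` accepts a `truncate=True` flag, and open_clip's tokenizer cuts captions to the context length while keeping the end-of-text token in the final slot. Below is a minimal dependency-free sketch of that behavior (the function name is hypothetical; the SOT/EOT ids are from the standard CLIP BPE vocabulary), not the authors' exact code:

```python
# Sketch of CLIP-style caption truncation to a fixed context length.
# CLIP tokenizes text as [SOT] <bpe ids...> [EOT]; over-long sequences are
# clipped and the final slot is forced back to the end-of-text token.
SOT_ID, EOT_ID = 49406, 49407  # ids in the standard CLIP BPE vocabulary

def clip_truncate(token_ids, context_length=77):
    """Truncate or zero-pad a tokenized caption to `context_length` ids."""
    if len(token_ids) > context_length:
        # Keep the first context_length - 1 ids and re-append EOT.
        token_ids = token_ids[:context_length - 1] + [EOT_ID]
    # Zero-pad short sequences up to the fixed length.
    return token_ids + [0] * (context_length - len(token_ids))

# Example: a caption that tokenized to 100 ids is clipped to 77,
# with EOT preserved as the final token.
long_caption = [SOT_ID] + list(range(1, 99)) + [EOT_ID]  # 100 ids
fixed = clip_truncate(long_caption)
```

A consequence worth noting for retrieval: whatever the caption says after the cut-off is simply invisible to the text encoder, which is why the 256-token PubMedBERT variant above can be a better fit for long Quilt-1M captions.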