Add ViViT variant with factorized self-attention #327

roydenwa · 2024-08-21T20:50:28Z

I implemented the ViViT variant with factorized self-attention (Model 3 in the paper). It learns spatio-temporal features per patch/tube rather than the global or frame-wise features learned by the "factorized encoder" variant. This could be useful for downstream tasks that require patch-wise features, such as video segmentation.
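To illustrate the idea, here is a minimal dependency-free sketch of factorized self-attention, not the code from this PR. It assumes identity query/key/value projections and omits the learned weights, multiple heads, residual connections, and normalization; the names `self_attention` and `factorized_self_attention` are hypothetical. Each token first attends to the other patches in its own frame (spatial step), then to the same patch position across frames (temporal step), so the output keeps one feature vector per patch.

```python
import math


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of vectors.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), to keep the sketch short.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[i] for w, v in zip(weights, tokens))
                    for i in range(dim)])
    return out


def factorized_self_attention(video):
    """video[f][p] is the feature vector of patch p in frame f."""
    num_frames, num_patches = len(video), len(video[0])

    # Spatial step: tokens attend only to tokens from the same frame.
    spatial = [self_attention(frame) for frame in video]

    # Temporal step: tokens attend only to the same patch position in other frames.
    out = [[None] * num_patches for _ in range(num_frames)]
    for p in range(num_patches):
        tube = [spatial[f][p] for f in range(num_frames)]
        for f, vec in enumerate(self_attention(tube)):
            out[f][p] = vec
    return out


# Toy input: 2 frames, 3 patches per frame, 2-dimensional features.
video = [[[float(f + p), 1.0] for p in range(3)] for f in range(2)]
features = factorized_self_attention(video)
print(len(features), len(features[0]), len(features[0][0]))  # 2 3 2
```

Because the output has the same frames-by-patches layout as the input, per-patch features remain available to a downstream head instead of being pooled into a single vector per frame or per clip.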

lucidrains · 2024-08-22T02:23:31Z

@roydenwa this looks great! thank you so much!

roydenwa added 4 commits August 21, 2024 22:18

Add FactorizedTransformer

db12f36

Add variant param and check in fwd method

866f730

Check if variant is implemented

d24c1a4

Describe new ViViT variant

1217578

lucidrains merged commit 9d43e4d into lucidrains:main Aug 22, 2024
2 checks passed
