
Add ViViT variant with factorized self-attention #327

Merged · 4 commits into lucidrains:main · Aug 22, 2024

Conversation

roydenwa (Contributor)

Hi @lucidrains,

I implemented the ViViT variant with factorized self-attention (Model 3 in the paper), which learns spatio-temporal features per patch/tube, rather than the global, frame-wise features learned by the "factorized encoder" variant. This could be useful for downstream tasks that require patch-wise features, such as video segmentation.
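Here is a minimal sketch of the idea, assuming tokens shaped `(batch, frames, patches, dim)`: each block attends over patches within a frame, then over frames at each patch location. The class names, the simplified `Attention` module, and the pre-norm residual layout are illustrative assumptions, not the exact merged code:

```python
import torch
from torch import nn
from einops import rearrange

class Attention(nn.Module):
    # Plain pre-norm multi-head self-attention over one token axis.
    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        inner = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.norm = nn.LayerNorm(dim)
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.to_out = nn.Linear(inner, dim)

    def forward(self, x):  # x: (b, n, d)
        x = self.norm(x)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (rearrange(t, 'b n (h e) -> b h n e', h=self.heads) for t in (q, k, v))
        attn = (q @ k.transpose(-1, -2) * self.scale).softmax(dim=-1)
        out = rearrange(attn @ v, 'b h n e -> b n (h e)')
        return self.to_out(out)

class FactorizedAttentionBlock(nn.Module):
    # ViViT Model 3: spatial then temporal self-attention in one block,
    # so every patch/tube token keeps its own spatio-temporal features.
    def __init__(self, dim):
        super().__init__()
        self.spatial_attn = Attention(dim)
        self.temporal_attn = Attention(dim)

    def forward(self, x):  # x: (b, f, p, d)
        b, f, p, _ = x.shape
        # spatial attention: patches within each frame attend to each other
        x = rearrange(x, 'b f p d -> (b f) p d')
        x = self.spatial_attn(x) + x
        # temporal attention: each patch location attends across frames
        x = rearrange(x, '(b f) p d -> (b p) f d', b=b)
        x = self.temporal_attn(x) + x
        return rearrange(x, '(b p) f d -> b f p d', b=b)
```

For example, `FactorizedAttentionBlock(192)(torch.randn(2, 8, 196, 192))` returns a tensor of the same shape, so per-patch features survive every block instead of being pooled per frame as in the factorized-encoder variant.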

lucidrains (Owner)

@roydenwa this looks great! thank you so much!

lucidrains merged commit 9d43e4d into lucidrains:main on Aug 22, 2024 (2 checks passed).