Packed Sequence Vision Transformer (aka NaViT) #1952
Is this still in progress?
Hello @rwightman! Any updates regarding this PR? Is there anything I can help with?
@Adenialzz @b5y sorry for the delay, I've been trying to get some other things out the door. So, where I got with this: I verified that the modelling aspect works. The masking / handling of the packed patches seems fine. If you've looked, I implemented a very rudimentary packing that is currently just injected in the forward of the model (it takes standard uniform batches of images, splits them, and then repacks). This is obviously not the point, but it was a quick hack to allow me to test. For this to work efficiently, the packing needs to be integrated into the data pipeline with extra buffering and a better thought-out packing algorithm (essentially online bin packing). The data augmentations need to be tuned wrt the dataset image size range such that you end up with a distribution of image sizes and patch lengths that's optimally packable. I hope to get back to this. The feature is definitely more data pipeline & packing work than modelling... Right now I'm working on a data loading library oriented towards large document (pdf) and image + text datasets and associated augmentations/preprocessing. I was thinking of moving the packing/pipeline code there once I get the initial version of that public & released...
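To make the "online bin packing with extra buffering" idea concrete, here is a minimal sketch of a greedy first-fit packer that keeps a small buffer of open bins. It is not the code from this PR: the function name `pack_first_fit` and the parameters `max_len` and `buffer_size` are hypothetical, and first-fit is just one simple heuristic chosen for illustration.

```python
from typing import Iterable, Iterator, List


def pack_first_fit(
    lengths: Iterable[int], max_len: int, buffer_size: int = 8
) -> Iterator[List[int]]:
    """Group sample indices into packed sequences of at most `max_len` tokens.

    `lengths` is a stream of per-image patch counts. Up to `buffer_size`
    partially filled bins are kept open; when the buffer overflows, the
    fullest bin is emitted. (Hypothetical helper, for illustration only.)
    """
    bins: List[list] = []  # each entry: [total_tokens, [sample indices]]
    for idx, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"sample {idx} has {n} tokens, more than max_len={max_len}")
        # First fit: place the sample in the first open bin with enough room.
        for b in bins:
            if b[0] + n <= max_len:
                b[0] += n
                b[1].append(idx)
                break
        else:
            bins.append([n, [idx]])
        # Buffer overflow: emit the fullest bin to bound memory and latency.
        if len(bins) > buffer_size:
            fullest = max(range(len(bins)), key=lambda i: bins[i][0])
            yield bins.pop(fullest)[1]
    # Flush whatever is still open at the end of the stream.
    for _, indices in bins:
        yield indices


patch_counts = [196, 64, 100, 256, 36, 144]
packed = list(pack_first_fit(patch_counts, max_len=256, buffer_size=2))
print(packed)
```

A larger `buffer_size` generally gives the packer more chances to fill each bin at the cost of more buffered samples, which is the trade-off the comment above alludes to when it mentions tuning augmentations so the length distribution is packable.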
Hello @rwightman, I was wondering if there are any updates on this PR? Thank you.
A big WIP, pushing early to resolve masking stability issues with F.sdpa
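For readers unfamiliar with masking in packed sequences, here is a rough sketch of how a per-image attention mask can be passed to `F.scaled_dot_product_attention`. It is not taken from this PR: the helper `packed_attention_mask` and the convention of marking padding tokens with id `-1` are assumptions. Because padding tokens share an id, every row of the mask has at least one `True` entry, which avoids fully masked rows (a common source of NaNs in softmax; the PR does not say whether this is the stability issue in question).

```python
import torch
import torch.nn.functional as F


def packed_attention_mask(seq_ids: torch.Tensor) -> torch.Tensor:
    """Build a boolean mask for packed sequences (hypothetical helper).

    seq_ids: (batch, tokens) integer tensor. Tokens from the same image share
    an id; padding tokens are assumed to use -1.
    Returns a (batch, 1, tokens, tokens) mask where True means "may attend".
    """
    same_image = seq_ids.unsqueeze(-1) == seq_ids.unsqueeze(-2)  # (B, N, N)
    return same_image.unsqueeze(1)  # broadcast over attention heads


# One packed row holding a 3-token image, a 2-token image, and 1 padding token.
seq_ids = torch.tensor([[0, 0, 0, 1, 1, -1]])
mask = packed_attention_mask(seq_ids)

# (batch, heads, tokens, head_dim) query/key/value tensors.
q, k, v = torch.randn(3, 1, 2, 6, 8)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)
```

With this mask, tokens of one image never attend to tokens of another image or to padding, so each image is processed as if it were alone in the batch.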