Prepare the GazeFollow and VideoAttentionTarget datasets for training.
- Download the GazeFollow dataset.
- If you train with the auxiliary regression task, run `scripts/gen_gazefollow_head_masks.py` to generate head masks.
- Download the VideoAttentionTarget dataset.
- Modify `DATA_ROOT` in `ViTGaze/configs/common/dataloader` to point to your dataset directories.
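Before training, it can help to sanity-check that `DATA_ROOT` actually contains what the dataloaders expect. The sketch below is an illustration only: the listed annotation paths are assumptions and may not match the exact filenames in your copies of GazeFollow and VideoAttentionTarget.

```python
from pathlib import Path

# Hypothetical layout for illustration; adjust the entries to match
# the actual GazeFollow / VideoAttentionTarget files you downloaded.
EXPECTED = [
    "gazefollow/train_annotations_release.txt",
    "gazefollow/test_annotations_release.txt",
    "videoattentiontarget/annotations",
]

def missing_entries(data_root: str, expected=EXPECTED) -> list:
    """Return the expected entries that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        # Empty directory: every expected entry is reported missing.
        print(missing_entries(tmp))
```

An empty result means the assumed layout is in place; anything printed points at a path to fix before launching training.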
- Download the DINOv2 pretrained ViT-S weights.
- Alternatively, download the pretrained weights directly:

  ```shell
  cd ViTGaze
  mkdir pretrained && cd pretrained
  wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth
  ```

- Preprocess the model weights with `scripts/convert_pth.py` to fit the Detectron2 format.
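Conceptually, this conversion step remaps the DINOv2 checkpoint's parameter names so that Detectron2's checkpointer can match them to the model's backbone. The sketch below is an assumption about what such a conversion involves, not the actual contents of `scripts/convert_pth.py`: a plain dict stands in for a `torch` state dict, and the `backbone.` prefix is an illustrative choice.

```python
# Minimal sketch of a DINOv2 -> Detectron2 key conversion.
# A real script would use torch.load()/torch.save(); a plain dict
# stands in for the checkpoint here. The "backbone." prefix is an
# assumption about the target model's parameter naming.
def convert_state_dict(state_dict: dict, prefix: str = "backbone.") -> dict:
    """Prefix every parameter name so the checkpointer maps it onto the backbone."""
    return {prefix + key: value for key, value in state_dict.items()}

if __name__ == "__main__":
    dinov2 = {
        "patch_embed.proj.weight": [0.0],
        "blocks.0.attn.qkv.weight": [0.0],
    }
    print(sorted(convert_state_dict(dinov2)))
```

Refer to the actual script for the exact key mapping it performs.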
You can modify the training configurations in `configs/gazefollow.py`, `configs/gazefollow_518.py`, and `configs/videoattentiontarget.py`.
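Since these config files are plain Python, a typical edit is overriding a few values in place. The excerpt below is purely illustrative: the key names and values are assumptions, not the actual options defined in `configs/gazefollow.py`.

```python
# Hypothetical excerpt of a Python config file; the actual key names
# and values in configs/gazefollow.py may differ.
train = dict(
    max_iter=15000,                   # total training iterations (illustrative)
    output_dir="output/gazefollow",   # where checkpoints and logs go
)
dataloader = dict(
    batch_size=48,    # illustrative value; tune to your GPU memory
    num_workers=8,    # illustrative value
)
```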
Run:

```shell
bash train.sh
```

to train ViTGaze on the two datasets. Training output will be saved in `ViTGaze/output/`.