EVA-02: Object Detection & Instance Segmentation

We provide evaluation and training code for the Objects365, COCO and LVIS datasets. All model weights for object detection and instance segmentation are available to the community.

EVA-02 Model Card

EVA-02 uses ViTDet + Cascade Mask R-CNN as the object detection and instance segmentation head. We mainly evaluate EVA-02 on the COCO and LVIS val sets.

To avoid data contamination, all LVIS models are initialized using IN-21K MIM pre-trained EVA-02. Refer to our paper for details.

Head-to-head Comparison

COCO

| model name | init. ckpt | LSJ crop size | batch size | iter | AP box | AP mask | config | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| eva02_B_coco_bsl | eva02_B_pt_in21k_p14to16 | 1024x1024 | 128 | 60k | 55.5 | 47.1 | link | 🤗 HF link |
| eva02_L_coco_bsl | eva02_L_pt_m38m_p14to16 | 1024x1024 | 144 | 60k | 59.2 | 50.8 | link | 🤗 HF link |

LVIS

| model name | init. ckpt | LSJ crop size | batch size | iter | AP box | AP mask | config | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| eva02_B_lvis_bsl | eva02_B_pt_in21k_p14to16 | 1024x1024 | 128 | 50k | 47.1 | 41.4 | link | 🤗 HF link |
| eva02_L_lvis_bsl | eva02_L_pt_in21k_p14to16 | 1024x1024 | 128 | 40k | 55.3 | 48.6 | link | 🤗 HF link |

System-level Comparisons w/o O365 Intermediate Fine-tuning

COCO

| model name | init. ckpt | LSJ crop size | batch size | iter | AP box | AP mask | config | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| eva02_B_coco_sys | eva02_B_pt_in21k_p14to16 | 1536x1536 | 128 | 60k | 58.9 | 50.7 | link | 🤗 HF link |
| eva02_L_coco_sys | eva02_L_pt_m38m_p14to16 | 1536x1536 | 128 | 60k | 62.3 | 53.8 | link | 🤗 HF link |

LVIS

| model name | init. ckpt | LSJ crop size | batch size | iter | AP box | AP mask | config | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| eva02_L_lvis_sys | eva02_L_pt_in21k_p14to16 | 1536x1536 | 128 | 40k | 60.1 | 53.5 | link | 🤗 HF link |

Objects365 Intermediate Fine-tuning

| model name | init. ckpt | LSJ crop size | batch size | iter | config | weight |
| --- | --- | --- | --- | --- | --- | --- |
| eva02_L_m38m_to_o365 | eva02_L_pt_m38m_p14to16 | 1536x1536 | 160 | 400k | link | 🤗 HF link |
| eva02_L_in21k_to_o365 | eva02_L_pt_in21k_p14to16 | 1536x1536 | 160 | 400k | link | 🤗 HF link |

System-level Comparisons w/ O365 Intermediate Fine-tuning

COCO

| model name | init. ckpt | LSJ crop size | batch size | iter | AP box | AP mask | config | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| eva02_L_coco_det_sys_o365 | eva02_L_m38m_to_o365 | 1536x1536 | 64 | 40k | 64.1 | 54.3 | link | 🤗 HF link |
| eva02_L_coco_seg_sys_o365 | eva02_L_m38m_to_o365 | 1536x1536 | 64 | 40k | 63.9 | 55.4 | link | 🤗 HF link |

  • We use different checkpoints (from the same training job) for object detection and instance segmentation, since the instance segmentation part is not pre-trained on O365 and converges more slowly on COCO.

LVIS

| model name | init. ckpt | LSJ crop size | batch size | iter | AP box | AP mask | config | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| eva02_L_lvis_sys_o365 | eva02_L_in21k_to_o365 | 1536x1536 | 64 | 70k | 65.2 | 57.3 | link | 🤗 HF link |

Setup

Environment

First, set up the EVA-02 pre-training & image classification environment, then install mmcv==1.7.1, which is needed for soft-NMS.

Then, build EVA-02 det / Detectron2 from source:

cd /path/to/EVA-02/det
python -m pip install -e .
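Before running anything, it can help to confirm that the pinned mmcv version is the one actually installed. The following is a minimal sketch using only the standard library; the helper name `check_pinned` is hypothetical and not part of EVA-02 or Detectron2, and the distribution name `mmcv` is assumed to match how the package was installed.

```python
from importlib import metadata


def check_pinned(dist_name: str, expected: str) -> str:
    """Return 'ok', 'mismatch', or 'missing' for an installed distribution."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "missing"
    return "ok" if installed == expected else "mismatch"


# mmcv==1.7.1 is the version pinned above for soft-NMS.
print("mmcv:", check_pinned("mmcv", "1.7.1"))
```

If the package was installed under a different distribution name, pass that name instead.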

Data

Download the Objects365 (O365) dataset from here. The file structure of O365 should look like:

o365
├── annotations
│   ├── zhiyuan_objv2_train.json
│   └── zhiyuan_objv2_val.json
├── images
│   ├── patch0
│   ├── patch1
│       ...
│   └── patch50
└── ...
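The layout above can be checked with a short script before launching a long job. This is a sketch under two assumptions: the image folders are numbered contiguously from patch0 to patch50 (the tree elides the middle entries), and the helper name `missing_o365_entries` is hypothetical.

```python
from pathlib import Path

ANNOTATIONS = ("zhiyuan_objv2_train.json", "zhiyuan_objv2_val.json")
PATCH_IDS = range(0, 51)  # patch0 ... patch50 (assumed contiguous)


def missing_o365_entries(root) -> list:
    """List expected files and directories that are absent under the o365 root."""
    root = Path(root)
    missing = []
    for name in ANNOTATIONS:
        path = root / "annotations" / name
        if not path.is_file():
            missing.append(path.relative_to(root).as_posix())
    for i in PATCH_IDS:
        path = root / "images" / f"patch{i}"
        if not path.is_dir():
            missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Placeholder path; point it at the real o365 directory.
    print(missing_o365_entries("/path/to/o365"))
```

An empty list means every entry shown in the tree was found.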

For COCO and LVIS datasets, please follow the official guidelines in Detectron2.

Overall, the structure of DETECTRON2_DATASETS should look like:

DETECTRON2_DATASETS
├── o365
├── coco
├── lvis
└── ...
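Detectron2's builtin datasets read their root from the DETECTRON2_DATASETS environment variable and fall back to ./datasets when it is unset. The sketch below resolves the root the same way and reports which of the three directories are missing; the function names are hypothetical.

```python
import os
from pathlib import Path


def dataset_root(env=None) -> Path:
    """Resolve the dataset root: DETECTRON2_DATASETS if set, else ./datasets."""
    env = os.environ if env is None else env
    return Path(os.path.expanduser(env.get("DETECTRON2_DATASETS", "datasets")))


def missing_datasets(root, names=("o365", "coco", "lvis")) -> list:
    """Return the expected top-level dataset directories that are absent."""
    return [name for name in names if not (Path(root) / name).is_dir()]


if __name__ == "__main__":
    root = dataset_root()
    print(f"root={root} missing={missing_datasets(root)}")
```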

EVA-02 pre-trained weight

MIM Pre-trained EVA-02

| model name | #params | MIM pt dataset | MIM pt epochs | weight |
| --- | --- | --- | --- | --- |
| eva02_B_pt_in21k_p14to16 | 86M | IN-21K | 150 | 🤗 HF link |
| eva02_L_pt_in21k_p14to16 | 304M | IN-21K | 150 | 🤗 HF link |
| eva02_L_pt_m38m_p14to16 | 304M | Merged-38M | 56 | 🤗 HF link |

  • The eva02_psz14to16 models (the _p14to16 checkpoints listed above) interpolate the patch_embed kernel from 14x14 to 16x16 and the pos_embed grid from 16x16 to 14x14. This is useful for object detection, instance segmentation and semantic segmentation tasks.
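The two grid sizes in the note above follow from simple arithmetic. The sketch below assumes a 224x224 pre-training resolution, which this section does not state: a 14-pixel patch then gives a 16x16 token grid, and a 16-pixel patch gives a 14x14 grid. It also shows that the LSJ crop sizes used in the Model Card are multiples of 16 but not of 14.

```python
def grid_size(image_size: int, patch_size: int) -> int:
    """Number of patches along one side; the side must divide evenly."""
    if image_size % patch_size != 0:
        raise ValueError(f"{image_size} is not a multiple of {patch_size}")
    return image_size // patch_size


PRETRAIN_SIZE = 224  # assumption, not stated in this README section

print(grid_size(PRETRAIN_SIZE, 14))  # pos_embed grid before conversion: 16
print(grid_size(PRETRAIN_SIZE, 16))  # pos_embed grid after conversion: 14
print(grid_size(1024, 16))           # tokens per side at the 1024 crop: 64
print(grid_size(1536, 16))           # tokens per side at the 1536 crop: 96
```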

O365 Intermediate Fine-tuned EVA-02

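See the O365 intermediate fine-tuning checkpoints (eva02_L_m38m_to_o365 and eva02_L_in21k_to_o365) in the Model Card above.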

Evaluation

Head-to-head Comparison

COCO

Evaluate eva02_B_coco_bsl on COCO val using a single node with 4 GPUs (so WORLD_SIZE=1 and RANK=0):

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_b_4attn_1024_lrd0p7.py \
 --eval-only \
 train.init_checkpoint=/path/to/eva02_B_coco_bsl.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
55.5017,74.8972,60.2801,36.3313,60.9265,72.7190
Task: segm
AP,AP50,AP75,APs,APm,APl
47.0794,71.8778,50.4881,25.9719,51.6530,67.5334
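The launch flags read ${WORLD_SIZE}, ${RANK} and $MASTER_ADDR from the environment, which a cluster scheduler normally sets. For the single-node case described here, a small wrapper can fall back to one machine, rank 0 and localhost, and assemble the same command. This is only a sketch: `build_eval_command` and its defaults are hypothetical and not part of the repository.

```python
import os
import shlex


def build_eval_command(config_file, checkpoint, num_gpus=4, overrides=None, env=None):
    """Assemble the evaluation command shown above as an argv list."""
    env = os.environ if env is None else env
    world_size = env.get("WORLD_SIZE", "1")         # single node by default
    rank = env.get("RANK", "0")                     # first (only) machine
    master_addr = env.get("MASTER_ADDR", "127.0.0.1")
    cmd = [
        "python", "tools/lazyconfig_train_net.py",
        "--num-gpus", str(num_gpus),
        "--num-machines", world_size,
        "--machine-rank", rank,
        "--dist-url", f"tcp://{master_addr}:60900",
        "--config-file", config_file,
        "--eval-only",
    ]
    # Config overrides are plain key=value arguments placed after the flags.
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    cmd.append(f"train.init_checkpoint={checkpoint}")
    return cmd


cmd = build_eval_command(
    "projects/ViTDet/configs/eva2_mim_to_coco/"
    "eva2_coco_cascade_mask_rcnn_vitdet_b_4attn_1024_lrd0p7.py",
    "/path/to/eva02_B_coco_bsl.pth",
)
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to launch
```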

Evaluate eva02_L_coco_bsl on COCO val using a single node with 4 GPUs:

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_l_4attn_1024_lrd0p8.py \
 --eval-only \
 train.init_checkpoint=/path/to/eva02_L_coco_bsl.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
59.1550,78.6420,64.0970,41.9209,64.4478,75.3628
Task: segm
AP,AP50,AP75,APs,APm,APl
50.7923,75.8581,55.2317,30.4149,55.2931,70.3713
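To compare a run against the Model Card, the result blocks can be parsed and rounded to one decimal. The sketch below assumes the three-line layout printed in this README (a "Task:" line, a header line, a value line); real logs may carry extra prefixes that would need stripping first. The function name `parse_results` is hypothetical.

```python
def parse_results(text: str) -> dict:
    """Parse 'Task: <name>' / header / values triples into nested dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    results = {}
    i = 0
    while i < len(lines):
        if lines[i].startswith("Task:") and i + 2 < len(lines):
            task = lines[i].split(":", 1)[1].strip()
            names = lines[i + 1].split(",")
            values = [float(v) for v in lines[i + 2].split(",")]
            results[task] = dict(zip(names, values))
            i += 3
        else:
            i += 1
    return results


LOG = """
Task: bbox
AP,AP50,AP75,APs,APm,APl
59.1550,78.6420,64.0970,41.9209,64.4478,75.3628
Task: segm
AP,AP50,AP75,APs,APm,APl
50.7923,75.8581,55.2317,30.4149,55.2931,70.3713
"""

results = parse_results(LOG)
# Rounded values should match the eva02_L_coco_bsl row: 59.2 box AP, 50.8 mask AP.
print(round(results["bbox"]["AP"], 1), round(results["segm"]["AP"], 1))
```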

LVIS

Evaluate eva02_B_lvis_bsl on LVIS val using a single node with 4 GPUs:

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_lvis/eva2_lvis_cascade_mask_rcnn_vitdet_b_4attn_1024_lrd0p7.py \
 --eval-only \
 train.init_checkpoint=/path/to/eva02_B_lvis_bsl.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
47.1304,62.8418,49.3729,34.5095,58.6703,69.0790,36.3894,47.9805,50.9023
Task: segm
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
41.3575,60.4043,44.2597,27.0808,54.2237,66.4821,32.2175,42.5939,43.9947

Evaluate eva02_L_lvis_bsl on LVIS val using a single node with 4 GPUs:

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_lvis/eva2_lvis_cascade_mask_rcnn_vitdet_l_4attn_1024_lrd0p8.py \
 --eval-only \
 train.init_checkpoint=/path/to/eva02_L_lvis_bsl.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
55.2866,71.9707,58.1774,41.9690,67.2664,74.6449,50.6344,56.7946,55.6481
Task: segm
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
48.5964,69.3752,52.5725,33.2063,61.7856,71.2531,45.5442,50.2401,48.1034

System-level Comparisons w/o O365 Intermediate Fine-tuning

COCO

Evaluate eva02_B_coco_sys on COCO val using a single node with 4 GPUs:

# evaluate object detection performance w/o maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_b_6attn_win32_1536_lrd0p7.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 train.init_checkpoint=/path/to/eva02_B_coco_sys.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
58.9331,77.8558,64.5855,42.1048,63.6015,74.3375

# evaluate instance segmentation performance w/ maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_b_6attn_win32_1536_lrd0p7.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 model.roi_heads.maskness_thresh=0.5 \
 train.init_checkpoint=/path/to/eva02_B_coco_sys.pth

Expected results:

Task: segm
AP,AP50,AP75,APs,APm,APl
50.6875,74.8027,55.4802,30.5407,54.2940,69.7923
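The system-level commands enable soft-NMS with method=linear and iou_threshold=0.6. As a reminder of what linear soft-NMS does, here is a textbook pure-Python sketch: instead of discarding a box that overlaps the current best box by more than the threshold, its score is multiplied by (1 - IoU). This is not the repository's implementation (which relies on mmcv); the `min_score` cutoff and the function names are illustrative assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def linear_soft_nms(boxes, scores, iou_threshold=0.6, min_score=1e-3):
    """Return (box index, rescored value) pairs in selection order."""
    remaining = [(i, float(s)) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        best = max(range(len(remaining)), key=lambda k: remaining[k][1])
        idx, score = remaining.pop(best)
        kept.append((idx, score))
        rescored = []
        for j, s in remaining:
            overlap = iou(boxes[idx], boxes[j])
            if overlap > iou_threshold:
                s *= 1.0 - overlap  # linear decay instead of hard removal
            if s >= min_score:      # drop boxes whose score has collapsed
                rescored.append((j, s))
        remaining = rescored
    return kept


boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# Box 1 overlaps box 0 with IoU 0.9, so its score decays to about 0.08.
print(linear_soft_nms(boxes, scores))
```

With class_wise=True, a routine like this would presumably be run separately for each class.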

Evaluate eva02_L_coco_sys on COCO val using a single node with 4 GPUs:

# evaluate object detection performance w/o maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_l_8attn_win32_1536_lrd0p8.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 train.init_checkpoint=/path/to/eva02_L_coco_sys.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
62.2848,80.8003,68.0974,45.8547,66.7479,78.0017

# evaluate instance segmentation performance w/ maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_l_8attn_win32_1536_lrd0p8.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 model.roi_heads.maskness_thresh=0.5 \
 train.init_checkpoint=/path/to/eva02_L_coco_sys.pth

Expected results:

Task: segm
AP,AP50,AP75,APs,APm,APl
53.8006,78.2082,59.0474,34.2192,57.6212,72.6909
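This README does not define "maskness". One common definition, borrowed from SOLO-style mask rescoring and assumed here, is the mean predicted probability over the pixels that exceed the threshold; the box score is then multiplied by it so that detections with uncertain masks rank lower. The sketch below illustrates that assumed definition only and may differ from what the maskness_thresh option does in the code.

```python
def maskness(mask_probs, thresh=0.5):
    """Mean probability over pixels whose probability exceeds thresh."""
    foreground = [p for row in mask_probs for p in row if p > thresh]
    return sum(foreground) / len(foreground) if foreground else 0.0


def rescore(box_score, mask_probs, thresh=0.5):
    """Scale a detection score by the confidence of its predicted mask."""
    return box_score * maskness(mask_probs, thresh)


confident = [[0.9, 0.8], [0.2, 0.1]]   # foreground pixels average 0.85
uncertain = [[0.6, 0.55], [0.2, 0.1]]  # foreground pixels average 0.575
print(rescore(0.9, confident), rescore(0.9, uncertain))
```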

LVIS

Evaluate eva02_L_lvis_sys on LVIS val using a single node with 4 GPUs:

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_lvis/eva2_lvis_cascade_mask_rcnn_vitdet_l_8attn_win32_1536_lrd0p8.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.maskness_thresh=0.5 \
 train.init_checkpoint=/path/to/eva02_L_lvis_sys.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
60.0506,74.5881,63.7702,48.6573,70.7069,77.6001,53.1295,61.7127,61.2374
Task: segm
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
53.5098,72.7459,57.9503,40.0757,65.5123,73.5845,47.5871,55.2739,54.1442

System-level Comparisons w/ O365 Intermediate Fine-tuning

COCO

Evaluate eva02_L_coco_det_sys_o365 on COCO val using a single node with 4 GPUs:

# evaluate object detection performance w/o maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva2_o365_to_coco_cascade_mask_rcnn_vitdet_l_8attn_1536_lrd0p8.py \
 --eval-only \
 train.model_ema.use_ema_weights_for_eval_only=True \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 train.init_checkpoint=/path/to/eva02_L_coco_det_sys_o365.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
64.1442,82.1375,70.2666,48.8727,68.5649,78.1375
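The O365-initialized models are evaluated with train.model_ema.use_ema_weights_for_eval_only=True, which, going by the flag name, evaluates an exponential moving average (EMA) of the weights rather than the raw ones. As a generic illustration of the idea (not the repository's code), an EMA copy is updated after each training step as below; the decay value is an arbitrary assumption.

```python
def ema_update(ema, weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights."""
    return {k: decay * ema[k] + (1.0 - decay) * weights[k] for k in ema}


# Toy example: a single scalar "weight" that jumps from 1.0 to 0.0.
ema = {"w": 1.0}
for _ in range(3):
    ema = ema_update(ema, {"w": 0.0}, decay=0.9)
print(ema["w"])  # decays smoothly toward 0.0 (0.9 ** 3 after three steps)
```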

Evaluate eva02_L_coco_seg_sys_o365 on COCO val using a single node with 4 GPUs:

# evaluate instance segmentation performance w/ maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva2_o365_to_coco_cascade_mask_rcnn_vitdet_l_8attn_1536_lrd0p8.py \
 --eval-only \
 train.model_ema.use_ema_weights_for_eval_only=True \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 model.roi_heads.maskness_thresh=0.5 \
 train.init_checkpoint=/path/to/eva02_L_coco_seg_sys_o365.pth

Expected results:

Task: segm
AP,AP50,AP75,APs,APm,APl
55.3584,79.7107,61.4651,36.4061,59.0486,73.1774

LVIS

Evaluate eva02_L_lvis_sys_o365 on LVIS val using a single node with 4 GPUs:

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_o365_to_lvis/eva2_o365_to_lvis_cascade_mask_rcnn_vitdet_l_8attn_1536_lrd0p8.py \
 --eval-only \
 train.model_ema.use_ema_weights_for_eval_only=True \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.maskness_thresh=0.5 \
 dataloader.evaluator.max_dets_per_image=1000 \
 train.init_checkpoint=/path/to/eva02_L_lvis_sys_o365.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
65.2244,78.9298,68.9737,55.3842,75.1310,79.8346,59.7070,66.8476,65.8378
Task: segm
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
57.3209,76.9294,62.1691,44.9472,68.7592,74.9193,52.4748,58.9491,57.6336

Training

Select a config file and the corresponding initial checkpoint from the Model Card (the config and init. ckpt columns).

All configs can be trained with:

python tools/lazyconfig_train_net.py \
    --num-gpus 8  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
    --config-file /path/to/config.py \
    train.init_checkpoint=/path/to/init_checkpoint.pth \
    train.output_dir=/path/to/output
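Because --num-gpus is the GPU count per machine, the number of workers is 8 times the number of machines, and Detectron2's training loader requires the total batch size to be divisible by that number. The sketch below checks this for the batch sizes listed in the Model Card; the machine counts are illustrative and not the authors' actual setup. In stock Detectron2 lazy configs the total batch size is set by dataloader.train.total_batch_size, but check the chosen config.

```python
def per_gpu_batch(total_batch_size, gpus_per_machine=8, num_machines=1):
    """Images per GPU per iteration; raises if the split is uneven."""
    workers = gpus_per_machine * num_machines
    if total_batch_size % workers != 0:
        raise ValueError(
            f"total batch size {total_batch_size} is not divisible by {workers} GPUs"
        )
    return total_batch_size // workers


# Batch sizes from the Model Card, with hypothetical machine counts.
for total, machines in [(128, 1), (144, 2), (160, 4), (64, 1)]:
    print(total, machines, per_gpu_batch(total, num_machines=machines))
```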

Acknowledgment

EVA-02 object detection and instance segmentation are built upon Detectron2.