VOLO: Vision Outlooker for Visual Recognition (arXiv:2106.13112)

This is a PyTorch implementation of our paper. We present Vision Outlooker (VOLO) and show that it achieves state-of-the-art (SOTA) performance on ImageNet and CityScapes without using any extra training data.

ImageNet top-1 accuracy of VOLO compared with SOTA CNN-based and Transformer-based models. All results use each model's best test resolution. As of June 2021, VOLO-D5 achieves SOTA performance on ImageNet without extra data.

(Updating: code and models for downstream tasks such as semantic segmentation are coming soon.)

Reference

@misc{yuan2021volo,
      title={VOLO: Vision Outlooker for Visual Recognition}, 
      author={Li Yuan and Qibin Hou and Zihang Jiang and Jiashi Feng and Shuicheng Yan},
      year={2021},
      eprint={2106.13112},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

1. Requirements

torch>=1.7.0; torchvision>=0.8.0; timm==0.4.5; tlt==0.1.0; pyyaml; apex-amp

Data preparation: arrange ImageNet in the following folder structure. You can extract ImageNet with this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
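
Before training or validating, it can help to confirm that the dataset on disk matches this layout. The following is a minimal sketch, not part of this repository: the function name `summarize_imagenet_layout` is hypothetical, and it assumes only the class-per-folder structure under `train/` and `val/` shown above.

```python
from pathlib import Path


def summarize_imagenet_layout(root):
    """Count class folders and JPEG files under root/train and root/val.

    Hypothetical helper (not part of the VOLO repo). It assumes the
    class-per-folder layout shown above.
    """
    image_suffixes = {".jpeg", ".jpg"}
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is one sub-directory named by its WordNet ID.
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        num_images = 0
        for class_dir in class_dirs:
            for item in class_dir.iterdir():
                if item.suffix.lower() in image_suffixes:
                    num_images += 1
        summary[split] = {"classes": len(class_dirs), "images": num_images}
    return summary
```

For a complete ImageNet-1k copy, both splits should report 1000 classes, with 1,281,167 training images and 50,000 validation images.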

Directory structure in this repo:

│volo/
├──figures/
├──loss/
│  ├── __init__.py
│  ├── cross_entropy.py
├──models/
│  ├── __init__.py
│  ├── volo.py
├──utils/
│  ├── __init__.py
│  ├── utils.py
├──LICENSE
├──README.md
├──distributed_train.sh
├──main.py
├──validate.py

2. VOLO Models

Model          #params  Image resolution  Top-1 Acc (%)  Download
volo_d1        27M      224               84.2           here
volo_d1 ↑384   27M      384               85.2           here
volo_d2        59M      224               85.2           here
volo_d2 ↑384   59M      384               86.0           here
volo_d3        86M      224               85.4           here
volo_d3 ↑448   86M      448               86.3           here
volo_d4        193M     224               85.7           here
volo_d4 ↑448   193M     448               86.8           here
volo_d5        296M     224               86.1           here
volo_d5 ↑448   296M     448               87.0           here
volo_d5 ↑512   296M     512               87.1           here

Usage

Instructions on how to use our pre-trained VOLO models:

from models.volo import *
from utils import load_pretrained_weights 

# create model
model = volo_d1()

# load the pretrained weights
# set num_classes to match your dataset; other image sizes also work,
# because the position embedding is interpolated to the new size.
load_pretrained_weights(model, "/path/to/pretrained/weights", use_ema=False, 
                        strict=False, num_classes=1000)
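
The comment above notes that the position embedding is interpolated when the input resolution changes. Below is a minimal sketch of that idea, assuming a learned embedding stored as a `(1, H, W, C)` grid and bilinear resizing. The function name `resize_pos_embed`, the tensor layout, the grid sizes, and the interpolation mode are illustrative assumptions, and `load_pretrained_weights` may do this differently internally.

```python
import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed, new_hw):
    """Resize a (1, H, W, C) position-embedding grid to (1, new_H, new_W, C).

    Illustrative only: the layout and the bilinear mode are assumptions.
    """
    # F.interpolate expects channels-first input: (N, C, H, W).
    grid = pos_embed.permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_hw, mode="bilinear", align_corners=False)
    # Return to channels-last so the shape matches the input layout.
    return grid.permute(0, 2, 3, 1)


# Example with illustrative sizes: a 14x14 grid resized to 24x24.
pos_embed_small = torch.zeros(1, 14, 14, 384)
pos_embed_large = resize_pos_embed(pos_embed_small, (24, 24))
print(pos_embed_large.shape)  # torch.Size([1, 24, 24, 384])
```

Only the spatial grid changes; the channel dimension is untouched, which is why the same checkpoint can be reused at a larger test resolution.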

3. Validation

To evaluate our VOLO models, run:

python3 validate.py /path/to/imagenet  --model volo_d1 \
  --checkpoint /path/to/checkpoint --no-test-pool --apex-amp --img-size 224 -b 128

To evaluate at a different resolution, change --img-size from 224 to 384, 448, or 512. For example, to evaluate volo_d5 at 512 (87.1% top-1), run:

python3 validate.py /path/to/imagenet  --model volo_d5 \
  --checkpoint /path/to/volo_d5_512 --no-test-pool --apex-amp --img-size 512 -b 32
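
When evaluating several checkpoints, the command can be assembled programmatically. The sketch below is not part of this repository: `build_validate_command` and `SUPPORTED_SIZES` are hypothetical names. It uses only the flags shown in the commands above and the model/resolution pairs listed in the table in Section 2, and it leaves the batch size to the caller because the right value depends on GPU memory.

```python
import shlex

# Model/resolution pairs taken from the table in Section 2.
SUPPORTED_SIZES = {
    "volo_d1": (224, 384),
    "volo_d2": (224, 384),
    "volo_d3": (224, 448),
    "volo_d4": (224, 448),
    "volo_d5": (224, 448, 512),
}


def build_validate_command(data_dir, model, checkpoint, img_size, batch_size):
    """Return the validate.py argument list for one model/resolution pair."""
    if model not in SUPPORTED_SIZES:
        raise ValueError(f"unknown model: {model}")
    if img_size not in SUPPORTED_SIZES[model]:
        raise ValueError(f"{model} is not listed at resolution {img_size}")
    return [
        "python3", "validate.py", data_dir,
        "--model", model,
        "--checkpoint", checkpoint,
        "--no-test-pool", "--apex-amp",
        "--img-size", str(img_size),
        "-b", str(batch_size),
    ]


cmd = build_validate_command(
    "/path/to/imagenet", "volo_d5", "/path/to/volo_d5_512",
    img_size=512, batch_size=32,
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.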

4. Train

We train with token labeling, so download the token labeling data first. Details about token labeling are available here.

We first train each VOLO model at image size 224, then finetune it at image size 384 or 448/512:

Train volo_d1 on 224 and finetune on 384

8 GPUs, batch_size=1024, 19G of GPU memory per GPU with apex-amp (mixed-precision training)

Train volo_d2 on 224 and finetune on 384

8 GPUs, batch_size=1024, 27G of GPU memory per GPU with apex-amp (mixed-precision training)

5. Acknowledgement

We gratefully acknowledge the support of the NVIDIA AI Tech Center (NVAITC) for this research project, especially the GPU technology support from Terry Jianxiong Yin (NVAITC) and Qingyi Tao (NVAITC).

LICENSE

This repo is released under the Apache-2.0 license. For commercial use, please contact the authors.

GitHub - sail-sg/volo: VOLO: Vision Outlooker for Visual Recognition
