Introduction

This repository is the official implementation of Contextual Transformer Networks for Visual Recognition.

CoT is a unified self-attention building block that acts as an alternative to standard convolutions in ConvNets. It is therefore feasible to replace convolutions with their CoT counterparts to strengthen vision backbones with contextualized self-attention.

2021/3/25-2021/6/5: CVPR 2021 Open World Image Classification Challenge

Rank 1 in Open World Image Classification Challenge @ CVPR 2021. (Team name: VARMS)

Usage

The code is mainly based on timm.

Requirements:

PyTorch 1.8.0+
Python 3.7
CUDA 10.1+
CuPy
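
As a quick sanity check before training, the minimum versions listed above can be compared against what is installed. The following is a minimal sketch using only the standard library; the helper names (parse_version, meets_minimum, check_environment) are hypothetical and not part of this repository, and the detected version strings are passed in by hand here.

```python
import re

# Minimum versions copied from the requirement list above.
MINIMUMS = {"torch": "1.8.0", "cuda": "10.1"}


def parse_version(text):
    """Turn a string such as '1.8.0+cu101' into (1, 8, 0), dropping any build suffix."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the stated minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.8' and '1.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(found):
    """Map each requirement name to True/False given a dict of detected version strings."""
    report = {}
    for name, minimum in MINIMUMS.items():
        installed = found.get(name)
        report[name] = bool(installed) and meets_minimum(installed, minimum)
    return report


# Hand-written example values; in a real environment they could be read from
# torch.__version__ and torch.version.cuda (the latter is None on CPU-only builds).
print(check_environment({"torch": "1.8.0+cu101", "cuda": "10.1"}))
```

The Python requirement can be checked separately with sys.version_info, and CuPy simply by trying to import cupy.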

Clone the repository:

git clone https://github.com/JDAI-CV/CoTNet.git

Train

First, download the ImageNet dataset. To train CoTNet-50 on ImageNet on a single node with 8 GPUs for 350 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 train.py --folder ./experiments/cot_experiments/CoTNet-50-350epoch

The training scripts for CoTNet (e.g., CoTNet-50) can be found in the cot_experiments folder.
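
When launching several runs from a script, the training command above can be assembled programmatically. This is only a sketch: build_launch_command and format_command are hypothetical helpers, not part of this repository, and the only flags used are the ones that appear in the command above.

```python
import shlex


def build_launch_command(experiment_folder, gpus_per_node=8, python="python"):
    """Return the argument list for the distributed training command shown above."""
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    return [
        python, "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus_per_node}",
        "train.py",
        "--folder", experiment_folder,
    ]


def format_command(argv):
    """Join an argument list into a shell-safe string for logging."""
    return " ".join(shlex.quote(arg) for arg in argv)


cmd = build_launch_command("./experiments/cot_experiments/CoTNet-50-350epoch")
print(format_command(cmd))
# To actually start training, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list rather than a single string avoids shell quoting problems if an experiment folder path contains spaces.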

Inference Time vs. Accuracy

CoTNet models consistently achieve higher top-1 accuracy at lower inference time than other vision backbones under both default and advanced training setups. In short, CoTNet models offer better inference time-accuracy trade-offs than existing vision backbones.

Results on ImageNet

name            resolution  #params  FLOPs (G)  Top-1 Acc. (%)  Top-5 Acc. (%)  model
CoTNet-50       224         22.2M    3.3        81.3            95.6            GoogleDrive / Baidu
CoTNeXt-50      224         30.1M    4.3        82.1            95.9            GoogleDrive / Baidu
SE-CoTNetD-50   224         23.1M    4.1        81.6            95.8            GoogleDrive / Baidu
CoTNet-101      224         38.3M    6.1        82.8            96.2            GoogleDrive / Baidu
CoTNeXt-101     224         53.4M    8.2        83.2            96.4            GoogleDrive / Baidu
SE-CoTNetD-101  224         40.9M    8.5        83.2            96.5            GoogleDrive / Baidu
SE-CoTNetD-152  224         55.8M    17.0       84.0            97.0            GoogleDrive / Baidu
SE-CoTNetD-152  320         55.8M    26.5       84.6            97.1            GoogleDrive / Baidu

Access code for Baidu is cotn
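
To pick a checkpoint for a given compute budget, the results table above can be queried directly. The sketch below copies the rows from the table and assumes the FLOPs column is in GFLOPs; best_under_budget is a hypothetical helper, not part of this repository. FLOPs are only a rough proxy for the inference time discussed earlier.

```python
# (name, resolution, params in millions, FLOPs assumed to be GFLOPs, top-1 %, top-5 %)
RESULTS = [
    ("CoTNet-50", 224, 22.2, 3.3, 81.3, 95.6),
    ("CoTNeXt-50", 224, 30.1, 4.3, 82.1, 95.9),
    ("SE-CoTNetD-50", 224, 23.1, 4.1, 81.6, 95.8),
    ("CoTNet-101", 224, 38.3, 6.1, 82.8, 96.2),
    ("CoTNeXt-101", 224, 53.4, 8.2, 83.2, 96.4),
    ("SE-CoTNetD-101", 224, 40.9, 8.5, 83.2, 96.5),
    ("SE-CoTNetD-152", 224, 55.8, 17.0, 84.0, 97.0),
    ("SE-CoTNetD-152", 320, 55.8, 26.5, 84.6, 97.1),
]


def best_under_budget(results, max_gflops):
    """Return the row with the highest top-1 accuracy within the FLOPs budget, or None."""
    candidates = [row for row in results if row[3] <= max_gflops]
    if not candidates:
        return None
    # Highest top-1 first; ties are broken in favour of fewer FLOPs.
    return max(candidates, key=lambda row: (row[4], -row[3]))


choice = best_under_budget(RESULTS, max_gflops=4.5)
print(choice[0], choice[4])
```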

Citing Contextual Transformer Networks

@article{cotnet,
  title={Contextual Transformer Networks for Visual Recognition},
  author={Li, Yehao and Yao, Ting and Pan, Yingwei and Mei, Tao},
  journal={arXiv preprint arXiv:2107.12292},
  year={2021}
}

Acknowledgements

Thanks to the contributions of timm and the awesome PyTorch team.
