

Self-Supervised Vision Transformers with DINO
PyTorch implementation and pretrained models for DINO. For details, see Emerging Properties in Self-Supervised Vision Transformers.
[blogpost] [arXiv] [Yannic Kilcher's video]
Pretrained models
You can choose to download only the weights of the pretrained backbone used for downstream tasks, or the full checkpoint which contains backbone and projection head weights for both student and teacher networks. We also provide the training and evaluation logs.
The pretrained models are available on PyTorch Hub.
import torch
deits16 = torch.hub.load('facebookresearch/dino:main', 'dino_deits16')
deits8 = torch.hub.load('facebookresearch/dino:main', 'dino_deits8')
vitb16 = torch.hub.load('facebookresearch/dino:main', 'dino_vitb16')
vitb8 = torch.hub.load('facebookresearch/dino:main', 'dino_vitb8')
resnet50 = torch.hub.load('facebookresearch/dino:main', 'dino_resnet50')
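As a quick sanity check, any of these backbones can be used directly as a frozen feature extractor. The snippet below is a minimal sketch rather than part of the repository: the 224x224 input size, the ImageNet normalization constants and the image path are illustrative assumptions.
import torch
from PIL import Image
from torchvision import transforms

# Load the ViT-S/16 backbone from PyTorch Hub (weights are downloaded on first use).
model = torch.hub.load('facebookresearch/dino:main', 'dino_deits16')
model.eval()

# ImageNet-style preprocessing (assumed here, not prescribed by the repository).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

img = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0)  # hypothetical image path
with torch.no_grad():
    features = model(img)  # [CLS] embedding, shape (1, 384) for ViT-S/16
print(features.shape)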
Training
Documentation
Please install PyTorch and download the ImageNet dataset. This codebase has been developed with Python version 3.6, PyTorch version 1.7.1, CUDA 11.0 and torchvision 0.8.2. The exact arguments to reproduce the models presented in our paper can be found in the args column of the pretrained models section. For a glimpse at the full documentation of DINO training, please run:
python main_dino.py --help
Vanilla DINO training 
Run DINO with a DeiT-small network on a single node with 8 GPUs for 100 epochs with the following command. Training time is 1.75 days and the resulting checkpoint should reach 69.3% on k-NN eval and ~73.8% on linear eval. We provide training and linear evaluation logs for this run to help reproducibility.
python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch deit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
Multi-node training
We use Slurm and submitit (pip install submitit). To train on 2 nodes with 8 GPUs each (total 16 GPUs):
python run_with_submitit.py --nodes 2 --ngpus 8 --arch deit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
DINO with a ViT-base network:
python run_with_submitit.py --nodes 2 --ngpus 8 --use_volta32 --arch vit_base --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
Boosting DINO performance 
You can improve the performance of the vanilla run by:
- training for more epochs: --epochs 300,
- increasing the teacher temperature: --teacher_temp 0.07 --warmup_teacher_temp_epochs 30,
- removing last layer normalization (only safe with --arch deit_small): --norm_last_layer false.
python run_with_submitit.py --arch deit_small --epochs 300 --teacher_temp 0.07 --warmup_teacher_temp_epochs 30 --norm_last_layer false --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
The resulting pretrained model should reach 73.3% on k-NN eval and ~76.1% on linear eval. Training time is 2.6 days with 16 GPUs. We provide training and linear evaluation logs for this run to help reproducibility.
ResNet-50 and other convnets training
This code also works for training DINO on convolutional networks, like ResNet-50 for example. We highly recommend adapting some optimization arguments in this case. For example, the following command trains DINO on ResNet-50 on a single node with 8 GPUs for 100 epochs. We provide training logs for this run.
python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch resnet50 --optimizer sgd --weight_decay 1e-4 --weight_decay_end 1e-4 --global_crops_scale 0.14 1 --local_crops_scale 0.05 0.14 --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
Self-attention visualization
You can look at the self-attention of the [CLS] token on the different heads of the last layer by running:
python visualize_attention.py
Also, check out this colab for video inference.
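If you prefer to stay in a Python session, the same [CLS] attention maps can be pulled out by hand. The sketch below assumes the hub model exposes get_last_selfattention, as in the repository's vision_transformer.py; the 480x480 input size and the image path are illustrative.
import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load('facebookresearch/dino:main', 'dino_deits8')
model.eval()
patch_size = 8  # must match the backbone (ViT-S/8 here)

preprocess = transforms.Compose([
    transforms.Resize((480, 480)),  # any size divisible by the patch size works
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
img = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0)  # hypothetical image path

with torch.no_grad():
    attn = model.get_last_selfattention(img)  # (1, num_heads, num_tokens, num_tokens)

# Attention from the [CLS] token to every patch token, one map per head.
h = img.shape[-2] // patch_size
w = img.shape[-1] // patch_size
cls_attn = attn[0, :, 0, 1:].reshape(-1, h, w)  # (num_heads, h, w)
print(cls_attn.shape)
Each head's map can then be upsampled back to image resolution and overlaid on the input, which is essentially what visualize_attention.py does.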
Evaluation: k-NN classification on ImageNet
To evaluate a simple k-NN classifier with a single GPU on a pre-trained model, run:
python -m torch.distributed.launch --nproc_per_node=1 eval_knn.py --data_path /path/to/imagenet
If you choose not to specify --pretrained_weights, then DINO reference weights are used by default. If you want instead to evaluate checkpoints from a run of your own, you can run for example:
python -m torch.distributed.launch --nproc_per_node=1 eval_knn.py --pretrained_weights /path/to/checkpoint.pth --checkpoint_key teacher --data_path /path/to/imagenet
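For intuition, the weighted k-NN protocol amounts to: extract L2-normalized features for the train and val splits with the frozen backbone, then classify each val image by a similarity-weighted vote over its k nearest training features. Below is a minimal sketch of that voting step, not the repository's eval_knn.py implementation; k=20 and the temperature T=0.07 are common defaults for this protocol and are hard-coded here only for illustration.
import torch

def knn_classify(train_feats, train_labels, test_feats, num_classes, k=20, T=0.07):
    # train_feats: (N, D) and test_feats: (M, D), both assumed L2-normalized;
    # train_labels: (N,) integer class ids.
    sim = test_feats @ train_feats.T            # cosine similarities, (M, N)
    topk_sim, topk_idx = sim.topk(k, dim=1)     # k nearest training features per test image
    topk_labels = train_labels[topk_idx]        # (M, k)
    weights = (topk_sim / T).exp()              # similarity-weighted votes
    votes = torch.zeros(test_feats.shape[0], num_classes)
    votes.scatter_add_(1, topk_labels, weights) # accumulate votes per class
    return votes.argmax(dim=1)                  # predicted class per test image

# Toy usage with random normalized features (placeholders for real DINO features).
train_feats = torch.nn.functional.normalize(torch.randn(100, 384), dim=1)
test_feats = torch.nn.functional.normalize(torch.randn(5, 384), dim=1)
preds = knn_classify(train_feats, torch.randint(0, 10, (100,)), test_feats, num_classes=10)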
Evaluation: Linear classification on ImageNet
To train a supervised linear classifier on frozen weights on a single node with 8 GPUs, run:
python -m torch.distributed.launch --nproc_per_node=8 eval_linear.py --data_path /path/to/imagenet
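Conceptually, linear evaluation freezes the backbone and trains only a linear classifier on top of its features. The sketch below is a stripped-down illustration rather than the repository's eval_linear.py (which, for ViTs, can also concatenate features from several of the last blocks); the optimizer settings and the synthetic data loader are placeholders.
import torch
import torch.nn as nn

# Frozen DINO backbone from PyTorch Hub.
backbone = torch.hub.load('facebookresearch/dino:main', 'dino_deits16')
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

linear = nn.Linear(384, 1000)   # 384-dim ViT-S/16 features -> 1000 ImageNet classes
optimizer = torch.optim.SGD(linear.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Placeholder batch; in practice this would iterate over an ImageNet DataLoader.
loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,)))]

for images, labels in loader:
    with torch.no_grad():
        feats = backbone(images)          # frozen features, no backbone gradients
    loss = criterion(linear(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()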
License
See the LICENSE file for more details.
Citation
If you find this repository useful, please consider giving a star and citation:
@article{caron2021emerging,
title={Emerging Properties in Self-Supervised Vision Transformers},
author={Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J\'egou, Herv\'e and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand},
journal={arXiv preprint arXiv:2104.14294},
year={2021}
}