

source link: https://github.com/microsoft/SimMIM

SimMIM
By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*.
This repo is the official implementation of "SimMIM: A Simple Framework for Masked Image Modeling".
Updates
12/09/2021
Initial commits:
- Pre-trained and fine-tuned models on ImageNet-1K (Swin Base, Swin Large, and ViT Base) are provided.
- The supported code for ImageNet-1K pre-training and fine-tuning is provided.
Introduction
SimMIM was first described in arXiv and serves as a simple framework for masked image modeling. Through systematic study, we find that a simple design for each component yields very strong representation learning performance: 1) random masking of the input image with a moderately large masked patch size (e.g., 32) makes a strong pretext task; 2) predicting raw RGB pixel values by direct regression performs no worse than patch classification approaches with complex designs; 3) the prediction head can be as light as a linear layer, with no worse performance than heavier ones.
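The first two design points can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: the function names, the 0.6 mask ratio, and the choice of an L1 loss are assumptions made for illustration; only the 192x192 input size and the mask patch size of 32 come from this README.

```python
import math
import random


def random_patch_mask(img_size=192, mask_patch_size=32, mask_ratio=0.6, seed=None):
    """Return a square grid of 0/1 flags, one per mask patch (1 = masked).

    mask_ratio=0.6 is an illustrative assumption, not a value from this README.
    """
    if img_size % mask_patch_size != 0:
        raise ValueError("img_size must be divisible by mask_patch_size")
    side = img_size // mask_patch_size        # patches per image side
    num_patches = side * side
    num_masked = math.ceil(num_patches * mask_ratio)
    rng = random.Random(seed)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [[1 if r * side + c in masked else 0 for c in range(side)]
            for r in range(side)]


def masked_l1_loss(pred, target, pixel_mask):
    """Mean absolute error over masked pixels only (flat lists of equal length).

    Using L1 as the regression loss is an assumption of this sketch.
    """
    total = sum(abs(p - t) * m for p, t, m in zip(pred, target, pixel_mask))
    count = sum(pixel_mask)
    return total / max(count, 1)


grid = random_patch_mask(seed=0)
print(sum(map(sum, grid)), "of", len(grid) ** 2, "patches masked")
```

With a 192x192 input and 32x32 mask patches the grid is 6x6, so a ratio of 0.6 masks 22 of the 36 patches. The loss is computed only where the mask is 1, so visible pixels do not contribute to it.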
Main Results on ImageNet
Swin Transformer
ImageNet-1K Pre-trained and Fine-tuned Models
| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-Base | 100 | 192x192 | 192x192 | 82.8 | google/config | google/config |
| Swin-Base | 100 | 192x192 | 224x224 | 83.5 | google/config | google/config |
| Swin-Base | 800 | 192x192 | 224x224 | 84.0 | google/config | google/config |
| Swin-Large | 800 | 192x192 | 224x224 | 85.4 | google/config | google/config |
| SwinV2-Huge | 800 | 192x192 | 224x224 | 85.7 | / | / |
| SwinV2-Huge | 800 | 192x192 | 512x512 | 87.1 | / | / |
Vision Transformer
ImageNet-1K Pre-trained and Fine-tuned Models
| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |
| --- | --- | --- | --- | --- | --- | --- |
| ViT-Base | 800 | 224x224 | 224x224 | 83.8 | google/config | google/config |
Citing SimMIM
@article{xie2021simmim,
title={SimMIM: A Simple Framework for Masked Image Modeling},
author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han},
journal={arXiv preprint arXiv:2111.09886},
year={2021}
}
Getting Started
Installation
- Install CUDA 11.3 with cuDNN 8 following the official installation guides for CUDA and cuDNN.
- Set up the conda environment:
    # Create environment
    conda create -n SimMIM python=3.8 -y
    conda activate SimMIM

    # Install requirements
    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y

    # Install apex
    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    cd ..

    # Clone SimMIM
    git clone https://github.com/microsoft/SimMIM
    cd SimMIM

    # Install other requirements
    pip install -r requirements.txt
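After installation, it can help to confirm that the key packages are visible from the activated environment. The helper below is a hypothetical convenience script, not part of the repository; it uses only the standard library and checks whether each module can be located, without importing it.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "torchvision", "apex")):
    """Map each top-level module name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A "MISSING" entry only means the module is not on the import path of the current interpreter; it does not verify the CUDA or cuDNN setup itself.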
Evaluating provided models
To evaluate a provided model on ImageNet validation set, run:
    python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_finetune.py \
        --eval --cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>

For example, to evaluate the Swin Base model on a single GPU, run:

    python -m torch.distributed.launch --nproc_per_node 1 main_finetune.py \
        --eval --cfg configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml \
        --resume simmim_finetune__swin_base__img224_window7__800ep.pth --data-path <imagenet-path>
Pre-training with SimMIM
To pre-train models with SimMIM, run:

    python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_simmim.py \
        --cfg <config-file> --data-path <imagenet-path>/train [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]

For example, to pre-train Swin Base for 800 epochs on one DGX-2 server, run:

    python -m torch.distributed.launch --nproc_per_node 16 main_simmim.py \
        --cfg configs/swin_base__800ep/simmim_pretrain__swin_base__img192_window6__800ep.yaml \
        --batch-size 128 --data-path <imagenet-path>/train [--output <output-directory> --tag <job-tag>]
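Because `--batch-size` is the batch size per GPU, the DGX-2 example processes 16 x 128 = 2048 images per step. The short sketch below works through that arithmetic and shows how one might pick a per-GPU batch size to keep the same global batch on a different GPU count. Both helper names are hypothetical, and whether the provided configs expect a particular global batch size is not stated in this README.

```python
def global_batch_size(num_gpus, batch_size_per_gpu):
    """Images processed per optimizer step across all GPUs."""
    return num_gpus * batch_size_per_gpu


def per_gpu_batch_for(target_global, num_gpus):
    """Per-GPU batch size that reproduces target_global on num_gpus GPUs."""
    if target_global % num_gpus != 0:
        raise ValueError("target_global must be divisible by num_gpus")
    return target_global // num_gpus


dgx2 = global_batch_size(16, 128)
print(dgx2)                        # 2048
print(per_gpu_batch_for(dgx2, 8))  # 256
```

Doubling the per-GPU batch size also raises per-GPU memory use, so the second figure is only usable if it fits on the target hardware.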
Fine-tuning pre-trained models
To fine-tune models pre-trained by SimMIM, run:

    python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_finetune.py \
        --cfg <config-file> --data-path <imagenet-path> --pretrained <pretrained-ckpt> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]

For example, to fine-tune Swin Base pre-trained by SimMIM on one DGX-2 server, run:

    python -m torch.distributed.launch --nproc_per_node 16 main_finetune.py \
        --cfg configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml \
        --batch-size 128 --data-path <imagenet-path> --pretrained <pretrained-ckpt> [--output <output-directory> --tag <job-tag>]
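The evaluation, pre-training, and fine-tuning commands share the same launcher and differ only in the script and a few flags. When scripting several runs, a small wrapper along the following lines can assemble them. `build_launch_command` is a hypothetical helper, not part of the repository; it emits only the flags shown in the commands above, and the paths in the usage example are placeholders.

```python
import shlex


def build_launch_command(script, num_gpus, cfg, data_path, *, evaluate=False,
                         resume=None, pretrained=None, batch_size=None,
                         output=None, tag=None):
    """Assemble a torch.distributed.launch command as a list of arguments."""
    cmd = ["python", "-m", "torch.distributed.launch",
           "--nproc_per_node", str(num_gpus), script]
    if evaluate:
        cmd.append("--eval")
    cmd += ["--cfg", cfg, "--data-path", data_path]
    optional = (("--resume", resume), ("--pretrained", pretrained),
                ("--batch-size", batch_size), ("--output", output), ("--tag", tag))
    for flag, value in optional:
        if value is not None:          # skip flags the caller did not set
            cmd += [flag, str(value)]
    return cmd


# Placeholder paths; substitute your own dataset and checkpoint locations.
eval_cmd = build_launch_command(
    "main_finetune.py", 1,
    "configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml",
    "/path/to/imagenet",
    evaluate=True,
    resume="simmim_finetune__swin_base__img224_window7__800ep.pth",
)
print(shlex.join(eval_cmd))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting issues; `shlex.join` is used only for display.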
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.