

GitHub - MhLiao/DB: A PyTorch implementation of "Real-time Scene Text Detect...
source link: https://github.com/MhLiao/DB

README.md
Introduction
This is a PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization". The paper presents a real-time arbitrary-shape scene text detector that achieves state-of-the-art performance on standard benchmarks.
Part of the code is inherited from MegReader.
ToDo List
- Release code
- Document for Installation
- Trained models
- Document for testing and training
- Evaluation
- Demo script
- More models on more datasets
Installation
Requirements:
- Python3
- PyTorch >= 1.2
- GCC >= 4.9 (This is important for PyTorch)
- CUDA >= 9.0 (10.1 is recommended)
# first, make sure that your conda is set up properly with the right environment
# for that, check that `which conda`, `which pip` and `which python` point to the
# right path. From a clean conda env, this is what you need to do
conda create --name DB -y
conda activate DB

# this installs the right pip and dependencies for the fresh python
conda install ipython pip

# python dependencies
pip install -r requirement.txt

# install PyTorch with cuda-10.1
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

# clone repo
git clone https://github.com/MhLiao/DB.git
cd DB/

# build deformable convolution operator
cd assets/ops/dcn/
python setup.py build_ext --inplace
Models
Download the trained models from Baidu Drive (download code: p6u3) or Google Drive.
pre-trained-model-synthtext -- used to finetune models, not for evaluation
td500_resnet18
td500_resnet50
totaltext_resnet18
totaltext_resnet50
Datasets
The root of the dataset directory can be DB/datasets/.
Download the converted ground truth and data lists from Baidu Drive (download code: mz0a) or Google Drive. The images of each dataset can be obtained from its official website.
Testing
Prepare dataset
An example of the path of test images:
datasets/total_text/train_images
datasets/total_text/train_gts
datasets/total_text/train_list.txt
datasets/total_text/test_images
datasets/total_text/test_gts
datasets/total_text/test_list.txt
The data root directory and the data list file can be defined in base_totaltext.yaml
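Before running evaluation, it can help to confirm that the dataset root matches the example layout above. The following is a minimal sketch, not part of the repository: `missing_entries` is a hypothetical helper, and the entry names are copied from the Total-Text example paths listed above.

```python
from pathlib import Path

# Entries from the example layout above, relative to the dataset root.
EXPECTED = [
    "train_images", "train_gts", "train_list.txt",
    "test_images", "test_gts", "test_list.txt",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    # Path taken from the example above; adjust it to your own data root.
    missing = missing_entries("datasets/total_text")
    print("missing:", missing if missing else "none")
```

An empty result only means the expected files and directories exist; it does not check that the list files and ground-truth files are consistent with each other.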
Config file
YAML files named base*.yaml should not be used directly as the training or testing config file.
Evaluate the performance
For simplicity, we do not provide the evaluation protocols of all benchmarks. The evaluation protocol embedded in the code is modified from the ICDAR 2015 protocol to support arbitrary-shape polygons. It produces almost the same results as the Pascal evaluation protocol on the Total-Text dataset.
python eval.py experiments/seg_detector/totaltext_resnet18_deform_thre.yaml --resume path-to-model-directory/totaltext_resnet18 --polygon --box_thresh 0.6
box_thresh can be used to balance precision and recall; the value that gives a good F-measure may differ between datasets. polygon is only used for arbitrary-shape text datasets. The size of the input images is defined in validate_data->processes->AugmentDetectionData in base_*.yaml.
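To illustrate how a score threshold such as box_thresh trades precision against recall, here is a small self-contained sketch. It is not the repository's evaluation code: the `sweep` function, the toy scores, and the precomputed "is correct" flags are all assumptions made for illustration. In a real evaluation the flags would come from matching detections to ground truth under the evaluation protocol.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sweep(scored_boxes, num_gt, thresholds):
    """Compute (threshold, precision, recall, F-measure) for each threshold.

    scored_boxes: list of (score, is_correct) pairs, one per detected box.
    num_gt: number of ground-truth text instances.
    """
    results = []
    for t in thresholds:
        # Keep only boxes whose score reaches the threshold.
        kept = [ok for score, ok in scored_boxes if score >= t]
        tp = sum(kept)
        precision = tp / len(kept) if kept else 0.0
        recall = tp / num_gt if num_gt else 0.0
        results.append((t, precision, recall, f_measure(precision, recall)))
    return results


if __name__ == "__main__":
    # Toy detections (made up for this example), with 4 ground-truth instances.
    boxes = [(0.9, True), (0.8, True), (0.65, False), (0.55, True), (0.4, False)]
    for t, p, r, f in sweep(boxes, num_gt=4, thresholds=[0.5, 0.6, 0.7]):
        print(f"thresh={t:.1f} precision={p:.3f} recall={r:.3f} f={f:.3f}")
```

On this toy data, raising the threshold from 0.5 to 0.7 raises precision from 0.75 to 1.0 but lowers recall from 0.75 to 0.5, which is why the best value has to be chosen per dataset.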
Evaluate the speed
Set adaptive to False in the yaml file to speed up inference without decreasing the performance. The speed is evaluated by running inference on a single test image 50 times, which excludes extra IO time.
python eval.py experiments/seg_detector/totaltext_resnet18_deform_thre.yaml --resume path-to-model-directory/totaltext_resnet18 --polygon --box_thresh 0.6 --speed
Note that the speed depends on both the GPU and the CPU, since the model runs on the GPU and the post-processing algorithm runs on the CPU.
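The idea of timing repeated runs on one already-loaded image can be sketched as follows. This is not the code behind the --speed flag: `average_latency` and `fake_inference` are hypothetical names, and the workload is a CPU stand-in so that the example runs without a GPU.

```python
import time


def average_latency(run_once, repeats=50):
    """Call run_once() `repeats` times and return the mean seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) / repeats


def fake_inference():
    # Stand-in workload. A real measurement would run the model and the
    # post-processing on an image that is already loaded into memory.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    seconds = average_latency(fake_inference)
    print(f"{seconds * 1000:.2f} ms per run, {1.0 / seconds:.1f} FPS")
```

When the timed function launches CUDA work, the GPU runs asynchronously, so a call such as torch.cuda.synchronize() is needed before reading the timer; otherwise the measurement covers only kernel launches.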
Training
Check the paths of data_dir and data_list in the base_*.yaml file. For better performance, you can first pre-train the model with SynthText and then fine-tune it with the specific real-world dataset.
python train.py path-to-yaml-file --num_gpus 4
You can also try distributed training (not fully tested):
python -m torch.distributed.launch --nproc_per_node=4 train.py path-to-yaml-file --num_gpus 4
Improvements
Note that the current implementation is written in pure Python except for the deformable convolution operator. The code can therefore be optimized further, for example with TensorRT for the model forward pass and efficient C++ code for the post-processing function.
Another option to increase speed is to run the model forward and the post-processing algorithm in parallel through a producer-consumer strategy.
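A producer-consumer arrangement of this kind might look like the sketch below. It is an illustration under stated assumptions, not code from the repository: `run_pipeline`, `forward`, and `postprocess` are hypothetical names, and the stages are passed in as plain callables.

```python
import queue
import threading


def run_pipeline(images, forward, postprocess, max_pending=4):
    """Overlap forward() and postprocess() using a bounded queue.

    One producer thread runs forward() on each image; one consumer thread
    runs postprocess() on each output. Results keep the input order.
    """
    pending = queue.Queue(maxsize=max_pending)
    results = []
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for image in images:
                pending.put(forward(image))
        finally:
            # Always signal completion so the consumer cannot block forever.
            pending.put(done)

    def consumer():
        while True:
            item = pending.get()
            if item is done:
                break
            results.append(postprocess(item))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Toy stages standing in for the model forward pass and post-processing.
    print(run_pipeline([1, 2, 3], lambda x: x * 2, lambda y: y + 1))
```

The bounded queue keeps the producer from running far ahead of the consumer. In CPython, post-processing written in pure Python holds the global interpreter lock, so a process-based consumer may be needed to get real overlap.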
Contributions or pull requests are welcome.
Citing the related works
Please cite the related works in your publications if they help your research:
@inproceedings{liao2020real,
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
title={Real-time Scene Text Detection with Differentiable Binarization},
booktitle={Proc. AAAI},
year={2020}
}