README.md

NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit

Introduction

NeuralClassifier is designed for quick implementation of neural models for the hierarchical multi-label classification task, which is both challenging and common in real-world scenarios. A salient feature is that NeuralClassifier currently provides a variety of text encoders, such as FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet and the Transformer encoder. It also supports other text classification scenarios, including binary-class and multi-class classification. It is built on PyTorch. Experiments show that models built with the toolkit achieve performance comparable to the results reported in the literature.

Supported tasks

Binary-class text classification
Multi-class text classification
Multi-label text classification
Hierarchical (multi-label) text classification (HMC)

Supported text encoders

TextCNN (Kim, 2014)
RCNN (Lai et al., 2015)
TextRNN (Liu et al., 2016)
FastText (Joulin et al., 2016)
VDCNN (Conneau et al., 2016)
DPCNN (Johnson and Zhang, 2017)
AttentiveConvNet (Yin and Schutze, 2017)
DRNN (Wang, 2018)
Region embedding (Qiao et al., 2018)
Transformer encoder (Vaswani et al., 2017)
Star-Transformer encoder (Guo et al., 2019)

Requirements

Python 3
PyTorch 0.4+
NumPy 1.14.3+

System Architecture

Usage

Training

python train.py conf/train.json

For detailed configurations and explanations, see Configuration.

Training information is written to standard output and to the file given by log.logger_file.

Evaluation

python eval.py conf/train.json

If eval.is_flat is false, hierarchical evaluation results will be output.
eval.model_dir specifies the model to evaluate.
data.test_json_files specifies the input text files to evaluate.

The evaluation results are written to eval.dir.
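The training and evaluation options above are written as dotted names. The sketch below assumes that each dotted segment names a nested object in conf/train.json. It looks those options up before running eval.py. The helpers get_option and describe_run are hypothetical and are not part of the toolkit.

```python
import json
from typing import Any, Dict

# Option names taken from this README; other keys in the config are not covered.
README_OPTIONS = (
    "log.logger_file",
    "eval.is_flat",
    "eval.model_dir",
    "data.test_json_files",
    "eval.dir",
)


def get_option(config: Dict[str, Any], dotted_key: str) -> Any:
    """Resolve a dotted name such as "eval.model_dir" against nested dicts.

    Assumption: each dot-separated segment names a nested JSON object.
    """
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_key)
        node = node[part]
    return node


def describe_run(config_path: str) -> None:
    """Print the README-documented options found in a JSON config file."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    for key in README_OPTIONS:
        try:
            print(f"{key} = {get_option(config, key)!r}")
        except KeyError:
            print(f"{key} is not set")
    try:
        # JSON false loads as Python False.
        if get_option(config, "eval.is_flat") is False:
            print("hierarchical evaluation will be output")
    except KeyError:
        pass
```

Calling describe_run("conf/train.json") before python eval.py conf/train.json shows which model, test files and output directory the run will use.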

Input Data Format

JSON example:

{
    "doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    "doc_token": ["I", "love", "deep", "learning"],
    "doc_keyword": ["deep learning"],
    "doc_topic": ["AI", "Machine learning"]
}

"doc_keyword" and "doc_topic" are optional.
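Below is a minimal reader for this format. It rests on two assumptions that the text above does not state:

- Each input file holds one JSON object per line.
- A label such as "Computer--MachineLearning--DeepLearning" implies its ancestors "Computer" and "Computer--MachineLearning".

The function names are hypothetical. This is not the toolkit's own data loader.

```python
import json
from typing import Iterator, List

REQUIRED_FIELDS = ("doc_label", "doc_token")
OPTIONAL_FIELDS = ("doc_keyword", "doc_topic")
LABEL_SEPARATOR = "--"  # separates hierarchy levels in the example above


def parse_document(line: str, line_no: int = 1) -> dict:
    """Parse one JSON document and check the field layout shown above."""
    doc = json.loads(line)
    for field in REQUIRED_FIELDS:
        if not isinstance(doc.get(field), list):
            raise ValueError(f"line {line_no}: '{field}' must be a list")
    for field in OPTIONAL_FIELDS:
        doc.setdefault(field, [])  # optional fields default to empty lists here
    return doc


def read_documents(path: str) -> Iterator[dict]:
    """Yield documents from a file, assuming one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if line.strip():
                yield parse_document(line, line_no)


def expand_label_paths(labels: List[str]) -> List[str]:
    """Expand each label path into its ancestor prefixes plus itself."""
    expanded: List[str] = []
    for label in labels:
        parts = label.split(LABEL_SEPARATOR)
        for depth in range(1, len(parts) + 1):
            prefix = LABEL_SEPARATOR.join(parts[:depth])
            if prefix not in expanded:
                expanded.append(prefix)
    return expanded
```

For example, expand_label_paths(["Computer--MachineLearning--DeepLearning"]) returns ["Computer", "Computer--MachineLearning", "Computer--MachineLearning--DeepLearning"].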

Performance

0. Dataset

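| Dataset | Taxonomy | #Label | #Training | #Test |
|---------|----------|--------|-----------|---------|
| RCV1 | Tree | 103 | 23,149 | 781,265 |
| Yelp | DAG | 539 | 87,375 | 37,265 |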

RCV1: Lewis et al., 2004
Yelp: Yelp

1. Comparison with the state of the art

| Text Encoders | Micro-F1 on RCV1 | Micro-F1 on Yelp |
|------------------------------|--------|--------|
| HR-DGCNN (Peng et al., 2018) | 0.7610 | - |
| HMCN (Wehrmann et al., 2018) | 0.8080 | 0.6640 |
| Ours | 0.8313 | 0.6704 |

HR-DGCNN: Peng et al., 2018
HMCN: Wehrmann et al., 2018

2. Different text encoders

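| Text Encoders | RCV1 Micro-F1 | RCV1 Macro-F1 | Yelp Micro-F1 | Yelp Macro-F1 |
|------------------|--------|--------|--------|--------|
| TextCNN | 0.7717 | 0.5246 | 0.6281 | 0.3657 |
| TextRNN | 0.8152 | 0.5458 | 0.6704 | 0.4059 |
| RCNN | 0.8313 | 0.6047 | 0.6569 | 0.3951 |
| FastText | 0.6887 | 0.2701 | 0.6031 | 0.2323 |
| DRNN | 0.7846 | 0.5147 | 0.6579 | 0.4401 |
| DPCNN | 0.8220 | 0.5609 | 0.5671 | 0.2393 |
| VDCNN | 0.7263 | 0.3860 | 0.6395 | 0.4035 |
| AttentiveConvNet | 0.7533 | 0.4373 | 0.6367 | 0.4040 |
| RegionEmbedding | 0.7780 | 0.4888 | 0.6601 | 0.4514 |
| Transformer | 0.7603 | 0.4274 | 0.6533 | 0.4121 |
| Star-Transformer | 0.7668 | 0.4840 | 0.6482 | 0.3895 |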
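Macro-F1 is much lower than Micro-F1 in every row. This is because Macro-F1 weights rare labels as heavily as frequent ones, while Micro-F1 pools all decisions together. The sketch below uses the standard multi-label definitions. It counts only labels that appear in a gold or predicted set, which is an assumption and may differ from how eval.py treats labels that never occur.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Micro- and macro-averaged F1 over per-document label sets."""
    labels = set().union(*gold, *pred)
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for g, p in zip(gold, pred):
        for label in p & g:
            counts[label][0] += 1
        for label in p - g:
            counts[label][1] += 1
        for label in g - p:
            counts[label][2] += 1
    # Micro-F1 pools the counts of all labels, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    # Macro-F1 averages per-label F1, so every label counts equally.
    macro = sum(f1(*c) for c in counts.values()) / len(counts) if counts else 0.0
    return {"micro_f1": micro, "macro_f1": macro}
```

Consider a toy example: gold [{"A"}, {"A"}, {"A", "B"}] and predictions [{"A"}, {"A"}, {"A"}]. Missing the single rare label "B" leaves Micro-F1 at 6/7 (about 0.857) but drops Macro-F1 to 0.5.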

3. Hierarchical vs Flat

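| Text Encoders | Hierarchical Micro-F1 | Hierarchical Macro-F1 | Flat Micro-F1 | Flat Macro-F1 |
|------------------|--------|--------|--------|--------|
| TextCNN | 0.7717 | 0.5246 | 0.7367 | 0.4224 |
| TextRNN | 0.8152 | 0.5458 | 0.7546 | 0.4505 |
| RCNN | 0.8313 | 0.6047 | 0.7955 | 0.5123 |
| FastText | 0.6887 | 0.2701 | 0.6865 | 0.2816 |
| DRNN | 0.7846 | 0.5147 | 0.7506 | 0.4450 |
| DPCNN | 0.8220 | 0.5609 | 0.7423 | 0.4261 |
| VDCNN | 0.7263 | 0.3860 | 0.7110 | 0.3593 |
| AttentiveConvNet | 0.7533 | 0.4373 | 0.7511 | 0.4286 |
| RegionEmbedding | 0.7780 | 0.4888 | 0.7640 | 0.4617 |
| Transformer | 0.7603 | 0.4274 | 0.7602 | 0.4339 |
| Star-Transformer | 0.7668 | 0.4840 | 0.7618 | 0.4745 |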
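The gap between the two settings can be read directly off the table. The snippet below copies the numbers from the Hierarchical vs Flat results and prints the hierarchical-minus-flat difference for each encoder. It performs no evaluation of its own.

```python
from typing import Dict, Tuple

# (hierarchical Micro-F1, hierarchical Macro-F1, flat Micro-F1, flat Macro-F1),
# copied from the "Hierarchical vs Flat" table.
RESULTS: Dict[str, Tuple[float, float, float, float]] = {
    "TextCNN": (0.7717, 0.5246, 0.7367, 0.4224),
    "TextRNN": (0.8152, 0.5458, 0.7546, 0.4505),
    "RCNN": (0.8313, 0.6047, 0.7955, 0.5123),
    "FastText": (0.6887, 0.2701, 0.6865, 0.2816),
    "DRNN": (0.7846, 0.5147, 0.7506, 0.4450),
    "DPCNN": (0.8220, 0.5609, 0.7423, 0.4261),
    "VDCNN": (0.7263, 0.3860, 0.7110, 0.3593),
    "AttentiveConvNet": (0.7533, 0.4373, 0.7511, 0.4286),
    "RegionEmbedding": (0.7780, 0.4888, 0.7640, 0.4617),
    "Transformer": (0.7603, 0.4274, 0.7602, 0.4339),
    "Star-Transformer": (0.7668, 0.4840, 0.7618, 0.4745),
}


def gains(results: Dict[str, Tuple[float, float, float, float]]) -> Dict[str, Tuple[float, float]]:
    """Return {encoder: (Micro-F1 gain, Macro-F1 gain)} of hierarchical over flat."""
    return {
        name: (round(h_mi - f_mi, 4), round(h_ma - f_ma, 4))
        for name, (h_mi, h_ma, f_mi, f_ma) in results.items()
    }


for name, (d_micro, d_macro) in gains(RESULTS).items():
    print(f"{name:<17} micro {d_micro:+.4f}  macro {d_macro:+.4f}")
```

The differences show that the hierarchical setting raises Micro-F1 for every encoder. The gain ranges from +0.0001 for Transformer to +0.0797 for DPCNN. Macro-F1, however, is slightly lower for FastText and Transformer.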

Acknowledgement

Our toolkit references some public code:

Update

2019-04-29, initial version

GitHub - Tencent/NeuralNLP-NeuralClassifier: An Open-source Neural Hierarchical...