
GitHub - Tencent/NeuralNLP-NeuralClassifier: An Open-source Neural Hierarchical...

 4 years ago
source link: https://github.com/Tencent/NeuralNLP-NeuralClassifier


NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit

Introduction

NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification tasks, which are more challenging and more common in real-world scenarios. A salient feature is that NeuralClassifier currently provides a variety of text encoders, including FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet, and a Transformer encoder. It also supports other text classification scenarios, including binary-class and multi-class classification. It is built on PyTorch. Experiments show that models built with our toolkit achieve performance comparable to the results reported in the literature.

Supported tasks

  • Binary-class text classification
  • Multi-class text classification
  • Multi-label text classification
  • Hierarchical (multi-label) text classification (HMC)

Supported text encoders

Requirements

  • Python 3
  • PyTorch 0.4+
  • NumPy 1.14.3+

System Architecture

NeuralClassifier Architecture

Usage

Training

python train.py conf/train.json

For detailed configuration options and explanations, see Configuration.

Training information is written to standard output and to the file named by log.logger_file.

Evaluation

python eval.py conf/train.json
  • If eval.is_flat = false, hierarchical evaluation results are output.
  • eval.model_dir is the model to evaluate.
  • data.test_json_files is the input text file to evaluate.

Evaluation results are written to eval.dir.
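The dotted option names above can be read as paths into a nested JSON configuration. The following is a minimal sketch under that assumption: only the key names come from this README, while every value and the helper `get_option` are hypothetical and are not part of the toolkit.

```python
import json

# Hypothetical configuration fragment. Only the key names appear in the README;
# all values below are placeholders chosen for illustration.
CONFIG_TEXT = """
{
  "log": {"logger_file": "log_train"},
  "data": {"test_json_files": ["data/test.json"]},
  "eval": {"is_flat": false, "model_dir": "checkpoint/model", "dir": "eval_output"}
}
"""


def get_option(config, dotted_key):
    """Resolve a dotted name such as 'eval.is_flat' in a nested dict."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if the option is missing
    return node


config = json.loads(CONFIG_TEXT)

# eval.is_flat = false selects hierarchical evaluation, as described above.
mode = "flat" if get_option(config, "eval.is_flat") else "hierarchical"

print("evaluation mode:", mode)
print("model to evaluate:", get_option(config, "eval.model_dir"))
print("test files:", get_option(config, "data.test_json_files"))
print("results directory:", get_option(config, "eval.dir"))
```

A check like this can be run before `python eval.py conf/train.json` to confirm which model and test files an evaluation run will pick up.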

Input Data Format

JSON example:

{
    "doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    "doc_token": ["I", "love", "deep", "learning"],
    "doc_keyword": ["deep learning"],
    "doc_topic": ["AI", "Machine learning"]
}

"doc_keyword" and "doc_topic" are optional.
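As an illustration of this format, the sketch below builds one record and serializes it. It rests on a few assumptions that this README does not state: that each record is written as a single JSON object per line, that "--" separates the levels of a hierarchical label (as the example suggests), and that optional fields may simply be left out. The helper names `make_record` and `label_path` are hypothetical.

```python
import json


def make_record(labels, tokens, keywords=None, topics=None):
    """Build one document record using the field names from the README."""
    record = {"doc_label": list(labels), "doc_token": list(tokens)}
    # The optional fields are included only when supplied (an assumption).
    if keywords is not None:
        record["doc_keyword"] = list(keywords)
    if topics is not None:
        record["doc_topic"] = list(topics)
    return record


def label_path(label, separator="--"):
    """Split a hierarchical label into its levels, root first."""
    return label.split(separator)


record = make_record(
    labels=["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    tokens=["I", "love", "deep", "learning"],
    keywords=["deep learning"],
    topics=["AI", "Machine learning"],
)

# One JSON object per line is an assumed file layout.
line = json.dumps(record)
print(line)

for label in record["doc_label"]:
    print(label_path(label))
```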

Performance

0. Dataset

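| Dataset | Taxonomy | #Label | #Training | #Test   |
|---------|----------|--------|-----------|---------|
| RCV1    | Tree     | 103    | 23,149    | 781,265 |
| Yelp    | DAG      | 539    | 87,375    | 37,265  |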

1. Compare with state-of-the-art

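| Text Encoders                | Micro-F1 on RCV1 | Micro-F1 on Yelp |
|------------------------------|------------------|------------------|
| HR-DGCNN (Peng et al., 2018) | 0.7610           | -                |
| HMCN (Wehrmann et al., 2018) | 0.8080           | 0.6640           |
| Ours                         | 0.8313           | 0.6704           |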

2. Different text encoders

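| Text Encoders    | RCV1 Micro-F1 | RCV1 Macro-F1 | Yelp Micro-F1 | Yelp Macro-F1 |
|------------------|---------------|---------------|---------------|---------------|
| TextCNN          | 0.7717        | 0.5246        | 0.6281        | 0.3657        |
| TextRNN          | 0.8152        | 0.5458        | 0.6704        | 0.4059        |
| RCNN             | 0.8313        | 0.6047        | 0.6569        | 0.3951        |
| FastText         | 0.6887        | 0.2701        | 0.6031        | 0.2323        |
| DRNN             | 0.7846        | 0.5147        | 0.6579        | 0.4401        |
| DPCNN            | 0.8220        | 0.5609        | 0.5671        | 0.2393        |
| VDCNN            | 0.7263        | 0.3860        | 0.6395        | 0.4035        |
| AttentiveConvNet | 0.7533        | 0.4373        | 0.6367        | 0.4040        |
| RegionEmbedding  | 0.7780        | 0.4888        | 0.6601        | 0.4514        |
| Transformer      | 0.7603        | 0.4274        | 0.6533        | 0.4121        |
| Star-Transformer | 0.7668        | 0.4840        | 0.6482        | 0.3895        |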
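For readers unfamiliar with the two metrics reported here, the sketch below shows the standard definitions of Micro-F1 and Macro-F1 for multi-label predictions. It is an illustration with made-up labels, not the toolkit's evaluation code, and the function names are hypothetical.

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_macro_f1(gold, predicted):
    """Compute (micro_f1, macro_f1) for parallel lists of label sets."""
    labels = sorted(set().union(*gold, *predicted))
    counts = {label: {"tp": 0, "fp": 0, "fn": 0} for label in labels}
    for gold_set, predicted_set in zip(gold, predicted):
        for label in labels:
            if label in predicted_set and label in gold_set:
                counts[label]["tp"] += 1
            elif label in predicted_set:
                counts[label]["fp"] += 1
            elif label in gold_set:
                counts[label]["fn"] += 1
    # Micro-F1 pools the counts over all labels before computing F1.
    micro = f1_score(
        sum(c["tp"] for c in counts.values()),
        sum(c["fp"] for c in counts.values()),
        sum(c["fn"] for c in counts.values()),
    )
    # Macro-F1 averages per-label F1, so rare labels weigh as much as frequent ones.
    per_label = [f1_score(c["tp"], c["fp"], c["fn"]) for c in counts.values()]
    macro = sum(per_label) / len(per_label) if per_label else 0.0
    return micro, macro


# Toy data: label "A" is frequent and mostly right; "B" and "C" are rare and missed.
gold = [{"A", "B"}, {"A"}, {"C"}]
predicted = [{"A"}, {"A", "B"}, {"A"}]
micro, macro = micro_macro_f1(gold, predicted)
print(f"Micro-F1 = {micro:.4f}, Macro-F1 = {macro:.4f}")
```

In this toy case Macro-F1 comes out below Micro-F1 because the rare labels are predicted poorly, which is one common reason for a gap between the two metrics.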

3. Hierarchical vs Flat

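| Text Encoders    | Hierarchical Micro-F1 | Hierarchical Macro-F1 | Flat Micro-F1 | Flat Macro-F1 |
|------------------|-----------------------|-----------------------|---------------|---------------|
| TextCNN          | 0.7717                | 0.5246                | 0.7367        | 0.4224        |
| TextRNN          | 0.8152                | 0.5458                | 0.7546        | 0.4505        |
| RCNN             | 0.8313                | 0.6047                | 0.7955        | 0.5123        |
| FastText         | 0.6887                | 0.2701                | 0.6865        | 0.2816        |
| DRNN             | 0.7846                | 0.5147                | 0.7506        | 0.4450        |
| DPCNN            | 0.8220                | 0.5609                | 0.7423        | 0.4261        |
| VDCNN            | 0.7263                | 0.3860                | 0.7110        | 0.3593        |
| AttentiveConvNet | 0.7533                | 0.4373                | 0.7511        | 0.4286        |
| RegionEmbedding  | 0.7780                | 0.4888                | 0.7640        | 0.4617        |
| Transformer      | 0.7603                | 0.4274                | 0.7602        | 0.4339        |
| Star-Transformer | 0.7668                | 0.4840                | 0.7618        | 0.4745        |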

Acknowledgement

Our toolkit draws on some publicly available code:

Update

  • 2019-04-29, initial version
