GitHub - Tencent/NeuralNLP-NeuralClassifier: An Open-source Neural Hierarchical...
source link: https://github.com/Tencent/NeuralNLP-NeuralClassifier
NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit
Introduction
NeuralClassifier is designed for quickly implementing neural models for hierarchical multi-label classification, a challenging task that is common in real-world scenarios. A salient feature is that NeuralClassifier currently provides a variety of text encoders, including FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet and Transformer encoder. It also supports other text classification scenarios, including binary-class and multi-class classification. It is built on PyTorch. Experiments show that models built with our toolkit achieve performance comparable to results reported in the literature.
Supported tasks
- Binary-class text classification
- Multi-class text classification
- Multi-label text classification
- Hierarchical (multi-label) text classification (HMC)
Supported text encoders
- TextCNN (Kim, 2014)
- RCNN (Lai et al., 2015)
- TextRNN (Liu et al., 2016)
- FastText (Joulin et al., 2016)
- VDCNN (Conneau et al., 2016)
- DPCNN (Johnson and Zhang, 2017)
- AttentiveConvNet (Yin and Schutze, 2017)
- DRNN (Wang, 2018)
- Region embedding (Qiao et al., 2018)
- Transformer encoder (Vaswani et al., 2017)
- Star-Transformer encoder (Guo et al., 2019)
Requirements
- Python 3
- PyTorch 0.4+
- NumPy 1.14.3+
System Architecture
Usage
Training
python train.py conf/train.json
See Configuration for detailed configuration options and explanations.
Training information is written to standard output and to log.logger_file.
Evaluation
python eval.py conf/train.json
- If eval.is_flat = false, hierarchical evaluation results are output.
- eval.model_dir specifies the model to evaluate.
- data.test_json_files specifies the input text file to evaluate.
Evaluation results are written to eval.dir.
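The dotted option names above suggest that conf/train.json is a nested JSON object, with eval.is_flat stored as the is_flat key inside an eval object. Under that assumption, the following sketch shows one way to derive an evaluation configuration from a base configuration. The helper set_dotted, the stand-in dictionary, and all paths in it are hypothetical and are not part of the toolkit.

```python
import copy
import json


def set_dotted(config, dotted_key, value):
    """Set config["a"]["b"] = value for dotted_key "a.b", creating levels as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical stand-in for the contents of conf/train.json (values are made up).
base_config = {
    "eval": {"is_flat": True, "model_dir": "checkpoints/model", "dir": "eval_dir"},
    "data": {"test_json_files": ["data/test.json"]},
}

# Work on a copy so the training configuration stays untouched.
eval_config = copy.deepcopy(base_config)
set_dotted(eval_config, "eval.is_flat", False)  # request hierarchical evaluation
set_dotted(eval_config, "data.test_json_files", ["data/my_test.json"])

print(json.dumps(eval_config, indent=2))
```

Writing the result to a separate file and passing that file to eval.py keeps training and evaluation settings apart, at the cost of maintaining two configuration files.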
Input Data Format
JSON example:
{
"doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
"doc_token": ["I", "love", "deep", "learning"],
"doc_keyword": ["deep learning"],
"doc_topic": ["AI", "Machine learning"]
}
"doc_keyword" and "doc_topic" are optional.
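As an illustration of the format above, the sketch below builds one record, serializes it, and splits a hierarchical label into its path prefixes using the "--" separator seen in the example. It assumes each document is stored as one JSON object per line and that the optional fields may be left as empty lists; neither is confirmed here. The helpers make_record and label_prefixes are hypothetical names, not toolkit functions.

```python
import json

LABEL_SEPARATOR = "--"  # separator seen in the example label above


def make_record(labels, tokens, keywords=None, topics=None):
    """Build one document in the JSON layout shown above."""
    return {
        "doc_label": list(labels),
        "doc_token": list(tokens),
        # Optional fields: left as empty lists when not provided (an assumption).
        "doc_keyword": list(keywords or []),
        "doc_topic": list(topics or []),
    }


def label_prefixes(label, sep=LABEL_SEPARATOR):
    """Return every prefix of a hierarchical label path, from root to leaf."""
    parts = label.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


record = make_record(
    labels=["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    tokens=["I", "love", "deep", "learning"],
    keywords=["deep learning"],
    topics=["AI", "Machine learning"],
)

# Assumption: one JSON object per line in the data file.
line = json.dumps(record)
restored = json.loads(line)

print(label_prefixes(restored["doc_label"][0]))
```

For the first label, label_prefixes returns the three levels "Computer", "Computer--MachineLearning" and "Computer--MachineLearning--DeepLearning", which makes the taxonomy path encoded in each label explicit.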
Performance
0. Dataset
| Dataset | Taxonomy | #Label | #Training | #Test |
| --- | --- | --- | --- | --- |
| RCV1 | Tree | 103 | 23,149 | 781,265 |
| Yelp | DAG | 539 | 87,375 | 37,265 |

- RCV1: Lewis et al., 2004
- Yelp: Yelp
1. Compare with state-of-the-art
| Text Encoders | Micro-F1 on RCV1 | Micro-F1 on Yelp |
| --- | --- | --- |
| HR-DGCNN (Peng et al., 2018) | 0.7610 | - |
| HMCN (Wehrmann et al., 2018) | 0.8080 | 0.6640 |
| Ours | 0.8313 | 0.6704 |

- HR-DGCNN: Peng et al., 2018
- HMCN: Wehrmann et al., 2018
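The results in this section are reported as Micro-F1 and Macro-F1. As a reminder of how the two differ, the sketch below computes both for a tiny multi-label example using the standard definitions: Micro-F1 pools true positives, false positives and false negatives over all labels, while Macro-F1 (here taken as the mean of per-label F1 scores, one of several variants in use) weights every label equally. This illustrates the metrics only; how the toolkit itself computes them, particularly in hierarchical evaluation, is not described here, and the function names are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold and pred are equal-length lists of label sets, one set per document."""
    labels = sorted(set().union(*gold, *pred))
    if not labels:
        return 0.0, 0.0
    counts = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores.
    macro = sum(f1_from_counts(*c) for c in counts) / len(counts)
    # Micro: pool the counts over all labels, then compute a single F1.
    pooled = [sum(c[i] for c in counts) for i in range(3)]
    micro = f1_from_counts(*pooled)
    return micro, macro


gold = [{"A", "B"}, {"A"}, {"C"}]
pred = [{"A"}, {"A", "B"}, set()]
micro, macro = micro_macro_f1(gold, pred)
print(f"Micro-F1 = {micro:.4f}, Macro-F1 = {macro:.4f}")
```

In this toy example the frequent label "A" is always predicted correctly while the rarer labels "B" and "C" are always missed, so Micro-F1 (4/7) comes out well above Macro-F1 (1/3). The same kind of gap between the two metrics appears throughout the tables in this section.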
2. Different text encoders
| Text Encoders | RCV1 Micro-F1 | RCV1 Macro-F1 | Yelp Micro-F1 | Yelp Macro-F1 |
| --- | --- | --- | --- | --- |
| TextCNN | 0.7717 | 0.5246 | 0.6281 | 0.3657 |
| TextRNN | 0.8152 | 0.5458 | 0.6704 | 0.4059 |
| RCNN | 0.8313 | 0.6047 | 0.6569 | 0.3951 |
| FastText | 0.6887 | 0.2701 | 0.6031 | 0.2323 |
| DRNN | 0.7846 | 0.5147 | 0.6579 | 0.4401 |
| DPCNN | 0.8220 | 0.5609 | 0.5671 | 0.2393 |
| VDCNN | 0.7263 | 0.3860 | 0.6395 | 0.4035 |
| AttentiveConvNet | 0.7533 | 0.4373 | 0.6367 | 0.4040 |
| RegionEmbedding | 0.7780 | 0.4888 | 0.6601 | 0.4514 |
| Transformer | 0.7603 | 0.4274 | 0.6533 | 0.4121 |
| Star-Transformer | 0.7668 | 0.4840 | 0.6482 | 0.3895 |

3. Hierarchical vs Flat
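| Text Encoders | Hierarchical Micro-F1 | Hierarchical Macro-F1 | Flat Micro-F1 | Flat Macro-F1 |
| --- | --- | --- | --- | --- |
| TextCNN | 0.7717 | 0.5246 | 0.7367 | 0.4224 |
| TextRNN | 0.8152 | 0.5458 | 0.7546 | 0.4505 |
| RCNN | 0.8313 | 0.6047 | 0.7955 | 0.5123 |
| FastText | 0.6887 | 0.2701 | 0.6865 | 0.2816 |
| DRNN | 0.7846 | 0.5147 | 0.7506 | 0.4450 |
| DPCNN | 0.8220 | 0.5609 | 0.7423 | 0.4261 |
| VDCNN | 0.7263 | 0.3860 | 0.7110 | 0.3593 |
| AttentiveConvNet | 0.7533 | 0.4373 | 0.7511 | 0.4286 |
| RegionEmbedding | 0.7780 | 0.4888 | 0.7640 | 0.4617 |
| Transformer | 0.7603 | 0.4274 | 0.7602 | 0.4339 |
| Star-Transformer | 0.7668 | 0.4840 | 0.7618 | 0.4745 |

Acknowledgement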
Our toolkit references the following public code and documentation:
- https://pytorch.org/docs/stable/
- https://github.com/jadore801120/attention-is-all-you-need-pytorch/
- https://github.com/Hsuxu/FocalLoss-PyTorch
- https://github.com/Shawn1993/cnn-text-classification-pytorch
- https://github.com/ailias/Focal-Loss-implement-on-Tensorflow/
- https://github.com/brightmart/text_classification
- https://github.com/NLPLearn/QANet
- https://github.com/huggingface/pytorch-pretrained-BERT
Update
- 2019-04-29, initial version