
Microsoft’s New MT-DNN Outperforms Google BERT


Multi-task learning and language model pre-training are popular approaches for many of today’s natural language understanding (NLU) tasks. Now, Microsoft researchers have released technical details of an AI system that combines both approaches. The new Multi-Task Deep Neural Network (MT-DNN) is a natural language processing (NLP) model that outperforms Google BERT in nine of eleven benchmark NLP tasks.

In their paper Multi-Task Deep Neural Networks for Natural Language Understanding, the authors from Microsoft Research and Microsoft Dynamics 365 show MT-DNN learning representations across multiple NLU tasks. The model “not only leverages large amounts of cross-task data, but also benefits from a regularization effect that leads to more general representations to help adapt to new tasks and domains.”

MT-DNN builds on a model Microsoft proposed in 2015 and integrates the network architecture of BERT, a pre-trained bidirectional transformer language model proposed by Google last year.

[Figure: MT-DNN architecture, with shared lower layers and task-specific top layers]

As shown in the figure above, the network’s low-level layers (i.e., the text encoding layers) are shared across all tasks, while the top layers are task-specific, combining different types of NLU tasks. Like BERT, MT-DNN is trained in two phases: pre-training and fine-tuning. But unlike BERT, MT-DNN adds multi-task learning (MTL) in the fine-tuning phase, with multiple task-specific layers in its model architecture.
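To make the shared-encoder/task-specific-head idea concrete, here is a minimal PyTorch sketch, not the authors’ released code. The small Transformer encoder below is a stand-in for the pre-trained BERT layers, and names such as `MultiTaskModel`, `task_heads`, and the task labels are illustrative only.

```python
# Minimal sketch of multi-task fine-tuning with shared low-level layers and
# task-specific heads. Assumes PyTorch; the encoder is a toy stand-in for BERT.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size, hidden_dim, task_num_labels):
        super().__init__()
        # Shared lower layers (text encoding), used by every task.
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Task-specific top layers: one classification head per task.
        self.task_heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, n_labels)
            for task, n_labels in task_num_labels.items()
        })

    def forward(self, token_ids, task):
        hidden = self.encoder(self.embedding(token_ids))
        pooled = hidden.mean(dim=1)           # crude sentence representation
        return self.task_heads[task](pooled)  # task-specific output

# Multi-task fine-tuning: sample a mini-batch from one task at a time and
# update both the shared encoder and that task's head.
model = MultiTaskModel(vocab_size=30000, hidden_dim=128,
                       task_num_labels={"mnli": 3, "sst2": 2})
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for task, batch_tokens, batch_labels in [
    ("mnli", torch.randint(0, 30000, (8, 32)), torch.randint(0, 3, (8,))),
    ("sst2", torch.randint(0, 30000, (8, 32)), torch.randint(0, 2, (8,))),
]:
    logits = model(batch_tokens, task)
    loss = loss_fn(logits, batch_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because every task’s gradients flow through the same encoder, each task effectively regularizes the others, which is the cross-task benefit the paper describes.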

MT-DNN achieved new SOTA results on ten NLU tasks, including SNLI, SciTail, and eight of the nine GLUE tasks, pushing the GLUE benchmark score to 82.2% (a 1.8% absolute improvement). The researchers also demonstrate, using the SNLI and SciTail datasets, that the representations learned by MT-DNN allow domain adaptation with substantially fewer in-domain labels than pre-trained BERT representations.
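As a hedged illustration of that domain-adaptation claim (continuing the sketch above, not the paper’s actual experimental protocol), one would register a new head for an unseen task and fine-tune on a small in-domain labeled set; the task name "scitail" and the data here are placeholders.

```python
# Continues the previous sketch: `model` and `loss_fn` are defined there.
# Adapt the shared encoder to a new domain with only a few labeled examples.
new_head = nn.Linear(128, 2)                      # e.g., entails vs. neutral
model.task_heads["scitail"] = new_head            # register the new task head

small_tokens = torch.randint(0, 30000, (16, 32))  # a small in-domain sample
small_labels = torch.randint(0, 2, (16,))

adapt_opt = torch.optim.Adam(model.parameters(), lr=5e-5)
for _ in range(3):                                # a few passes over the data
    logits = model(small_tokens, "scitail")
    loss = loss_fn(logits, small_labels)
    adapt_opt.zero_grad()
    loss.backward()
    adapt_opt.step()
```

The intuition is that the encoder’s multi-task representations already capture much of what the new task needs, so only a lightweight head (and a brief encoder update) must be learned from the limited in-domain labels.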

For more details, see the paper on arXiv. Microsoft plans to release the code and pre-trained models.

The GLUE benchmark leaderboard is here.

