source link: https://ehudreiter.com/2017/06/19/response-to-goldberg/
Response to Goldberg’s Blog on Deep Learning for NLG

I wrote a comment on Medium in response to Yoav Goldberg’s An Adversarial Review of “Adversarial Generation of Natural Language”, which critiques some research using deep learning in NLG, focusing on papers published in less prestigious venues. Quite a few people seem to have read the comment, so I am reposting it on this blog.

**********************************************************************************

I saw that Mike White mentioned my name, so I thought I would comment directly. A lot of the discussion is about papers published in second-tier venues, but from my perspective there are also major problems with DL NLG papers published in top venues. Perhaps the problems are less drastic, but it’s a question of degree.

This was brought home to me last year when I attended NAACL 2016 (in order to give an invited talk on NLG evaluation), which was the first time I had been to an ACL event in several years. I went to listen to a NAACL paper about using DL for NLG, and was absolutely horrified.

(1) The evaluation was weak because the authors just used BLEU, which is a questionable way to evaluate NLG systems (https://ehudreiter.com/2017/05/03/metrics-nlg-evaluation/); a small sketch of why BLEU can mislead follows the three points below.

(2) One of the main training corpora used was the output of a rule-based NLG system (https://ehudreiter.com/2017/05/09/weathergov/). So were the authors trying to show that they could use DL to reverse engineer a rule-based system and steal the IP of someone who spent a lot of time carefully hand-crafting NLG rules?

(3) The presenting author was completely unaware of previous work in the NLG community on the problems he was solving (this was apparent in the Q&A session as well as in the paper). He claimed his system was better than state-of-the-art, but to me his output texts looked considerably worse than stuff we were producing 15 years ago.
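To make point (1) concrete, here is a minimal sketch, not from the original comment, using NLTK’s sentence_bleu; the weather sentences are invented for illustration. It shows how a repetitive, disfluent output that happens to share surface n-grams with the reference can outscore a fluent paraphrase that says the right thing in different words.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference text and two candidate outputs (invented examples).
reference = [["the", "temperature", "will", "fall", "to", "minus", "five", "tonight"]]
fluent_paraphrase = ["temperatures", "will", "drop", "to", "-5", "overnight"]
word_salad = ["the", "the", "temperature", "tonight", "will", "will", "fall"]

# Smoothing avoids zero scores when some n-gram orders have no matches.
smooth = SmoothingFunction().method1

# The disfluent output shares more n-grams with the reference, so BLEU
# rates it above the fluent paraphrase (roughly 0.09 vs 0.03 here).
print(sentence_bleu(reference, fluent_paraphrase, smoothing_function=smooth))
print(sentence_bleu(reference, word_salad, smoothing_function=smooth))
```

Nothing in this sketch depends on the NAACL paper in question; it is just the standard observation that BLEU measures n-gram overlap, not adequacy or fluency.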

I am willing to be convinced that DL is a good approach for NLG, but I need to see experiments and papers with solid evaluation, sensible and appropriate corpora, and good awareness of the NLG state of the art. Papers like the NAACL one above don’t leave me with a good impression of DL for NLG.

I’d also like someone to explain to me how we can evaluate the worst-case (as well as the average-case) performance of DL systems, because this is really important (https://ehudreiter.com/2016/12/12/nlg-and-ml/).
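One way to make the worst-case point concrete, as a minimal sketch with invented per-output scores rather than anything from the post: corpus-level averages can hide individual failures, so reporting the minimum alongside the mean at least exposes them.

```python
# Hypothetical per-output quality scores for a generated test set
# (e.g. human ratings normalised to [0, 1]; the numbers are invented).
scores = [0.92, 0.88, 0.95, 0.15, 0.90]

mean_score = sum(scores) / len(scores)  # what corpus-level reporting shows
worst_score = min(scores)               # what a deployed system is judged by

print(f"average:    {mean_score:.2f}")   # 0.76 -- looks acceptable
print(f"worst case: {worst_score:.2f}")  # 0.15 -- an unusable output
```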

Finally, to echo some of the other opinions people have expressed: there is a caricature of a DL (or indeed ML) NLP researcher as someone who just wants some corpora and a way to keep score, with no interest in whether the “score” means anything and no interest in the provenance or suitability of the corpora. I realise this is a caricature, but I think it has some truth to it, and I don’t think this is the right attitude for making progress in NLP.
