
Good Papers are Hard to Publish

source link: https://ehudreiter.com/2016/12/23/good-papers-are-hard-to-publish/

Ehud Reiter's Blog

Ehud's thoughts and observations about Natural Language Generation

I recently went through my publications page to mark papers with 100+ citations on Google Scholar.  As I did this, I realised that a lot of these papers had been quite hard to publish. In other words, I had to fight to get a lot of these papers published, more so than with most of my lower-citation-count papers.  Is there a general principle here?  If a paper is easily accepted, does this mean it is less likely to have a major impact?

Have other people had similar experiences? If so, please let me know.

I summarise the stories behind five of these papers below.

SimpleNLG: A realisation engine for practical applications (2009).
Proceedings of ENLG-2009, pages 90-93. (ACL Anthology)

This paper was originally rejected by the 2009 European NLG workshop (which is not a high-status selective event), because the reviewers thought there was nothing interesting about the simplenlg software package.  I objected to the workshop organisers (the only time I have ever done this in over 25 years as an academic), on the grounds that the NLG community should know about simplenlg as a useful tool, even if in some abstract sense simplenlg was not “academically interesting”.  The organisers agreed to accept the paper as a poster presentation, and it has gone on to become one of the most-cited recent papers in ENLG workshops; indeed I have noticed several papers in subsequent ENLG workshops which describe enhancements to simplenlg (usually adapting it for another language).  And simplenlg itself has become perhaps the most widely used open-source NLG package.

In 2016, I think the academic community has accepted that resources (software, corpora, etc.) are a valuable contribution to the research community, and that papers about them are worth publishing.  But this was perhaps not the case in 2009.

Choosing Words in Computer-Generated Weather Forecasts (2005).
Artificial Intelligence 167:137-169. (DOI)

This paper was submitted to a special issue of Artificial Intelligence.  The reviewers of the initial submission were dubious, because they thought the evaluation wasn’t good enough.  So we ran a new and much improved evaluation in the very tight window we had for revising the paper.  The results were astonishing, because they showed that forecast readers preferred our computer-generated weather forecasts to meteorologist-written weather forecasts.  In other words, our system was better than human!  This would be a remarkable result now, and it was unheard of in 2005.

I am forever grateful to the anonymous reviewers of this paper, who forced me to do a better evaluation.  This is probably paper reviewing at its best.

Lessons from a Failure: Generating Tailored Smoking Cessation Letters (2003).
Artificial Intelligence 144:41-58. (DOI)

This paper reported a negative result (i.e., our evaluation clearly showed that our software was not effective).  It was unheard of at the time (and is still far too rare) for an AI paper to report a negative result.  The journal editor was sympathetic, but wasn’t sure how best to present a negative result in an AI paper.  We went through numerous iterations trying to come up with something he was happy with (and I must admit that on a few occasions I felt like giving up).

I think the result was not only a good paper, but also a clear sign that negative results in AI could be published in a major AI journal, and hence perhaps an encouragement to other people to try to publish negative results.  I am grateful to the editor for the effort he put into trying to find the right way to present a negative result.

Incidentally, we also published this result at the 2001 ACL conference. That year I was one of the ACL area chairs, who in those days met in person to discuss papers.  Whenever an area chair’s paper came up for discussion, he or she had to leave the room.  My paper was something like the fifth such paper to come up, and decisions about the previous four had taken just a few seconds.  So when I was asked to leave the room, I thought I would just hang around by the door, expecting to get called back in right away.  Instead I waited … and waited … and started wondering if I should find a cafe and buy a cup of coffee.  Eventually they called me back in and gave me the good news that the paper had been accepted.  But the length of time suggests that it probably triggered a pretty vigorous debate in the ACL committee.

Computational Interpretations of the Gricean Maxims in the Generation of Referring Expressions (1995).
Cognitive Science 19:233-263. (ScienceDirect)

This is my most-cited journal paper, with over 600 citations.  It was also rejected from the first journal we submitted it to, Computational Linguistics.  And quite strongly rejected as well (“this paper isn’t good enough”, not “you need to do a bit more work before you publish”).  So we resubmitted the paper to Cognitive Science, which accepted it.  And I suspect this paper has a considerably higher citation count than most of the papers that Computational Linguistics accepted that year…

Sometimes reviewers just get it wrong.  But if you really believe in a paper, you can keep trying and submit the paper elsewhere.

Has a Consensus NL Generation Architecture Appeared, and is it Psycholinguistically Plausible?  Proc of INLG-1994, pages 163-170. (ACL Anthology)

This is my most-cited conference/workshop paper, with over 250 citations.  It was effectively the paper where I first talked about the Document Planning-Microplanning-Realisation “NLG Pipeline”. The paper was also rejected the first time I submitted it, which was to the 1993 European NLG workshop, on the grounds that it didn’t say anything innovative and worth saying.  It certainly was an unusual paper, because it mostly surveyed other people’s work on applied NLG, and argued that there was a lot of architectural similarity in how these systems were built, once you ignored terminology and names.

I understand why the ENLG reviewers rejected the paper, but I think there is a need for this kind of paper, and workshops in particular should be broad-minded rather than narrow-minded in what they accept.  Certainly this paper had a far greater citation count, and impact, than the papers which were accepted for ENLG 1993.

Summary

Three of these papers struggled because they made an unusual contribution: a negative result, a software resource, and a survey of other people’s work.  I like to think that the academic community is more accepting of such papers in 2016, but they certainly remain pretty unusual, and I suspect it’s still a lot harder to get such papers published than more conventional papers.

And sometimes reviewers just get it wrong.  This is inevitable in the real world, but I suspect it’s more common with unusual papers.

And sometimes reviewers do an absolutely superb job and guide/force you to “do the job right”!
