
Challenging NLG datasets and tasks

Several times over the past few months I've been a bit annoyed at papers which use heavyweight deep learning technology to tackle a fairly easy NLG task such as E2E (generating short sentences which summarise features of a restaurant). I should say that I have huge respect for the 2017 E2E challenge! It was a milestone in neural NLG which highlighted and explored many key issues, such as hallucination. But from the perspective of 2021, I wish people interested in neural NLG would focus on tasks and datasets which are more challenging for rule- and template-based approaches, in order to show that neural approaches "add value." This is hard to do with E2E, since we can build a decent rule-based E2E system in a day using a tool such as Arria NLG Studio, or indeed just by writing Python code.
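To make this concrete, here is a minimal sketch of what such a rule-based E2E generator might look like in Python. The attribute names follow the E2E challenge's meaning representations; the parsing code and templates are purely illustrative, and a production system would need many more rules.

```python
# A minimal sketch of a template-based E2E generator, to illustrate why
# this task is easy for rules. Attribute names (name, eatType, food, ...)
# follow the E2E challenge MRs; the templates themselves are invented.

def parse_mr(mr: str) -> dict:
    """Parse an E2E meaning representation like 'name[The Eagle], food[French]'."""
    slots = {}
    for part in mr.split("],"):
        key, _, value = part.strip().rstrip("]").partition("[")
        slots[key.strip()] = value
    return slots

def realise(slots: dict) -> str:
    """Fill a handful of templates; a real system would cover all attributes."""
    clauses = [slots["name"]]
    if "eatType" in slots:
        clauses.append(f"is a {slots['eatType']}")
    if "food" in slots:
        clauses.append(f"serving {slots['food']} food")
    if "area" in slots:
        clauses.append(f"in the {slots['area']} area")
    sentence = " ".join(clauses) + "."
    if slots.get("familyFriendly") == "yes":
        sentence += " It is family-friendly."
    return sentence

print(realise(parse_mr(
    "name[The Eagle], eatType[coffee shop], food[French], area[riverside], familyFriendly[yes]"
)))
# -> "The Eagle is a coffee shop serving French food in the riverside area. It is family-friendly."
```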

In other words, if I put on my "commercial" hat, I can imagine a discussion as follows with a client who wants an E2E system:

Mr Smith, there are two ways we can build your NLG system:

  • Rule-based: It will take us a day to build the system, plus another few days for quality assurance, integration, and deployment. The system should always produce decent-quality, 100% accurate texts. If it doesn't, file a bug report and we can easily fix the system. The system is also easy to change if you want to tweak its language or behaviour.
  • Neural: We can build the model in an hour, but it will probably take a few days to clean and prepare the data (several weeks if we have to ask humans to write training texts). Plus a few days for quality assurance, integration, and deployment. The system will produce some really nice texts, but unfortunately it will also sometimes produce low-quality or inaccurate texts. Fixing bugs and modifying/tweaking behaviour or language will be difficult.

I can tell you that, when presented with the above choice, 99% of the time Mr Smith will opt for the rule-based system! So I would like to see neural NLG researchers focusing on tasks and datasets which are harder (or impossible) to do with rules and templates.

Below are some suggestions for more challenging datasets and tasks. I focus on tasks which I have encountered in a commercial context, because I have a better understanding of what's involved in these. For example, I won't discuss WebNLG and ToTTo, since these are very different from any commercial projects I have worked on.

Weather

Generating weather forecasts is one of the oldest applications of NLG. It is possible to build very good rule-based NLG systems to generate weather forecasts; indeed, in one evaluation, forecast readers preferred texts generated by our SumTime forecast generator over texts written by human forecasters. However, the effort required is non-trivial (person-months of effort, hundreds or thousands of rules and templates), especially if different types of forecasts are required, or if forecasts must be tailored for individual users or dialogue contexts. So while this task can be done by rule-based systems, neural approaches could make sense if they reduced development time and effort while still producing excellent forecasts. Facebook has developed a neural weather forecast generator which may be deployed.
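As a toy illustration of what such forecast rules look like, here is a sketch of a wind-description generator in the spirit of SumTime. The significance threshold and phrasing are invented for illustration; a real system encodes hundreds of such rules.

```python
# A toy sketch of how a rule-based forecast generator maps numerical
# weather data to text. The threshold (5 knots) and the phrasing are
# invented; SumTime's real rules are far more numerous and careful.

def describe_wind(readings):
    """readings: list of (hour, direction, speed_in_knots) tuples."""
    phrases = []
    prev_dir, prev_speed = None, None
    for hour, direction, speed in readings:
        if prev_dir is None:
            phrases.append(f"{direction} {speed - 2}-{speed + 2}")
        # Only mention a change if direction shifts or speed moves
        # by more than 5 knots (an invented significance threshold).
        elif direction != prev_dir or abs(speed - prev_speed) > 5:
            when = "by evening" if hour >= 18 else f"by {hour:02d}00"
            phrases.append(f"becoming {direction} {speed - 2}-{speed + 2} {when}")
        prev_dir, prev_speed = direction, speed
    return ", ".join(phrases) + "."

print(describe_wind([(6, "SW", 12), (12, "SW", 14), (18, "S", 22)]))
# -> "SW 10-14, becoming S 20-24 by evening."
```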

In terms of datasets, Facebook has released its dataset; note that its texts were written by annotators specifically to train neural NLG models, so they are not "naturally occurring" weather forecasts. People who want to train on actual weather forecasts written by human forecasters are welcome to use the dataset from our SumTime project. But please do NOT use the "weathergov" dataset, since its texts are the output of a template-based system.

Automatic journalism

There is a lot of interest in using NLG to produce news and sports stories. To take one small example, the BBC used Arria Studio to generate election reports. Machine learning techniques can also be used here; indeed, the first commercial application of ML-based NLG that I am aware of is Kondadadi et al. (2013), who generated short financial news stories.

In any case, rule-based NLG systems do reasonably well at generating news stories. However, they often lack flexibility, i.e. the ability to adapt the structure and content of the narrative based on the specific circumstances of the story. If neural systems could do this well while still producing accurate and readable stories, they could have advantages over rule-based NLG systems. This requires being able to generate readable narratives which are hundreds of words long and contain no hallucinations.
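To illustrate the kind of flexibility at stake, here is a hedged sketch of an adaptive document planner which chooses a story angle from game data. The field names and angle rules are invented for illustration, not taken from any real system.

```python
# A sketch of adaptive document planning for sports journalism: the
# plan (an ordered list of content items) depends on the circumstances
# of the game. All field names and thresholds here are invented.

def plan_basketball_story(game: dict) -> list[str]:
    """Return an ordered list of content items (a simple document plan)."""
    margin = abs(game["home_score"] - game["away_score"])
    plan = ["headline_result"]
    # Angle selection: narrative structure adapts to the story.
    if margin >= 20:
        plan += ["blowout_angle", "bench_minutes"]
    elif margin <= 3:
        plan += ["close_finish_angle", "final_possessions"]
    if game.get("star_points", 0) >= 40:
        plan.insert(1, "star_performance_angle")
    plan += ["team_records", "next_game"]
    return plan

print(plan_basketball_story(
    {"home_score": 101, "away_score": 99, "star_points": 42}
))
# -> ['headline_result', 'star_performance_angle', 'close_finish_angle',
#     'final_possessions', 'team_records', 'next_game']
```

Rule-based systems can encode a handful of such angles, but anticipating every circumstance is where hand-written planning rules become unwieldy.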

Probably the best known journalism dataset in NLG is the Rotowire dataset of basketball summaries. If people are interested in this, I recommend they look at Craig Thomson’s SportSett dataset, which fixes many of the problems in the original Rotowire dataset.

Discharge summaries

I was trying to think of a really challenging task which is hard to do with rule-based NLG and which has arisen in commercial discussions. One possibility is generating discharge summaries, which summarise what happened to a patient during a hospital stay. These are difficult for rule-based systems because they must condense an enormous range of potential clinical data and interventions, for patients with extremely diverse problems, into a short narrative covering a stay that lasts days or weeks. Also, the clinical data is noisy, and human-written discharge summaries may contain abbreviated language and are in some cases incorrect. So if neural NLG can reliably produce high-quality discharge summaries from clinical data, I will be impressed!

One potential dataset is MIMIC (https://mimic.physionet.org/). I have never used this myself, but I believe it contains both clinical data and human-written discharge summaries.
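A first step in building an NLG dataset from MIMIC-III might look like the sketch below, which pairs each discharge summary with its admission record. The file and column names follow MIMIC-III's published schema as I understand it, but check the documentation before relying on them.

```python
# A hedged sketch of pairing MIMIC-III notes with admissions for an
# NLG dataset. File and column names are assumed from MIMIC-III's
# published schema (NOTEEVENTS.csv, ADMISSIONS.csv); verify them
# against the official documentation before use.
import pandas as pd

notes = pd.read_csv("NOTEEVENTS.csv",
                    usecols=["SUBJECT_ID", "HADM_ID", "CATEGORY", "TEXT"])
admissions = pd.read_csv("ADMISSIONS.csv",
                         usecols=["HADM_ID", "ADMITTIME", "DISCHTIME", "DIAGNOSIS"])

# Keep only the human-written discharge summaries: these are the
# target texts an NLG system would learn to generate.
summaries = notes[notes["CATEGORY"] == "Discharge summary"]

# Join each summary to its hospital admission; the structured admission
# record is a (very crude) stand-in for the input data of the NLG task.
pairs = summaries.merge(admissions, on="HADM_ID")
print(pairs[["HADM_ID", "DIAGNOSIS", "TEXT"]].head())
```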

