When we crowdsource data for tasks like SRL and sentiment analysis we only care about accuracy. For tasks where workers write new content, such as paraphrasing and creating questions, we also care about data diversity. If our data is not diverse then models trained on it will not be robust in the real world. The core idea of this paper is to encourage creativity by constraining workers.
Natural language interfaces to computer systems are an exciting area with new workshops ([WNLI](https://www.aclweb.org/anthology/volumes/2020.nli-1/) at ACL and [IntEx-SemPar](https://intex-sempar.github.io/) at EMNLP), a range of datasets (including my own work on [text-to-SQL](/publication/acl18sql/)), and many papers. Most work focuses on either (1) commands for simple APIs, (2) generating a database query, or (3) generating general purpose code. This paper considers an interesting application: interaction with data visualisation tools.
Data from Noetic End-to-End Response Selection Challenge. Dialogue from Ubuntu tech support and Michigan course advising.
Data from NOESIS II: Predicting Responses, Identifying Success, and Managing Complexity in Task-Oriented Dialogue. Dialogue from Ubuntu tech support and Michigan course advising.
This post is about my own paper to appear at ACL later this month. What is interesting about this paper will depend on your research interests, so that’s how I’ve broken down this blog post.
A few key points first:
Data and code are available on Github. The paper is also available.
For a more flexible dialogue system, use the crowd to propose and vote on responses, then introduce agents and a model for voting, gradually learning to replace the crowd.
A new dialogue dataset that has annotations of multiple plans (frames) and dialogue acts that indicate modifications to them.
During task-oriented dialogue generation, to take into consideration a table of information about entities, represent it as a graph, run message passing to get vector representations of each entity, and use attention.
Identifying the key phrases in a dialogue at the same time as identifying the type of relations between pairs of utterances leads to substantial improvements on both tasks.