Iterative Feature Mining for Constraint-Based Data Collection to Increase Data Diversity and Model Robustness (Larson et al., EMNLP 2020)

When we crowdsource data for tasks like SRL and sentiment analysis, we only care about accuracy. For tasks where workers write new content, such as paraphrasing and creating questions, we also care about data diversity. If our data is not diverse, models trained on it will not be robust in the real world. The core idea of this paper is to encourage creativity by constraining workers.

ChartDialogs: Plotting from Natural Language Instructions (Shao and Nakashole, ACL 2020)

Natural language interfaces to computer systems are an exciting area with new workshops ([WNLI](https://www.aclweb.org/anthology/volumes/2020.nli-1/) at ACL and [IntEx-SemPar](https://intex-sempar.github.io/) at EMNLP), a range of datasets (including my own work on [text-to-SQL](/publication/acl18sql/)), and many papers. Most work focuses on one of three things: (1) commands for simple APIs, (2) generating database queries, or (3) generating general-purpose code. This paper considers an interesting application: interaction with data visualisation tools.

DSTC 7 track 1: Next Utterance Selection

Data from the Noetic End-to-End Response Selection Challenge. Dialogue from Ubuntu tech support and Michigan course advising.

DSTC 8 track 2: Next Utterance Selection

Data from NOESIS II: Predicting Responses, Identifying Success, and Managing Complexity in Task-Oriented Dialogue. Dialogue from Ubuntu tech support and Michigan course advising.

A Large-Scale Corpus for Conversation Disentanglement (Kummerfeld et al., 2019)

This post is about my own paper to appear at ACL later this month. What is interesting about this paper will depend on your research interests, so that’s how I’ve broken down this blog post. A few key points first: data and code are available on GitHub. The paper is also available.

Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time (Huang et al., 2018)

For a more flexible dialogue system, use the crowd to propose and vote on responses, then introduce automated agents and a voting model that gradually learn to replace the crowd.

Frames: a corpus for adding memory to goal-oriented dialogue systems (El Asri et al., 2017)

A new dialogue dataset that has annotations of multiple plans (frames) and dialogue acts that indicate modifications to them.

Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings (He et al., 2017)

To take a table of information about entities into account during task-oriented dialogue generation, represent the table as a graph, run message passing to get a vector representation of each entity, and attend over those vectors when generating.
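
The pipeline in that summary (table, graph, message passing, attention) can be sketched in a few lines. This is a toy illustration and not the model from the paper: the table contents, the function names (`build_graph`, `message_passing`, `attend`), the random initial vectors, and the untrained averaging update are all assumptions made for the example, whereas the real system learns its parameters.

```python
import math
import random


def build_graph(table):
    """Link each row (an item node) to the nodes for its attribute values."""
    neighbours = {}
    for row_id, row in enumerate(table):
        item = ("item", row_id)
        neighbours.setdefault(item, set())
        for column, value in row.items():
            entity = (column, value)
            neighbours[item].add(entity)
            neighbours.setdefault(entity, set()).add(item)
    return neighbours


def message_passing(neighbours, dim=4, rounds=2, seed=0):
    """Repeatedly mix each node's vector with its neighbours' vectors."""
    rng = random.Random(seed)
    # Assumption: random starting vectors stand in for learned embeddings.
    vectors = {
        node: [rng.uniform(-1, 1) for _ in range(dim)]
        for node in sorted(neighbours, key=repr)
    }
    for _ in range(rounds):
        updated = {}
        for node, adjacent in neighbours.items():
            summed = list(vectors[node])
            for other in adjacent:
                for i, x in enumerate(vectors[other]):
                    summed[i] += x
            # Average over the node and its neighbours, then squash.
            updated[node] = [math.tanh(x / (len(adjacent) + 1)) for x in summed]
        vectors = updated
    return vectors


def attend(query, vectors):
    """Dot-product attention: softmax weights and the weighted sum of vectors."""
    nodes = list(vectors)
    scores = [sum(q * v for q, v in zip(query, vectors[n])) for n in nodes]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    context = [
        sum(w * vectors[n][i] for w, n in zip(weights, nodes))
        for i in range(len(query))
    ]
    return dict(zip(nodes, weights)), context


# Made-up table: two rows that share one attribute value.
table = [
    {"name": "Alice", "school": "Michigan", "hobby": "chess"},
    {"name": "Bob", "school": "Michigan", "hobby": "hiking"},
]
graph = build_graph(table)
vectors = message_passing(graph)
# The query stands in for a decoder state; its values are arbitrary.
weights, context = attend([0.5, -0.2, 0.1, 0.3], vectors)
```

Because both rows share the `("school", "Michigan")` node, information about one row reaches the other after two rounds of message passing, which is the reason for using a graph instead of encoding each row independently.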

Joint Modeling of Content and Discourse Relations in Dialogues (Qin et al., 2017)

Identifying the key phrases in a dialogue at the same time as identifying the type of relations between pairs of utterances leads to substantial improvements on both tasks.