Parsing with Traces: An O($n^4$) Algorithm and a Structural Representation

Abstract

General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proofof-concept parser that recovers a range of null elements and trace types.

Publication
Transactions of the Association for Computational Linguistics
Jonathan K. Kummerfeld
Jonathan K. Kummerfeld
Postdoctoral Research Fellow

Postdoc working on Natural Language Processing and Crowdsourcing.