This repository contains software to do several things related to syntax - parsing, format conversion, and evaluation. For a full definition of the parsing algorithm, proofs of its properties, and results on the standard metrics, see:

If you want to understand the algorithm, my thesis provides a better explanation (the TACL paper focuses on proving properties):

If you use this code in your own work, please cite the TACL paper:

  author    = {Jonathan K. Kummerfeld and Dan Klein},
  title     = {Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation},
  journal   = {Transactions of the Association for Computational Linguistics},
  volume    = {5},
  year      = {2017},
  pages     = {},
  url       = {},
  software  = {},

What is this work about?

In the Penn Treebank, parses have (1) a core projective tree structure and (2) traces that represent control structures, wh-movement and more. However, most parsers and the standard evaluation metric (evalb) ignore the traces and all null elements, focusing entirely on the tree structure. These aspects of syntax are excluded not because of disagreements regarding theory, but rather because of the computational challenge of including them.

This work is about a new inference algorithm that efficiently finds the maximum scoring graph parse for a sentence when using a first-order model. Our approach involves (1) the algorithm, which builds upon the non-projective tree parsing algorithm of Pitler et al. (2013), and (2) a syntactic representation, similar to several prior approaches, particularly Carreras et al. (2008). For more details, see the paper and dissertation above.

This repository contains:

All of the scripts are in python and can be run without downloading or installing any additional resources (note, all should work with python 2.7, only some work with python 3). The parser is in scala, but can be downloaded as a jar that runs with java. For pruning we use a tagger written in C++ that depends on several libraries (we provide a makefile and information about what else needs to be installed).

For detailed information on each component, see the file in each directory.

Reproducing the results in the paper

You will need:

Assuming you download the contents of this repo and have all of those file and put them in the same directory as this file, you will need to:

  1. Add IDs to the sentences
  2. [Only if running the tagger] Simplify the text
  3. [Only if running the tagger] Run the spine tagger
  4. Run the parser
  5. Fix parses where we failed to produce output
  6. Convert to standard PTB
  7. Evaluate

Each of these steps is one command below:

python3 parser/nn-tagger/ 200001 < test.tok > test-with-ids.tok
python3 parser/nn-tagger/ < test-with-ids.tok > test-with-ids.tok.simple
parser/nn-tagger/spine-tagger -word-dict dict.words -tag-dict dict.tags -model model.params -test test.tok -prefix wsj23.tagged
./ test.parser test-with-ids.tok wsj23.pos.stanford
python3 evaluation/ < test.parser.auto_all.localStageFinal > test.parser.shp
python format-conversion/ -i h -o o -e he < test.parser.shp > test.parser.ptb
python evaluation/ test.parser.ptb | tail -n 1
python evaluation/ test.parser.ptb --null_only | tail -n 1

The final two commands give performance on (1) traces and other null items, (2) nulls only:

all 2619 3893 3524 74.3189557321 67.2745954277 70.6215450991
all 3153 3864 3522 89.5229982964 81.599378882 85.3777416734

These numbers are (count of matching, gold total, test total, precision, recall, f-score).


If you find bugs or have questions about any of this software please either contact me ( or create an issue. Thank you!

Misc Notes

The dynamic program definitions in the TACL paper and my thesis are slightly different: