Format conversion

This program converts between our representation and the standard Penn Treebank (PTB) structure. Below, our representation is referred to as ‘shp’, for ‘split head parse’. The simplest way to run it is shown below, first on Penn Treebank input and then on shp input:


python -i p -o hr -e hj0 -h j1 < input_data.ptb


python -i h -o o -e he < input_data.shp

Other conversions

The program offers a range of conversion options (in each case, the character in brackets is the value to pass on the command line). There are six options, each with various arguments:

Note: not all of these have been thoroughly tested. The options I have used least are CoNLL input and output to TeX, OntoNotes, and dependencies.

For edits, the seven types of coordination we define (j0 to j6, in the order listed) are:

  1. (A , B , C and D)
  2. (A (, B (, C (and D))))
  3. (A , (B , (C and D)))
  4. (A (, B) (, C) (and D))
  5. ((((A ,) B ,) C and) D)
  6. (((A , B) , C) and D)
  7. ((A ,) (B ,) (C and) D)
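To make the differences between the seven structures concrete, here is a minimal sketch that builds each bracketing as a plain string from a list of conjuncts and the separators between them. It assumes the list above corresponds to j0 through j6 in order, omits node labels, and uses hypothetical names (`j0` to `j6`, `STYLES`); it is an illustration, not the converter's own code.

```python
from typing import Callable, Dict, List

# c: conjuncts, s: separators, where s[i] sits between c[i] and c[i + 1].


def j0(c: List[str], s: List[str]) -> str:
    # Flat: every conjunct and separator is a direct child.
    parts = [c[0]]
    for sep, conj in zip(s, c[1:]):
        parts += [sep, conj]
    return "(" + " ".join(parts) + ")"


def j1(c: List[str], s: List[str]) -> str:
    # Right-branching; each separator groups with the conjunct after it.
    inner = f"{s[-1]} {c[-1]}"
    for i in range(len(c) - 2, 0, -1):
        inner = f"{s[i - 1]} {c[i]} ({inner})"
    return f"({c[0]} ({inner}))"


def j2(c: List[str], s: List[str]) -> str:
    # Right-branching; each bracket holds conjunct, separator, rest.
    inner = c[-1]
    for i in range(len(c) - 2, -1, -1):
        inner = f"({c[i]} {s[i]} {inner})"
    return inner


def j3(c: List[str], s: List[str]) -> str:
    # First conjunct, then one (separator conjunct) group per remaining conjunct.
    groups = "".join(f" ({sep} {conj})" for sep, conj in zip(s, c[1:]))
    return f"({c[0]}{groups})"


def j4(c: List[str], s: List[str]) -> str:
    # Left-branching; each separator groups with the material before it.
    inner = f"({c[0]} {s[0]})"
    for i in range(1, len(c) - 1):
        inner = f"({inner} {c[i]} {s[i]})"
    return f"({inner} {c[-1]})"


def j5(c: List[str], s: List[str]) -> str:
    # Left-branching; each bracket holds rest, separator, conjunct.
    inner = c[0]
    for i in range(len(c) - 1):
        inner = f"({inner} {s[i]} {c[i + 1]})"
    return inner


def j6(c: List[str], s: List[str]) -> str:
    # One (conjunct separator) group per conjunct except the last.
    groups = " ".join(f"({conj} {sep})" for conj, sep in zip(c, s))
    return f"({groups} {c[-1]})"


STYLES: Dict[str, Callable[[List[str], List[str]], str]] = {
    "j0": j0, "j1": j1, "j2": j2, "j3": j3, "j4": j4, "j5": j5, "j6": j6,
}

if __name__ == "__main__":
    for name, build in STYLES.items():
        print(name, build(["A", "B", "C", "D"], [",", ",", "and"]))
```

Running it prints the seven bracketings in the same form as the list above. Each builder assumes at least two conjuncts and exactly one separator between each adjacent pair.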

For head rules, the six variants we define (j0 to j5, in the order listed) change which child is chosen as the head of a coordination:

  1. First non-punctuation
  2. First conjunction
  3. First non-punctuation non-conjunction
  4. Last non-punctuation
  5. Last conjunction
  6. Last non-punctuation non-conjunction
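The six head rules can be read as left-to-right or right-to-left searches over the children of a coordination node. The sketch below assumes that children are represented only by their Penn Treebank labels, that punctuation and conjunctions are identified by the hypothetical tag sets `PUNCT_TAGS` and `CONJ_TAGS`, and that the list above maps to j0 through j5 in order. A rule that finds no matching child returns `None` here; how the converter itself falls back is not described in this section.

```python
from typing import Callable, Dict, List, Optional

# Assumed tag sets; the converter's own definitions may differ.
PUNCT_TAGS = {",", ":", ".", "``", "''", "-LRB-", "-RRB-"}
CONJ_TAGS = {"CC", "CONJP"}

Pred = Callable[[str], bool]


def is_punct(label: str) -> bool:
    return label in PUNCT_TAGS


def is_conj(label: str) -> bool:
    return label in CONJ_TAGS


def is_content(label: str) -> bool:
    # Neither punctuation nor a conjunction.
    return not is_punct(label) and not is_conj(label)


def first(labels: List[str], pred: Pred) -> Optional[int]:
    # Index of the leftmost child satisfying pred, or None.
    return next((i for i, lab in enumerate(labels) if pred(lab)), None)


def last(labels: List[str], pred: Pred) -> Optional[int]:
    # Index of the rightmost child satisfying pred, or None.
    return next((i for i in reversed(range(len(labels))) if pred(labels[i])), None)


# Hypothetical mapping from rule name to a head-finding function.
HEAD_RULES: Dict[str, Callable[[List[str]], Optional[int]]] = {
    "j0": lambda labels: first(labels, lambda lab: not is_punct(lab)),
    "j1": lambda labels: first(labels, is_conj),
    "j2": lambda labels: first(labels, is_content),
    "j3": lambda labels: last(labels, lambda lab: not is_punct(lab)),
    "j4": lambda labels: last(labels, is_conj),
    "j5": lambda labels: last(labels, is_content),
}

if __name__ == "__main__":
    # Child labels of (NP (CC either) (NP A) (CC or) (NP B))
    children = ["CC", "NP", "CC", "NP"]
    for name, rule in HEAD_RULES.items():
        print(name, rule(children))
```

For the children of `(NP (CC either) (NP A) (CC or) (NP B))`, rules j0 to j5 select indices 0, 0, 1, 3, 2 and 3, which shows how "first non-punctuation" and "first non-punctuation non-conjunction" diverge when the coordination opens with a conjunction.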