Semantic Parsing with Semi-Supervised Sequential Autoencoders (Kocisky et al., EMNLP 2016)

By training a parser and language generation system together, we can use semantic parses without associated sentences for training (the sentence becomes a latent representation that is being learnt).

Semantic parsing datasets are small because they are expensive to produce (logical forms don’t occur naturally and writing them down takes time). The idea here is to do semi-supervised learning by implementing both a parser and a generator, which are trained together as a form of autoencoder where the intermediate representation is natural language.

The architecture has four LSTMs:

  1. Bidirectional LSTM over a logical form.
  2. Unidirectional LSTM attending to the first LSTM’s hidden states, generating a sentence.
  3. Bidirectional LSTM over the sentence generated by the second LSTM.
  4. Unidirectional LSTM attending to the third LSTM’s hidden states, generating a logical form.

Usually a component like the second LSTM would choose the most probable word at each position (or use beam search), but here the whole pipeline needs to be differentiable, so the full distribution over words is passed on instead. At evaluation time only the second half (3+4) is used, with the test sentence as input.
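To make the difference between picking a word and passing on a distribution concrete, here is a minimal sketch in plain Python. It assumes the distribution is consumed as a probability-weighted average of word embeddings, which is one common way to keep such a step differentiable; the function names and toy embeddings are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hard_embedding(logits, embeddings):
    """Argmax choice: pick one word. Not differentiable w.r.t. the logits."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return list(embeddings[best])

def soft_embedding(logits, embeddings):
    """Assumed soft alternative: average the embeddings, weighted by word probability."""
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]

# Toy vocabulary of three words with 2-dimensional embeddings (made up).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_logits = [2.0, 0.5, 0.1]

print(hard_embedding(toy_logits, toy_embeddings))
print(soft_embedding(toy_logits, toy_embeddings))
```

The hard version throws away everything except the winning word, so no gradient can flow back through the choice. The soft version changes smoothly as the scores change, which is what lets the second half of the model send a training signal to the first half.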

With this structure, a loss function is defined that compares the input to (1) with the output of (4), both of which are logical forms. As a result, (logical form, sentence) pairs are not needed for training; automatically generated logical forms can be used instead. Of course, with only logical forms the model would do something arbitrary with the intermediate representation, so some supervised examples are also needed (in which case the two halves are trained independently).
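The two training cases can be sketched as a single loss function. This is only an illustration under stated assumptions: `generate` and `parse` are hypothetical stand-ins for the two halves (1+2 and 3+4), each returning one probability distribution per output position, and the supervised losses are simply added without weights, which may differ from what the paper does.

```python
import math

def cross_entropy(dists, target_ids):
    """Mean negative log-probability of the target tokens."""
    return -sum(math.log(d[t]) for d, t in zip(dists, target_ids)) / len(target_ids)

def one_hot(token_ids, vocab_size):
    """Represent an observed sentence in the same form as a soft one."""
    return [[1.0 if i == t else 0.0 for i in range(vocab_size)] for t in token_ids]

def example_loss(logical_form, sentence, generate, parse, sentence_vocab_size):
    """Loss for one example; `sentence` is None when only a logical form is available."""
    if sentence is None:
        # Unsupervised: logical form -> soft sentence -> logical form,
        # scored by how well the original logical form is reconstructed.
        soft_sentence = generate(logical_form)
        return cross_entropy(parse(soft_sentence), logical_form)
    # Supervised: each half is scored on its own against the gold data.
    generator_loss = cross_entropy(generate(logical_form), sentence)
    parser_loss = cross_entropy(parse(one_hot(sentence, sentence_vocab_size)), logical_form)
    return generator_loss + parser_loss  # unweighted sum is an assumption

# Toy stand-ins: uniform predictions over 2 words and 4 logical-form symbols.
def toy_generate(logical_form):
    return [[0.5, 0.5] for _ in logical_form]

def toy_parse(sentence_dists):
    return [[0.25] * 4 for _ in sentence_dists]

print(example_loss([0, 1], None, toy_generate, toy_parse, 2))
print(example_loss([0, 1], [1, 0], toy_generate, toy_parse, 2))
```

In the unsupervised branch the sentence never appears as a target, which is why nothing constrains the intermediate representation to look like natural language unless supervised examples are mixed in.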

The results are not state-of-the-art, but good on all three tasks (GeoQuery, NLmaps, SAIL), and on two they show an improvement over training (3+4) with only supervised data. Varying the amount of training data gives a less clear picture. On GeoQuery with 5-25% of the data, this approach clearly helps, particularly if the queries are real rather than generated (which is a realistic scenario), but then there is no improvement for 50% or 75%, and at 100% the improvement is small. On NLmaps there was no generator, and the differences at different data percentages seem like noise. SAIL has the clearest benefit, though it’s a particularly small dataset, consisting of paths in just four maps.

This is a cool idea that seems effective in certain situations. The generator is key, and it’s possible that performance on GeoQuery would be higher with a more sophisticated one (e.g. a tree-structured generator, rather than the n-gram model used here). One idea mentioned in the conclusion is to try reversing the setup (3-4-1-2) and training with natural language examples that have no logical form. How to trade off the different data scenarios seems like an interesting challenge!



author    = {Ko\v{c}isk\'{y}, Tom\'{a}\v{s}  and  Melis, G\'{a}bor  and  Grefenstette, Edward  and  Dyer, Chris  and  Ling, Wang  and  Blunsom, Phil  and  Hermann, Karl Moritz},
title     = {Semantic Parsing with Semi-Supervised Sequential Autoencoders},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month     = {November},
year      = {2016},
address   = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages     = {1078--1087},
url       = {}