Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations (Wieting et al., 2017)

It would be convenient to have a way to represent sentences in a vector space, similar to the way vectors are frequently used to represent input words for a task. Quite a few sentence embeddings methods have been proposed, but none have really caught on. Building on prior work by the same authors, the approach here is to define a neural network that maps a sentence to a vector, then train it with a loss function that measures similarity between the vectors for paraphrases.

This paper scales up the approach, using millions of paraphrases, and explores a range of models. To get the paraphrases they use translation (start with a sentence, translate it to another language and back, then assume the translation is a paraphrase). For negative examples they use the sentence that the model currently thinks is most similar other than the correct one (choosing this from a large enough set is key).

The best model is very simple - concatenate together the average of word vectors and the average of character trigram vectors. That consistently beats prior work, including convolutional models, and LSTMs. In a way, this is nice as it is a simple way to get a sentence representation! On the other hand, this can’t possibly capture the semantics of a sentence fully since it doesn’t take word order into consideration at all.


ArXiv Paper

  author        = {{Wieting}, J. and {Gimpel}, K.},
  title         = {Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations},
  journal       = {ArXiv e-prints},
  archivePrefix = {arXiv},
  eprint        = {1711.05732},
  primaryClass  = {cs.CL},
  year          = {2017},
  month         = {November},
  url           = {https://arxiv.org/abs/1711.05732},


comments powered by Disqus