Word Vectors

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning (Tsvetkov et al., 2016)

The order in which sentences are used to train word vectors affects how useful those vectors are for downstream tasks, and a good ordering (a curriculum) can be learned for each task with Bayesian optimization.
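
A minimal sketch of the outer loop this implies, with invented sentence features, stub training and evaluation functions, and random search standing in for the paper's Bayesian optimizer:

```python
import random

def sentence_features(sentence):
    # Hypothetical features; the paper scores text with corpus-derived measures.
    words = sentence.split()
    return [len(words), len(set(words)) / max(len(words), 1)]

def order_by_curriculum(sentences, weights):
    # Score each sentence as a weighted sum of its features, then sort.
    def score(s):
        return sum(w * f for w, f in zip(weights, sentence_features(s)))
    return sorted(sentences, key=score)

def train_vectors(ordered_sentences):
    # Stand-in for word2vec-style training on the reordered corpus.
    return {w: [0.0] for s in ordered_sentences for w in s.split()}

def evaluate_downstream(vectors):
    # Stand-in for the downstream task's dev-set score.
    return random.random()

corpus = ["a short sentence", "another example sentence about words"]
best = (None, float("-inf"))
for _ in range(20):
    # Random search over feature weights; the paper instead asks a
    # Bayesian optimizer for the next weights to try.
    weights = [random.uniform(-1, 1) for _ in range(2)]
    ordered = order_by_curriculum(corpus, weights)
    score = evaluate_downstream(train_vectors(ordered))
    if score > best[1]:
        best = (weights, score)
print("best curriculum weights:", best[0])
```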

A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings (Yang et al., 2017)

To leverage out-of-domain data, learn a separate set of word vectors for each domain, with a regularization term in the loss that encourages the vectors for a word to stay similar across domains.
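
A rough PyTorch sketch of that structure, with made-up dimensions and random indices in place of real corpus samples; the uniform L2 tie between the two embedding tables is a simplification of the paper's regularizer:

```python
import torch
import torch.nn as nn

vocab_size, dim, lam = 1000, 50, 0.1
emb_in_domain = nn.Embedding(vocab_size, dim)   # target-domain vectors
emb_out_domain = nn.Embedding(vocab_size, dim)  # out-of-domain vectors
opt = torch.optim.Adam(list(emb_in_domain.parameters()) +
                       list(emb_out_domain.parameters()), lr=1e-3)

def skipgram_loss(emb, centers, contexts, negatives):
    # Standard negative-sampling loss, standing in for each domain's objective.
    c = emb(centers)
    pos = torch.sigmoid((c * emb(contexts)).sum(-1)).clamp_min(1e-9).log()
    neg = torch.sigmoid(-(c.unsqueeze(1) * emb(negatives)).sum(-1)).clamp_min(1e-9).log()
    return -(pos + neg.sum(1)).mean()

centers = torch.randint(0, vocab_size, (32,))
contexts = torch.randint(0, vocab_size, (32,))
negatives = torch.randint(0, vocab_size, (32, 5))

loss = (skipgram_loss(emb_in_domain, centers, contexts, negatives)
        + skipgram_loss(emb_out_domain, centers, contexts, negatives)
        # Regularizer pulling the two sets of vectors together.
        + lam * (emb_in_domain.weight - emb_out_domain.weight).pow(2).sum(-1).mean())
opt.zero_grad()
loss.backward()
opt.step()
```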

The strange geometry of skip-gram with negative sampling (Mimno et al., 2017)

Surprisingly, word2vec (skip-gram with negative sampling) produces vectors that all point in a broadly consistent direction, a pattern not seen in GloVe (but also one that doesn’t seem to cause a problem for downstream tasks).
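
This property is easy to check on any set of vectors: compute the mean vector and look at how consistently individual vectors point toward it. The sketch below uses random vectors as a placeholder for trained word2vec or GloVe vectors:

```python
import numpy as np

def cone_statistics(vectors):
    # Cosine similarity of every vector to the mean vector; values that are
    # mostly positive (and large) indicate a shared direction / narrow cone.
    mean = vectors.mean(axis=0)
    mean /= np.linalg.norm(mean)
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    cos = unit @ mean
    return cos.mean(), (cos > 0).mean()

# Placeholder: random vectors instead of trained embeddings.
vecs = np.random.randn(10000, 100)
avg_cos, frac_positive = cone_statistics(vecs)
print(f"mean cosine to centroid: {avg_cos:.3f}, fraction positive: {frac_positive:.3f}")
```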

SPINE: SParse Interpretable Neural Embeddings (Subramanian et al., 2017)

By introducing a new loss that encourages sparsity, an auto-encoder can be used to go from existing word vectors to new ones that are sparser and more interpretable, though the impact on downstream tasks is mixed.
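
A condensed PyTorch sketch of the setup: an autoencoder over pretrained vectors whose loss adds a sparsity penalty on the hidden code. Here a plain L1 term stands in for SPINE's specific sparsity losses, and the dimensions are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim_in, dim_hidden, sparsity_weight = 300, 1000, 0.1

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(dim_in, dim_hidden)
        self.decoder = nn.Linear(dim_hidden, dim_in)

    def forward(self, x):
        # Clamp activations to [0, 1] so sparsity means "mostly zeros".
        code = torch.clamp(self.encoder(x), 0.0, 1.0)
        return code, self.decoder(code)

model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

batch = torch.randn(64, dim_in)  # placeholder batch of pretrained vectors
code, recon = model(batch)
loss = (F.mse_loss(recon, batch)
        # L1 penalty standing in for SPINE's sparsity terms.
        + sparsity_weight * code.abs().mean())
opt.zero_grad()
loss.backward()
opt.step()
print("nonzero fraction in code:", (code > 0).float().mean().item())
```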

Learning Distributed Representations of Texts and Entities from Knowledge Base (Yamada et al., 2017)

Vectors for words and entities can be learned by trying to model the text written about the entities. This leads to word vectors that score well on similarity tasks and entity vectors that produce excellent results on entity linking and question answering.
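
One way to picture the core idea, sketched in numpy with a skip-gram-style reading of "modeling the text written about the entities": push each entity vector toward the words in its description and away from sampled negatives. The paper's full model is richer than this; the functions and sizes below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_words, n_entities, lr = 50, 5000, 100, 0.05
word_vecs = rng.normal(scale=0.1, size=(n_words, dim))
entity_vecs = rng.normal(scale=0.1, size=(n_entities, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update(entity, word, negative_words):
    # Logistic loss: the description word is a positive example, the
    # randomly sampled words are negatives.
    for w, label in [(word, 1.0)] + [(n, 0.0) for n in negative_words]:
        score = sigmoid(entity_vecs[entity] @ word_vecs[w])
        grad = score - label
        e_grad = grad * word_vecs[w]
        word_vecs[w] -= lr * grad * entity_vecs[entity]
        entity_vecs[entity] -= lr * e_grad

# Placeholder sample: entity 3's description contains word 42.
update(entity=3, word=42, negative_words=rng.integers(0, n_words, size=5))
```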

Multimodal Word Distributions (Athiwaratkun and Wilson, 2017)

By switching from representing each word as a single point in a vector space to a mixture of Gaussian distributions, we get a better model, scoring higher on multiple word similarity metrics than a range of other techniques.
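
The key similarity computation can be sketched directly: the expected likelihood kernel between two Gaussian mixtures has a closed form (under diagonal covariances) as a weighted sum of component overlaps. The toy mixtures below are placeholders for learned word representations.

```python
import numpy as np

def log_gaussian_overlap(mu1, var1, mu2, var2):
    # log of the integral of N(x; mu1, var1) * N(x; mu2, var2) over x,
    # for diagonal covariances; equals log N(mu1; mu2, var1 + var2).
    var = var1 + var2
    diff = mu1 - mu2
    return -0.5 * (np.sum(np.log(2 * np.pi * var)) + np.sum(diff * diff / var))

def expected_likelihood_kernel(weights1, mus1, vars1, weights2, mus2, vars2):
    # Sum the overlap of every pair of components, weighted by mixture weights.
    total = 0.0
    for p, mu1, v1 in zip(weights1, mus1, vars1):
        for q, mu2, v2 in zip(weights2, mus2, vars2):
            total += p * q * np.exp(log_gaussian_overlap(mu1, v1, mu2, v2))
    return total

# Two toy words, each a two-component mixture in 5 dimensions.
d = 5
w1 = ([0.5, 0.5], [np.zeros(d), np.ones(d)], [np.ones(d), np.ones(d)])
w2 = ([0.7, 0.3], [np.ones(d) * 0.5, -np.ones(d)], [np.ones(d), np.ones(d)])
print("similarity:", expected_likelihood_kernel(*w1, *w2))
```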

A Factored Neural Network Model for Characterizing Online Discussions in Vector Space (Cheng et al., EMNLP 2017)

A proposal for improving vector representations of sentences in online discussions by using attention over (1) a set of fixed vectors and (2) a context sentence.
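
A loose PyTorch sketch of one possible reading of that summary, with all names, dimensions, and the way the pieces are combined invented here; the paper's factored model is more involved.

```python
import torch
import torch.nn.functional as F

dim, n_fixed, sent_len, ctx_len = 64, 4, 10, 12

def attend(query, keys, values):
    # Standard dot-product attention: weight the values by how similar
    # each key is to the query.
    weights = F.softmax(keys @ query, dim=0)
    return weights @ values

word_vecs = torch.randn(sent_len, dim)     # words of the sentence to represent
fixed_vecs = torch.randn(n_fixed, dim)     # a bank of learned "fixed" vectors
context_words = torch.randn(ctx_len, dim)  # words of a context sentence

# Use the sentence itself as the query, attend over (1) the fixed vectors
# and (2) the context sentence's words, then combine into one representation.
sentence_query = word_vecs.mean(0)
over_fixed = attend(sentence_query, fixed_vecs, fixed_vecs)
over_context = attend(sentence_query, context_words, context_words)
sentence_vec = torch.cat([sentence_query, over_fixed, over_context])
print(sentence_vec.shape)  # torch.Size([192])
```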