The strange geometry of skip-gram with negative sampling (Mimno and Thompson, 2017)

It turns out that if the vectors learned by word2vec are projected onto a plane, they all point in roughly the same direction. The context vectors (which are part of the algorithm, but usually not retained afterwards) point the other way. This effect is not visible in t-SNE visualisations because t-SNE warps the space to optimise its own objective.
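One rough way to check for this effect in a set of embeddings is to measure how well each vector aligns with the mean word vector. The sketch below is only an illustration: it uses synthetic vectors rather than trained word2vec output, and the helper names (`mean_vector`, `cosine`, `alignment_with_mean`) are my own, not taken from the paper or any library.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_with_mean(vectors, reference):
    """Average cosine similarity between each vector and a reference direction."""
    return sum(cosine(v, reference) for v in vectors) / len(vectors)


# Synthetic stand-ins for trained embeddings (an assumption, NOT real data):
# word vectors are noise around a shared offset, context vectors are noise
# around the opposite offset.
rng = random.Random(0)
dim, n = 50, 200
offset = [1.0] * dim
word_vecs = [[o + rng.gauss(0, 0.5) for o in offset] for _ in range(n)]
context_vecs = [[-o + rng.gauss(0, 0.5) for o in offset] for _ in range(n)]

word_mean = mean_vector(word_vecs)
word_alignment = alignment_with_mean(word_vecs, word_mean)
context_alignment = alignment_with_mean(context_vecs, word_mean)
print(f"words vs. word mean:    {word_alignment:.2f}")
print(f"contexts vs. word mean: {context_alignment:.2f}")
```

If the vectors were spread evenly around the origin, both averages would be close to zero. A strongly positive value for the word vectors and a strongly negative one for the context vectors is the pattern described above.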

This is surprising, and may seem problematic since it doesn’t fit our goals for what these vectors should be capturing. However, it doesn’t seem to hurt downstream tasks: GloVe, for example, does not have this property, and doesn’t seem to derive a great benefit from lacking it.



  author    = {Mimno, David  and  Thompson, Laure},
  title     = {The strange geometry of skip-gram with negative sampling},
  booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
  month     = {September},
  year      = {2017},
  address   = {Copenhagen, Denmark},
  publisher = {Association for Computational Linguistics},
  pages     = {2873--2878},
  url       = {}

