A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings (Yang et al., 2017)

To construct word vectors from multi-domain data, the method learns a separate vector for each word in each domain and adds a loss term that encourages the domain-specific vectors to agree. This loss is the l2 distance between a word's vectors in the two domains, weighted by a factor that depends on the word's frequency in each domain. The factor is the harmonic mean of the normalised frequencies, so the lower of the two frequencies dominates and pulls the weight down. Across a range of evaluation tasks, the authors report consistent improvements over the compared approaches.
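
As a concrete illustration of the regulariser, the Python sketch below computes the frequency-based weight and the weighted l2 term over the words shared by two domains. The function names, the dictionary-based interface, and the exact form of the weight (a plain harmonic mean of the normalised frequencies) are illustrative assumptions based on the description above, not the authors' implementation.

```python
import numpy as np


def significance(freq_src, freq_tgt):
    """Weight for a word: harmonic mean of its normalised frequencies in the two
    domains, so a low frequency in either domain pulls the weight down."""
    if freq_src + freq_tgt == 0:
        return 0.0
    return 2.0 * freq_src * freq_tgt / (freq_src + freq_tgt)


def cross_domain_regulariser(emb_src, emb_tgt, freqs_src, freqs_tgt):
    """Weighted l2 penalty encouraging a word's two domain-specific vectors to agree.

    emb_src / emb_tgt: dict mapping word -> embedding vector (np.ndarray), one per domain
    freqs_src / freqs_tgt: dict mapping word -> normalised frequency in that domain
    """
    loss = 0.0
    for w in emb_src.keys() & emb_tgt.keys():  # only words that occur in both domains
        weight = significance(freqs_src.get(w, 0.0), freqs_tgt.get(w, 0.0))
        loss += weight * np.linalg.norm(emb_src[w] - emb_tgt[w])  # weighted l2 distance
    return loss
```

In training, this term would be added to the usual per-domain embedding objectives (e.g. skip-gram losses), so that words frequent in both domains are tied tightly across domains while rare, domain-specific words are left free to drift apart.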

Citation

Paper

@InProceedings{yang-lu-zheng:2017:EMNLP2017,
  author    = {Yang, Wei  and  Lu, Wei  and  Zheng, Vincent},
  title     = {A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings},
  booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
  month     = {September},
  year      = {2017},
  address   = {Copenhagen, Denmark},
  publisher = {Association for Computational Linguistics},
  pages     = {2898--2904},
  url       = {https://www.aclweb.org/anthology/D17-1312}
}

