Compositional Demographic Word Embeddings (Welch et al., EMNLP 2020)

Most work in NLP uses datasets with a diverse set of speakers. In practice, everyone speaks and writes slightly differently, and our models would be better if they accounted for that. This has been the motivation for a line of work by [Charlie Welch](http://cfwelch.com/) that I've been a collaborator on (in [CICLing 2019](https://www.jkk.name/publication/cicling19personal), [IEEE Intelligent Systems 2019](https://www.jkk.name/publication/ieee19personal/), [CoLing 2020](https://www.jkk.name/publication/coling20personal/), and this paper).

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation (Welch, Mihalcea, and Kummerfeld, EMNLP 2020)

This paper explores two questions. First, what is the impact of a few key design decisions for word embeddings in language models? Second, based on the first answer, how can we improve results in the situation where we have 50 million+ words of text, but only 1 GPU for training?

High-risk learning: acquiring new word vectors from tiny data (Herbelot and Baroni, 2017)

The simplest way to learn word vectors for rare words is to average the vectors of the words in their context. Tweaking word2vec to make greater use of the context may do slightly better, but the evidence is unclear.
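
As a rough illustration of the averaging baseline, here is a minimal sketch in plain Python. It assumes we already have vectors for the common words and a handful of sentences containing the rare word. The function name `average_context_vector`, the toy two-dimensional vectors, the made-up word "zorple", and the choice to treat the whole sentence as context are all assumptions for illustration, not details taken from the paper.

```python
from typing import Dict, List, Sequence


def average_context_vector(
    contexts: Sequence[Sequence[str]],
    target: str,
    embeddings: Dict[str, List[float]],
) -> List[float]:
    """Estimate a vector for `target` by averaging the known vectors
    of the words that appear alongside it."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    count = 0
    for sentence in contexts:
        for word in sentence:
            # Skip the rare word itself and any word without a known vector.
            if word == target or word not in embeddings:
                continue
            for i, value in enumerate(embeddings[word]):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("no context words with known vectors")
    return [value / count for value in total]


# Toy vectors, invented purely for illustration.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "pet": [1.0, 1.0],
}
sentences = [
    ["the", "zorple", "is", "a", "pet"],
    ["cat", "dog", "zorple"],
]
rare_vector = average_context_vector(sentences, "zorple", toy_embeddings)
```

Words without a known vector (such as "the" in this toy vocabulary) are simply skipped. A real setup would more likely use a fixed context window and filter out frequent function words, but those choices are outside what this summary covers.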