Compositional Demographic Word Embeddings (Welch et al., EMNLP 2020)

Most work in NLP uses datasets with a diverse set of speakers. In practise, everyone speaks / writes slightly differently and our models would be better if they accounted for that. This has been the motivation for a line of work by [Charlie Welch]( that I've been a collaborator on (in [CICLing 2019](, [IEEE Intelligent Systems 2019](, [CoLing 2020](, and this paper).

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation (Welch, Mihalcea, and Kummerfeld, EMNLP 2020)

This paper explores two questions. First, what is the impact of a few key design decisions for word embeddings in language models? Second, based on the first answer, how can we improve results in the situation where we have 50 million+ words of text, but only 1 GPU for training?

An Analysis of Neural Language Modeling at Multiple Scales (Merity et al., 2018)

Assigning a probability distribution over the next word or character in a sequence (language modeling) is a useful component of many systems...

Dynamic Evaluation of Neural Sequence Models (Krause et al., 2017)

Language model perplexity can be reduced by maintaining a separate model that is updated during application of the model, allowing adaptation to short-term patterns in the text.