The strange geometry of skip-gram with negative sampling (Mimno et al., 2017)

Surprisingly, word2vec (negative skipgram sampling) produces vectors that point in a consistent direction, a pattern not seen in GloVe (but also one that doesn’t seem to cause a problem for downstream tasks).

Sequence Effects in Crowdsourced Annotations (Mathur et al., 2017)

Annotator sequence bias, where the label for one item affects the label for the next, occurs across a range of datasets. Avoid it by separately randomise the order of items for each annotator.

High-risk learning: acquiring new word vectors from tiny data (Herbelot et al., 2017)

The simplest way to learn word vectors for rare words is to average their context. Tweaking word2vec to make greater use of the context may do slightly better, but it’s unclear.

Revisiting Selectional Preferences for Coreference Resolution (Heinzerling et al., 2017)

It seems intuitive that a coreference system could benefit from information about what nouns a verb selects for, but experiments on explicitly adding a representation of it to a neural system does not lead to gains, implying it is already learning them or they are not useful.

Neural Semantic Parsing over Multiple Knowledge-bases (Herzig et al., 2017)

Training a single parser on multiple domains can improve performance, and sharing more parameters (encoder and decoder as opposed to just one) seems to help more.

A causal framework for explaining the predictions of black-box sequence-to-sequence models (Alvarez-Melis et al., 2017)

To explain structured outputs in terms of which inputs have most impact, treat it as identifying components in a bipartite graph where weights are determined by perturbing the input and observing the impact on outputs.

A Local Detection Approach for Named Entity Recognition and Mention Detection (Xu et al., 2017)

Effective NER can be achieved without sequence prediction using a feedforward network that labels every span with a fixed attention mechanism for getting contextual information.

Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme (Zheng et al., 2017)

By encoding the relation type and role of each word in tags, an LSTM can be applied to relation extraction with great success.

Abstractive Document Summarization with a Graph-Based Attentional Neural Model (Tan et al., 2017)

Neural abstractive summarisation can be dramatically improved with a beam search that favours output that matches the source document, and further improved with attention based on PageRank, with a modification to avoid attending to the same sentence more than once.

SPINE: SParse Interpretable Neural Embeddings (Subramanian et al., 2017)

By introducing a new loss that encourages sparsity, an auto-encoder can be used to go from existing word vectors to new ones that are sparser and more interpretable, though the impact on downstream tasks is mixed.