The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models (Weber et al., 2018)

We know that training a neural network involves optimising over a non-convex space, but using standard evaluation methods we see that our models…

A causal framework for explaining the predictions of black-box sequence-to-sequence models (Alvarez-Melis et al., 2017)

To explain structured outputs in terms of which inputs have most impact, treat it as identifying components in a bipartite graph where weights are determined by perturbing the input and observing the impact on outputs.