# Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution (Sun et al., COLING 2020)

The standard approach in crowdsourcing is to have a fixed number of workers annotate each instance and then aggregate annotations in some way (possibly with experts resolving disagreements). This paper proposes a way to dynamically allocate workers.

The process is as follows:

1. Get two workers to annotate an example. If they agree, assign the label.
2. For disagreements, ask additional annotators one at a time until either a simple majority emerges or a preset limit on the number of annotators is hit.
3. For cases where the limit is reached, fall back to an aggregation method or expert adjudication.
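
The steps above can be sketched in a few lines (my own reading of the procedure, not the authors' code; `get_label` stands in for requesting one more crowd annotation, and the limit of five annotators is an illustrative choice):

```python
from collections import Counter

def resolve(get_label, max_annotators=5):
    """Collect labels until a simple majority emerges or the limit is hit.

    Returns (label, n_used); label is None when no consensus is reached
    and the instance should go to aggregation / expert review.
    """
    labels = [get_label(), get_label()]      # step 1: two initial workers
    while True:
        counts = Counter(labels)
        label, top = counts.most_common(1)[0]
        if top > len(labels) - top:          # strict majority over the rest
            return label, len(labels)
        if len(labels) >= max_annotators:
            return None, len(labels)         # step 3: escalate
        labels.append(get_label())           # step 2: ask one more worker
```

Note that with two classes an odd number of labels always produces a majority, so the no-consensus fallback only triggers when three or more labels are in play (or the limit is even).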

I really like this idea: it is simple to apply and the intuition for why it should work is clear. Unfortunately, the experiments in the paper do not include the comparison I am most interested in: real data with multiple annotation strategies applied. The simulated study supports the method's effectiveness, but that means buying a range of assumptions about annotator behaviour (e.g. that all errors are equally likely and that all workers behave the same way). There is a large-scale experiment with real data in which the approach collects 3.74 labels per instance on average (with a minimum of 3), and only 5% of cases fail to reach a consensus. That seems very good!
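
To make those assumptions concrete, here is a toy version of the kind of simulation the paper's study rests on: every worker has the same accuracy, and errors are spread uniformly over the wrong classes. All the numbers (accuracy, class count, limit) are illustrative choices of mine, not the paper's settings:

```python
import random
from collections import Counter

def simulate(n_instances=10_000, accuracy=0.8, n_classes=4,
             max_annotators=5, seed=0):
    """Estimate average labels per instance and the no-consensus rate
    under the uniform-worker, uniform-error assumption."""
    rng = random.Random(seed)

    def draw():
        # true label is always 0; errors fall uniformly on the other classes
        return 0 if rng.random() < accuracy else rng.randrange(1, n_classes)

    total_labels, no_consensus = 0, 0
    for _ in range(n_instances):
        labels = [draw(), draw()]
        while True:
            top = Counter(labels).most_common(1)[0][1]
            if top > len(labels) - top:        # simple majority reached
                break
            if len(labels) >= max_annotators:  # give up and escalate
                no_consensus += 1
                break
            labels.append(draw())
        total_labels += len(labels)
    return total_labels / n_instances, no_consensus / n_instances
```

Under these homogeneous assumptions the per-instance cost stays close to two labels; whether real crowds behave this way is exactly what the simulated study cannot tell us.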

## Citation

```bibtex
@inproceedings{sun-etal-2020-improving,
    title = "Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution",
    author = "Sun, David Q.  and
      Klein, Christopher  and
      Gupta, Mayank  and
      Li, William  and
      Williams, Jason D.",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
}
```