Classification of Verb Particle Constructions with the Google Web1T Corpus


Manually maintaining comprehensive databases of multi-word expressions, for example Verb-Particle Constructions (VPCs), is infeasible. We describe a new classifier for potential VPCs, which uses information in the Google Web1T corpus to perform a simple linguistic constituency test. Specifically, we consider the fronting test, comparing the frequencies of the two possible orderings of the given verb and particle. Using only a small set of queries for each verb-particle pair, the system was able to achieve an F-score of 78.4% in our evaluation while processing thousands of queries a second.

Proceedings of the Australasian Language Technology Association Workshop 2008
Jonathan K. Kummerfeld
Jonathan K. Kummerfeld
Postdoctoral Research Fellow

Postdoc working on Natural Language Processing.