Distributional Models of Word Meaning and How to Evaluate Them

Friday 21 November 13:00 until 14:00

Pevensey I, 1A6.

Speaker: Miroslav Batchkarov (PhD Student, Informatics)

Part of the series: School of Engineering and Informatics - Work In Progress Seminars

Distributional models of word meaning have attracted a lot of attention in the NLP community as they do not require manually annotated data and can therefore harness the vast amounts of unlabelled text available today. Despite these advantages, such models are hard to scale beyond short phrases due to data sparsity. Recent research attempts to overcome this limitation by using the principle of compositionality, which states that the meaning of a phrase can be derived from the meaning of its constituents.
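As a rough illustration of compositionality, the sketch below builds a phrase vector by summing the vectors of its words and then compares two phrases by cosine similarity. Vector addition is only one simple possible composer, chosen here for illustration; the abstract does not say which composers the talk covers. The word vectors, the names `compose_add` and `cosine`, and the example phrases are all made up for this example.

```python
import math

# Toy 3-dimensional word vectors. The values are invented for illustration;
# real distributional vectors are derived from co-occurrence counts in text.
vectors = {
    "black": [0.9, 0.1, 0.0],
    "cat": [0.1, 0.8, 0.3],
    "dog": [0.2, 0.7, 0.4],
}


def compose_add(words, vectors):
    """Additive composer (an assumed example): sum the word vectors element-wise."""
    dims = len(next(iter(vectors.values())))
    phrase = [0.0] * dims
    for word in words:
        for i, value in enumerate(vectors[word]):
            phrase[i] += value
    return phrase


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Phrase vectors are derived from their constituents, so no corpus
# counts for the phrases themselves are needed.
black_cat = compose_add(["black", "cat"], vectors)
black_dog = compose_add(["black", "dog"], vectors)
similarity = cosine(black_cat, black_dog)
```

Because the phrase vector is computed from word vectors, the phrase never has to be observed in a corpus, which is how composition sidesteps the sparsity problem described above.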

In evaluating these proposals, the most commonly used measure of performance has been the degree to which they agree with human-provided phrase similarity scores. We argue that such evaluations are unreliable due to the quality and quantity of the data sets used, and have thus failed to assess the relative usefulness of composers in a practical context. We propose a framework within which to compare composers with respect to their performance on the real-world task of document classification.

By: Luke Scott
Last updated: Wednesday, 12 November 2014