Foundations
Natural Language Inference, Understanding and Composition
Learning reliable and interpretable linguistic representations has been a long-standing topic of interest in NLP. Fundamental to this line of research are two key aspects of language: the ability to merge units of meaning, or compositionality, and the ability to draw inferences between linguistic units. Part of our investigation of these phenomena lies at the intersection of compositional distributional semantics and representation learning. Specifically, we study how different graph-based neural architectures and data structures can be integrated and exploited to tackle phenomena such as composition, sentence representation and natural language inference.
People: Lorenzo Bertolini, Julie Weeds and David Weir
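To make the two phenomena concrete, the following sketch composes toy word vectors into sentence representations by averaging, then compares a premise with a hypothesis via cosine similarity as a crude proxy for inference. The vectors, the additive composition function and the similarity-as-inference shortcut are all illustrative assumptions, not the graph-based architectures investigated in this work.

```python
# A minimal sketch (not this project's actual models) of composition and
# inference: build sentence vectors from word vectors, then compare them.
import math

word_vecs = {  # hypothetical toy embeddings
    "dogs": [0.9, 0.1, 0.0],
    "bark": [0.1, 0.8, 0.1],
    "animals": [0.8, 0.2, 0.1],
    "make": [0.2, 0.3, 0.4],
    "noise": [0.1, 0.7, 0.3],
}

def compose(sentence):
    """Additive composition: the simplest distributional baseline."""
    vecs = [word_vecs[w] for w in sentence.split()]
    return [sum(dims) / len(vecs) for dims in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

premise = compose("dogs bark")
hypothesis = compose("animals make noise")
# High similarity suggests (but does not guarantee) that the premise
# supports the hypothesis; real NLI models learn this decision instead.
print(f"similarity = {cosine(premise, hypothesis):.3f}")
```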
Structural Information in Sentence Meaning Understanding
Sentential meaning composition is a crucial task for natural language understanding. Humans have little difficulty understanding previously unseen sentences, thanks to their ability to systematically compose the meanings of a sentence's constituents. Many existing neural models have been shown to lack this compositional ability, as demonstrated by their poor performance on a range of compositional generalisation datasets. Our research addresses this challenge by disambiguating sentence meaning and by improving models through structure supervision.
People: Qiwei Peng, David Weir and Julie Weeds
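As a rough illustration of why structural supervision can help, the sketch below builds a sentence representation bottom-up over a hand-written binary tree rather than by flat averaging. The toy vectors and the placeholder combine function are assumptions; in a structure-supervised model, combine would be a trained network such as a TreeRNN or TreeLSTM.

```python
# A minimal sketch, under illustrative assumptions, of structure-aware
# composition over a (hand-written) constituency tree.
from typing import Union

Tree = Union[str, tuple]  # a leaf (word) or a (left, right) pair

word_vecs = {"the": [0.1, 0.2], "dog": [0.9, 0.3], "barked": [0.2, 0.8]}

def combine(left, right):
    # Placeholder composition function; a trained network would replace this.
    return [(a + b) / 2 for a, b in zip(left, right)]

def compose(tree: Tree):
    if isinstance(tree, str):      # leaf: look up the word vector
        return word_vecs[tree]
    left, right = tree             # internal node: compose the children
    return combine(compose(left), compose(right))

# ((the dog) barked): the structure tells the model what combines with what.
sentence_tree = (("the", "dog"), "barked")
print(compose(sentence_tree))
```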
Lexical Contextualisation through Composition of Anchored Parse Trees
We have developed a new framework for compositional distributional semantics in which the distributional contexts of lexemes are expressed in terms of anchored packed dependency trees. We have shown that these structures have the potential to capture the full sentential contexts of a lexeme, provide a uniform basis for the composition of distributional knowledge in a way that captures both mutual disambiguation and generalisation, and support the measurement of lexeme, phrase and sentence similarity and plausibility.
People: Jeremy Reffin, Julie Weeds, David Weir and Thomas Kober
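The sketch below gives a simplified picture of the data structure: each lexeme's distributional context is keyed by the dependency path to the co-occurring word, and composing two lexemes offsets one tree along the relation that links them so that aligned contexts merge. The example paths, counts and additive merge are illustrative simplifications, not the published framework itself.

```python
# A minimal sketch of anchored dependency-path contexts and composition
# by offsetting, under simplifying assumptions about paths and merging.
from collections import Counter

# Paths are tuples of dependency edges; a leading "_" marks an inverse edge.
dog = Counter({
    (("amod",), "fierce"): 4,        # "fierce dog"
    (("_nsubj",), "barked"): 6,      # "dog barked"
})
white = Counter({
    (("_amod",), "dog"): 3,          # "white dog"
    (("_amod", "_nsubj"), "barked"): 1,
})

def invert(edge):
    return edge[1:] if edge.startswith("_") else "_" + edge

def offset(apt, rel):
    """Shift every path in `apt` along dependency relation `rel`."""
    shifted = Counter()
    for (path, word), count in apt.items():
        if path and path[0] == invert(rel):
            new_path = path[1:]          # the edge cancels with its inverse
        else:
            new_path = (rel,) + path
        shifted[(new_path, word)] += count
    return shifted

# Compose "white dog": offset `white` along amod so both trees are anchored
# at "dog", then merge the aligned counts by addition (one of several options).
composed = dog + offset(white, "amod")
print(composed.most_common(3))
```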
Agile Text Classification
In this work, our goal is to develop a methodology that can be used to rapidly create bespoke, accurate text classification models. The need for large amounts of hand-labelled data can be reduced through the use of weakly labelled data, active learning, and fine-tuning of large pre-trained language models. Text classification is a broad topic and there are many different scenarios to consider: the length of the texts being classified can vary considerably (from short tweets to long news articles); the classification decision can rest on quite different aspects of the texts (determining whether a text contains provocative content versus determining its overall topic); and there can be substantial differences in the extent to which class imbalance needs to be addressed.
People: Shaun Ring and David Weir
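One ingredient of this methodology, pool-based active learning with uncertainty sampling, can be sketched as follows. The bag-of-words classifier, toy texts and simulated annotator are stand-ins chosen to keep the example runnable; in practice the model would be a fine-tuned large pre-trained language model and the queried labels would come from a human annotator.

```python
# A minimal sketch of an active-learning loop with uncertainty sampling,
# using illustrative toy data and a simple classifier as stand-ins.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labelled = [("free prize, click now", 1), ("meeting moved to 3pm", 0)]
pool = ["win a free holiday now", "minutes from the board meeting",
        "claim your prize today", "agenda attached for review"]

vec = TfidfVectorizer()
vec.fit([t for t, _ in labelled] + pool)

for round_num in range(2):                     # a tiny annotation budget
    X = vec.transform([t for t, _ in labelled])
    y = [label for _, label in labelled]
    clf = LogisticRegression().fit(X, y)

    # Uncertainty sampling: query the pool item closest to p = 0.5.
    probs = clf.predict_proba(vec.transform(pool))[:, 1]
    idx = int(np.argmin(np.abs(probs - 0.5)))
    print(f"round {round_num}: query -> {pool[idx]!r} (p={probs[idx]:.2f})")

    # A human annotator would label the query; we simulate one here.
    simulated_label = 1 if "prize" in pool[idx] or "free" in pool[idx] else 0
    labelled.append((pool.pop(idx), simulated_label))
```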