Data Science Research Group

Seminars archive

Aleksander Savkov et al. (Babylon Health) 16 Aug 2018
Deep Embedder: A Journey from an Average Word Vector to an RNN

In this talk I will describe a project from industry where we learn a strong sentence embedding model from unlabelled data.  

Thomas Kober (Edinburgh) 25 July 2018
Tense and Aspect in Distributional Semantic Vector Space Models

Tense and aspect are two of the most important contributing factors to the meaning of a verb, e.g. determining the temporal extent described by a predication as well as the entailments the verb gives rise to. For example while a verb phrase describing an event such as "Thomas is visiting Brighton" entails the change of state of "Thomas being in Brighton", the state of "being in Brighton" does not entail a "visit to Brighton". The reasoning becomes more complex when different tenses are involved, where "Thomas has arrived in Brighton" entails "Thomas is in Brighton", whereas "Thomas will arrive in Brighton" does not.

Distributional semantic word representations are a ubiquitous part of a number of NLP tasks such as Sentiment Analysis, Question Answering, Recognising Textual Entailment, Machine Translation, Parsing, etc. While their capacity for representing content words such as adjectives, nouns and verbs is well established, their ability to encode the semantics of closed class words has received considerably less attention.

In this talk I will show how composition can be used to leverage distributional representations for closed class words such as auxiliaries, prepositions and pronouns to model tense and aspect of verbs in context. I will furthermore analyse how and why closed class words are effective at disambiguating fine-grained distinctions in verb meaning. Lastly, I will demonstrate that a distributional semantic vector space model is able to capture a substantial amount of temporality in a novel tensed entailment task.  

Miro Batchkarov (Teebly) 18 July 2018
NLP in Industry: a personal story

This talk describes several industrial NLP projects that I have been involved in, such as information extraction from HTML pages, postal address parsing and chat bots. I will also discuss general trends in commercial NLP. Finally, I will briefly reflect on my personal experiences as I transitioned from a PhD to industry and the lessons learned along the way.  

Margaret Yann (Toronto) 9 July 2018
Machine Learning and Deep Learning in Biomedical Data: Images and Clinical Notes

Despite major breakthroughs in deep learning, many challenges for artificial intelligence (AI) in medicine still exist. In this talk, I will discuss some research and challenges in the medical domain.

a) Images - X-ray Protein Crystallization Images

Synthesizing structural proteins is a complex, expensive and failure-prone process, where chemical conditions for each protein’s successful crystallization are unique and difficult to produce. A systematic high-throughput (HTP) approach in crystallography has been widely attempted, and a large number of protein crystallization images have been generated. However, an efficient automatic analytical method for these images remains a pressing need. We introduce a novel system, CrystalNet, to learn a convolutional deep neural network to efficiently and accurately analyze protein crystallization X-ray images automatically in HTP pipelines.

b) Electronic Medical Records - Clinical Notes

A number of challenges exist in analyzing unstructured free text data in electronic medical records (EMRs). EMRs accumulate gigabytes of free text every year, yet this text is difficult to represent and model due to its high dimensionality, heterogeneity, sparsity, incompleteness and random errors. Moreover, standard NLP tools make errors when applied to clinical notes due to physicians’ use of unconventional written language. Issues include polysemy, abbreviations, ambiguity, misspelling, variations, temporality and negation. This research presents a machine learning framework, Clinical Learning On Natural Expression (CLONE), to learn from large scale EMR databases, analyzing free text clinical notes from primary care practitioners. To demonstrate performance, we evaluate our model in a case study to identify patients with Congestive Heart Failure.

Steve Clark (Cambridge and DeepMind) 26 April 2018
Neural Models for CCG Supertagging

In this talk I will describe how recurrent neural networks can be applied successfully to the CCG supertagging problem. The use of RNNs leads to substantial accuracy gains over previous feature-based taggers using maximum entropy models. In fact, the accuracy of an LSTM supertagger is higher than that of the Clark and Curran CCG parser, when evaluated solely on supertagging accuracy. I will then describe the work of Mike Lewis and colleagues, who have shown how a simple parsing algorithm, on top of an LSTM supertagger, can lead to highly accurate and efficient CCG parsing, substantiating the original claim of Bangalore and Joshi that supertagging can be thought of as 'almost parsing'.

Andreas Vlachos (Sheffield) 17 April 2018
Imitation Learning, Zero-shot Learning and Automated Fact Checking

In this talk I will give an overview of my research in machine learning for natural language processing. I will begin by introducing my work on imitation learning, a machine learning paradigm I have used to develop novel algorithms for structure prediction that have been applied successfully to a number of tasks such as semantic parsing, natural language generation and information extraction. Key advantages are the ability to handle large output search spaces and to learn with non-decomposable loss functions. Following this, I will discuss my work on zero-shot learning using neural networks, which enabled us to learn models that can predict labels for which no data was observed during training. I will conclude with my work on automated fact-checking, a challenge we proposed in order to stimulate progress in machine learning, natural language processing and, more broadly, artificial intelligence.  

Vicente Ivan Sanchez Carmona (UCL) 22 March 2018
Towards Understanding Representation Learning Systems: An Experimental Approach

Representation learning systems are dedicated to learning a representation of the data in a form suitable to be used by other machine learning models, such as classifiers. However, some of these systems and the representations learned are difficult to interpret. Thus, figuring out if certain knowledge has been learned, such as hypernymy, or understanding why/how a particular prediction has been made are difficult tasks. In this talk, we will focus on three analyses that will let us better understand aspects such as those described above for particular representation learning systems and embeddings, namely GloVe embeddings, the ESIM system for the task of natural language understanding, and the popular Model F for the task of knowledge base population. The particular questions to be addressed are: Have GloVe embeddings learned hypernymy? How robust is the behavior of ESIM to challenging instances and what factors influence its predictions? How can we explain predictions of Model F? This talk is based on work published in EACL 2017, NAACL 2018 (to appear), and AAAI Spring Symposium 2015.  

Laura Rimell (Cambridge) 6 July 2017
Two Case Studies in Lexical Semantics: Hypernym Detection and Antonym Generation

Detection of lexical semantic relations such as hypernymy, antonymy, meronymy, etc. using distributed word representations has practical use in many NLP applications, and success (or failure) in relation detection offers a better understanding of commonly used representations. An extension of the relation detection task, generation of word pairs governed by a lexical relation, has the further potential to improve natural language generation. In this talk I will address two approaches to lexical relations, one for hypernym detection and the other for antonym generation. Hypernym detection in distributed spaces has typically been based on a notion of substitutability: co-occurrence contexts of a hyponym (e.g. 'lion') are assumed to be valid contexts for its hypernym (e.g. 'animal'). However, this assumption often fails. I will discuss an alternative approach that considers the top features in a sparse context vector as a topic, and introduce an entailment measure based on topic coherence which can be used for multi-way relation classification. Turning to antonymy, previous work has focused on learning word representations that incorporate antonymy as part of the objective. Instead, I describe how we can learn a mapping that predicts antonyms of adjectives in an arbitrary word embedding model. I will introduce a continuous class-conditional bilinear neural network, inspired by relation detection networks used in computer vision, which gates the input word vector using information about the semantic domain, and is able to predict antonyms with high precision.  

Bill Keller (Sussex) 29 June 2017
Fourteen Components of Creativity

Many commonsense concepts may be characterised as “essentially contested”, in that they resist a straightforward, universally agreed on interpretation or realisation. Where such concepts have a role in scientific enquiry, this can be problematic. A case in point is the concept of creativity, which encompasses a variety of related aspects, abilities, properties and behaviours. This poses problems for the study of creative practice generally and computational creativity in particular, where a tractable and well-articulated model of the concept is needed for the purpose of evaluation. This talk describes joint work with Anna Jordanous (University of Kent) on a novel approach to developing a model of the notion of creativity. Statistical language processing techniques were applied to the analysis of a corpus of academic papers on the topic. Words that appeared significantly often in connection with the concept were identified and then clustered to discern a number of key components. The components provide a set of 'building blocks' for creativity that have been used to model and evaluate creative practice. 

Thomas Kober (Sussex) 15 June 2017
Inferring Unobserved Co-occurrence Events in Anchored Packed Trees

Distributional models are derived from co-occurrences in a corpus, where only a small proportion of all possible plausible co-occurrences will be observed. This problem is amplified for models like Anchored Packed Trees (APTs), which take the grammatical type of a co-occurrence into account. This results in a very sparse distributional space, requiring a mechanism for inferring missing knowledge. Most methods face this challenge in ways that render the resulting word representations uninterpretable, with the consequence that semantic composition becomes hard to model. In this talk, I explore an alternative which involves explicitly inferring unobserved co-occurrences using the distributional neighbourhood. This approach exploits the rich type structure in APTs and infers missing data by the same mechanism that is used for semantic composition. I show that distributional inference improves performance on several word similarity benchmarks and achieves state-of-the-art performance for two short-phrase composition benchmarks.

Mark Steedman (Edinburgh) 11 May 2017
Bootstrapping Language Acquisition

Recent work with Abend, Kwiatkowski, Smith, and Goldwater (2017) has shown that a general-purpose program for inducing parsers incrementally from sequences of paired strings (in any combinatory categorial language) and meanings (in any convenient language of logical form) can be applied to real English child-directed utterance from the CHILDES corpus to successfully learn the child's ("Eve's") grammar, combining lexical and syntactic learning in a single pass through the data. While the earliest stages of learning necessarily proceed by pure "semantic bootstrapping", building a probabilistic model of all possible pairings of all possible words and derivations with all possible decompositions of logical form, the later stages of learning show emergent effects of "syntactic bootstrapping" (Gleitman 1990), where EVE's increasing knowledge of the grammar of the language allows it to identify the syntactic type and meaning of unseen words in one trial, as has been shown to be characteristic of real children in experiments with nonce-word learning. The concluding sections of the talk consider Gleitman's argument that such learning can occur in situations where the meaning of an unknown word is either unavailable from the situation, or only partially available, and must be learned from subsequent occasions of use. I'll argue that this process can also be understood computationally by analogy with the process of machine learning of a "clustered entailment" semantics proposed by Lewis and the present author as a component of a robust system for question-answering from unrestricted text.  

David Spence (Sussex) 30 March 2017
Quantification under Dataset Shift

First, I will outline the main approaches taken to quantification. “Quantification” is defined as estimating the class proportions in a dataset, as opposed to “classification”, which is the estimation of the class labels of individual instances. Most (all?) published quantification methods make the assumption that the test data has been sampled at random from the same population as the training data. This assumption frequently does not hold in reality. This problem can be termed “dataset shift”. Numerous researchers have developed approaches to deal with dataset shift in classification. I will outline how I have applied, and how I propose to apply, these and other methods to the problem of dataset shift in quantification.
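
The distinction between classification and quantification can be made concrete with a minimal sketch. It shows two simple estimators known from the quantification literature, "classify and count" and "adjusted count"; these are illustrative choices, not necessarily the methods covered in the talk. The function names, the toy predictions and the true/false positive rates below are all assumptions made for the example. The adjusted estimator additionally assumes that the rates measured on held-out training data carry over to the test data.

```python
def classify_and_count(predictions):
    """Estimate positive-class prevalence as the fraction of predicted positives."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the classify-and-count estimate for known classifier error rates.

    tpr and fpr are the classifier's true and false positive rates, assumed
    here to have been estimated on held-out labelled data.
    """
    observed = classify_and_count(predictions)
    if tpr == fpr:
        # The classifier carries no signal; no correction is possible.
        return observed
    estimate = (observed - fpr) / (tpr - fpr)
    # Clip to the valid range for a proportion.
    return min(1.0, max(0.0, estimate))


# Hypothetical test set: 35 of 100 instances were predicted positive.
toy_predictions = [1] * 35 + [0] * 65
raw_estimate = classify_and_count(toy_predictions)
corrected_estimate = adjusted_count(toy_predictions, tpr=0.8, fpr=0.2)
```

With these invented numbers the raw estimate is 0.35, while the corrected estimate is lower, because some of the predicted positives are attributed to false alarms.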

Julie Weeds (Sussex) 23 March 2017
Aligning Packed Dependency Trees: a theory of composition for distributional semantics

In this talk, I will present a summary of the Anchored Packed Tree (APT) theory. This approach maintains higher order dependency structure within distributional representations and by doing so allows composition to be defined in a way which is sensitive to syntax. Further, composition is able to mutually disambiguate constituents whilst providing a structured representation which can be further composed with other elements. The uniform nature of the representations means that it is possible to compare uncontextualised lexemes, contextualised lexemes, phrases and sentences using the same similarity measure. I will present results on a compositionality detection task for noun compounds, where we achieve state-of-the-art results using the APT approach. I will also discuss how distributional inference can be used to alleviate the data sparsity problem leading to performance matching the state-of-the-art at two benchmark phrase similarity tasks.  

Shinsuke Mori (Kyoto) 11 November 2016
A Realistic Approach to Procedural Text Understanding

One of the ultimate goals of natural language processing (NLP) is text understanding. It is, however, difficult to understand text. Our research group focuses on procedural texts (cooking recipes), which are relatively clear, without modality or dependence on viewpoints, and have many potential applications in artificial intelligence. In this talk I first present our flow graph corpus as their meaning representation. Then I talk about our machine learning based attempt at estimating a flow graph given a recipe text. Finally I introduce other research activities in our laboratory, such as automatic game commentary and NLP referring to real world information.

David Spence (Sussex) 2 August 2016
Training Classifiers with Imbalanced Datasets

Training classifiers with imbalanced datasets (where the sizes of the classes are not all equal) can lead to poor classifier performance - especially with the smallest classes. I will talk through some examples we had from work estimating demographics of Twitter users. Then I will outline a couple of potential approaches to address the problem, namely under-sampling and synthetic-oversampling (SMOTE), and where they did and didn’t work with our data. I will finish with a general discussion on using classifiers to ‘quantify’ a group, i.e. where you are not interested in the classification of individuals but would like an accurate estimate of the size of the classes.

[Presented as a Sussex Data Analysis Forum discussion]
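
The two rebalancing approaches named in the abstract above can be sketched in a few lines. This is a simplified illustration rather than the code used in the work described: `undersample` and `smote_like` are hypothetical helper names, the data points are invented, and the second function only mimics the core idea of SMOTE (interpolating between a minority example and one of its nearest minority neighbours).

```python
import random


def undersample(majority, minority, rng):
    """Randomly drop majority-class examples until both classes are the same size."""
    return rng.sample(majority, len(minority)), list(minority)


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic minority points.

    Each new point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Invented two-dimensional toy data: six majority points, three minority points.
rng = random.Random(0)
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.2]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]

kept_majority, kept_minority = undersample(majority, minority, rng)
new_points = smote_like(minority, n_new=3, k=2, rng=rng)
```

Under-sampling discards data from the large class, whereas the SMOTE-style step adds plausible but artificial examples to the small class; which of the two helps depends on the dataset, as the talk notes.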

Joe Taylor (Sussex) 23 June 2016
Learning using Unselected Features (LUFe)

Feature selection has been studied in machine learning and data mining for many years, and is a valuable way to improve classification accuracy while reducing model complexity. Two main classes of feature selection methods - filter and wrapper - discard those features which are not selected, and do not consider them in the predictive model. We propose that these unselected features may instead be used as an additional source of information at train time. We describe a strategy called Learning using Unselected Features (LUFe) that allows selected and unselected features to serve different functions in classification. In this framework, selected features are used directly to set the decision boundary, and unselected features are utilised in a secondary role, with no additional cost at test time. Our empirical results on 49 textual datasets show that LUFe can improve classification performance in comparison with standard wrapper and filter feature selection.

Rob Gaizauskas (Sheffield) 20 May 2016
Making Sense of Multi-Party Conversations in Reader Comments on On-line News

Reader comments on news articles are now a pervasive feature of on-line news sites. What emerges, at least on the best of such sites, such as The Guardian, are multi-party conversations, typically argumentative, in which readers question, reject, extend, offer evidence for, or explore the consequences of points made or reported in the original article or in earlier commenters' comments. These conversations contain information of significant potential value to a range of types of users. However, a major problem is that within hours of a news article appearing they can rapidly grow to hundreds or even thousands of comments. Few readers have the patience to wade through this much content. One potential solution is to develop methods to summarize comments automatically, allowing readers to gain an overview of the conversation, or to otherwise facilitate access to information in the comments.

In this talk I report work carried out at Sheffield as part of the EU SENSEI project, which is looking at the broader question of making sense of conversational data. In particular I will discuss: a corpus of gold standard summaries of reader comments we have created to enable our research, methods we are developing for clustering and summarising comments, including both extractive and abstractive summarisation approaches, experiments with these methods, the user interface we have developed to present outputs to users and finally, a novel task-based evaluation framework we have developed to assess the system with end users.

Uxoa Inurrieta Urmeneta (University of the Basque Country) 14 April 2016
Translation of Spanish Multiword Expressions into Basque: Adding Linguistic Data into Rule-Based Machine Translation

While Multiword Expressions (MWEs) are very frequent in both written text and speech, they do not usually follow the common grammatical and lexical rules of languages, and are thus problematic for Natural Language Processing, especially when it comes to multilingual applications like Machine Translation (MT). This talk will start by giving an overview of the challenges posed by MWEs to rule-based MT, and will describe some linguistic features of the translation of Spanish verb+noun combinations into Basque, two languages of very different typology. It will go on to explain the linguistic information we are adding to the "Matxin" MT system, and will show how this information can improve the detection of MWEs when combined with chunking and dependency parsing. Finally, the "Konbitzul" database will be introduced, which enables users to access all the information gathered from the linguistic analysis previously presented.

Felix Hill (Cambridge) 8 April 2016
General-Purpose Representation Learning from Words to Sentences

Real-valued vector representations of words (i.e. embeddings) that are trained on naturally occurring data by optimising general-purpose objectives are useful for a range of downstream language tasks. However, the picture is less clear for larger linguistic units such as phrases or sentences. Phrases and sentences typically encode the facts and propositions that constitute the 'general knowledge' missing from many NLP systems at present, so the potential benefit of making representation-learning work for these units is huge. I will present a systematic comparison of different ways of inducing such representations with neural language models, which demonstrates clear and interesting differences between the representations learned by different methods; in particular, more elaborate or computationally expensive methods are not necessarily best. I'll also discuss a key challenge facing all research in unsupervised or representation learning for NLP - the lack of robust evaluations.

Chris Brew (Thomson Reuters) 7 April 2016
TR Discover: A Natural Language Interface for Exploring Linked Datasets

Keywords are the dominant technology for providing non-technical users with access to linked data. This is problematic, because keywords cannot express all the necessary details of user intent. Non-technical users cannot be expected to use an expressive query language such as SQL, but still need to access the data.

They can, however, use English. Our system, called TR Discover, maps from a fragment of English into an intermediate first-order logic representation, which is in turn mapped into SPARQL or SQL. It has been tested both on a dataset relevant to drug research and on the publicly available QALD-4 dataset. The system incorporates a tailored autosuggest mechanism and a back end that delivers task-appropriate analytics.

Yoko Yamakata (University of Tokyo) 18 March 2016
Structure Extraction and Retrieval Method for Cooking Procedural Text

These days, there are more than a million recipes on the Web. When you search for a recipe with a single query such as "carbonara," you can find thousands of "carbonara" recipes as the result. Even if you focus on only the top ten results, it is still difficult to identify the characteristic features of each recipe, because cooking is a workflow that includes parallel procedures. Therefore, in our method, a system extracts a flow-graph structure from a procedural text using NLP techniques and identifies its specific features using graph matching. Structured recipes are applicable in various ways, such as summarization of procedural text, scheduling of cooking order, recipe search and recognition of cooking behavior. Our system currently handles only Japanese recipes, but we are constructing an English version of our method.

Angus Roberts (Sheffield) 3 December 2015
Tackling Text in the Medical Record: Challenges and Opportunities

Organised by Brighton and Sussex Medical School, as part of a series of e-health records events.

Anastasia Pentina (IST Austria) 10 September 2015
The Role of the Task Order in Transfer Learning Algorithms

It is known that humans are able to incorporate knowledge from previously observed tasks for solving new ones more efficiently. This idea of transferring information between several related tasks underlies the approach of transfer learning. Moreover, human education is a highly organised process, where new concepts are introduced gradually, in a meaningful order. This makes the learning process more effective. In this work we study whether similarly the task order influences the performance of sequential transfer learning algorithms.

First, we focus on the multi-task scenario, where there is a finite set of tasks that need to be solved. We develop an order-dependent generalisation bound which could be used to choose a favourable order of tasks for a sequential multi-task algorithm. Our experimental results show that the task order may substantially influence the prediction quality and that the proposed algorithm is able to find a beneficial order.

Next we study the lifelong learning scenario, where the learner faces a stream of potentially infinitely many related tasks. The goal of the learner is to accumulate knowledge over the course of learning multiple tasks in order to improve the performance on future ones. The canonical assumption that allows reasoning about the future success of the learning process is that the observed tasks are independently and identically distributed (iid). However, this assumption limits the applicability of the results in practice. In this work we study two scenarios in which lifelong learning is possible even though the tasks do not form an iid sample: first, when they are sampled from the same distribution, but possibly with dependencies, and second, when the task environment is allowed to change over time in a consistent way. In the first case we prove a PAC-Bayesian theorem that can be seen as a direct generalisation of the analogous previous result for the iid case. For the second scenario we propose to learn an inductive bias in the form of a transfer procedure. We present a generalisation bound and show on a toy example how it can be used to identify a beneficial transfer algorithm.

Peng Jin (Leshan Normal University, China) 29 January 2015
Dataless Text Classification with Descriptive LDA

Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely-compiled semantic knowledge base, which is difficult to obtain in many scenarios.

We propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is coupled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.

[Practice talk for AAAI-15. Joint work with Xingyuan Chen, Yunqing Xia and John Carroll]

Radim Rehurek 27 November 2014
Word2vec & Co

Continuous representations of words via deep learning capture semantic and syntactic properties of words and phrases, so that for example ‘vec(“Montreal Canadiens”) - vec(“Montreal”) + vec(“Toronto”)’ is similar to ‘vec(“Toronto Maple Leafs”)’. In this talk I'll go over a particular model published by Google, called word2vec, its optimizations, applications and extensions from representing individual words to entire sentences and documents.
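
The vector arithmetic in the example above can be illustrated with a minimal sketch. The three-dimensional vectors below are made-up toy values, not real word2vec output, and the helper names `cosine` and `analogy` are hypothetical. The point is only to show how an analogy query reduces to adding and subtracting vectors and then ranking the remaining words by cosine similarity.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# Real word2vec vectors have hundreds of dimensions and are learned from text.
# The dimensions loosely stand for: [is_team, montreal_ness, toronto_ness].
VECTORS = {
    "Montreal": [0.0, 1.0, 0.0],
    "Toronto": [0.0, 0.0, 1.0],
    "Montreal Canadiens": [1.0, 1.0, 0.0],
    "Toronto Maple Leafs": [1.0, 0.0, 1.0],
    "Vancouver Canucks": [1.0, 0.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative).

    Words that appear in the query itself are excluded from the candidates.
    """
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, vectors[word])]
    for word in negative:
        query = [q - x for q, x in zip(query, vectors[word])]
    candidates = (w for w in vectors if w not in positive and w not in negative)
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


# vec("Montreal Canadiens") - vec("Montreal") + vec("Toronto")
best = analogy(["Montreal Canadiens", "Toronto"], ["Montreal"], VECTORS)
```

With these toy values the query vector coincides with the entry for "Toronto Maple Leafs", so that entry is returned; with learned embeddings the match is only approximate.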

Richard Evans (Little Text People) 9 October 2014
Cathoristic Logic: A Modal Logic of Incompatible Propositions

Natural language is full of incompatible alternatives. If Pierre is the current king of France, then nobody else can simultaneously fill that role. A traffic light can be green, amber or red - but it cannot be more than one colour at a time. Mutual exclusion is a natural and ubiquitous concept. First-order logic can represent mutually exclusive alternatives, but incompatibility is a 'derived' concept, expressed using a combination of universal quantification and identity.

In this talk I will introduce an alternative approach, Cathoristic logic, where exclusion is expressed directly, as a first-class concept. Cathoristic logic is a multi-modal logic containing a new logical primitive allowing the expression of incompatible sentences. I will present the syntax and semantics of the logic, and outline a number of results such as compactness, a semantic characterisation of elementary equivalence, the existence of a quadratic-time decision procedure, and Brandom's incompatibility semantics property. I will demonstrate the usefulness of the logic as a language for knowledge representation.

[Joint work with Martin Berger]

Joe Taylor (Sussex) 18 September 2014
The Usage of Privileged Information for Veterinary Diagnosis

The 'Learning Using Privileged Information' (LUPI) paradigm has recently gained popularity as a means to incorporate additional knowledge about training data - which is not available for testing data - into a classifier. This work investigates the usage of the LUPI framework to improve performance of a classification task: diagnosing canine cruciate ligament disease based on clinical notes and treatment records. Also investigated is a new means of representing clinical documents, as a combination of the semantic vectors for their constituent words. LUPI was not found to make any improvement over models that did not use privileged information. However, document representation with semantic vectors resulted in significantly improved classification accuracy.
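
Representing a document as a combination of the semantic vectors of its words can be sketched as follows. The abstract does not say which combination was used, so element-wise averaging is an assumption here, chosen because it is a common baseline. The function name, the two-dimensional vectors and the example tokens are likewise invented for illustration.

```python
def average_vector(tokens, vectors):
    """Average the vectors of all tokens that have one; return None if none do."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Invented two-dimensional word vectors for a toy clinical note.
toy_vectors = {
    "lame": [1.0, 0.0],
    "stifle": [0.0, 1.0],
}

# "xyz" has no vector and is simply skipped.
doc_vector = average_vector(["lame", "xyz", "stifle"], toy_vectors)
```

The resulting fixed-length vector can be passed to any standard classifier, regardless of how long the original note was.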

Aleksander Savkov (Sussex) 19 June 2014
Chunking Clinical Text Containing Non-Canonical Language

Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We evaluate chunking accuracy with respect to part-of-speech tagging quality, choice of chunk representation, and breadth of context features. Our results indicate that narrow context feature windows give the best results, but that chunk representation and minor differences in tagging quality do not have a significant impact on chunking accuracy.

[Practice talk for BioNLP 2014. Joint work with John Carroll and Jackie Cassell]

Novi Quadrianto (Sussex) 13 June 2014
Classification using Privileged Noise

Prior knowledge is a crucial component of any learning system. Depending on the learning paradigm used, it can enter a system in terms of a preferred set of prediction functions, as a Bayesian prior over parameters, or as additional information about the training data that is available during learning. The Learning Using Privileged Information (LUPI) framework uses the last of these setups: in addition to a data modality that one can use, the classifier has access to additional information about each training example. It is characteristic of such privileged data that it cannot be used as an input modality to the classifier itself. Several existing works have studied the LUPI problem and proposed methods in the support vector machine (SVM) framework. In this work, we introduce a new way of treating the LUPI problem in the Bayesian framework using the concept of privileged noise.

[Joint work with Daniel Hernandez Lobato (Universidad Autónoma de Madrid), Viktoriia Sharmanska (IST Austria), Christoph Lampert (IST Austria), and Kristian Kersting (TU Dortmund)]

Andreas Vlachos (UCL) 12 June 2014
Dependency Language Models for Sentence Completion

Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, achieving the highest accuracy reported to date apart from neural language models that are more complex and expensive to train.

[Joint work with Joseph Gubbins]

Diane Litman (Pittsburgh) 29 May 2014
Modeling and Exploiting Review Helpfulness for Summarization

This talk will illustrate some of the opportunities and challenges in processing both commercial and educational review corpora with respect to helpfulness. I will first present a content-based approach for automatically predicting review helpfulness, where features representing language usage, content diversity and helpfulness-related topics are selectively extracted from review text. Experimental results across camera, movie, and student peer reviews demonstrate the utility of the approach. I will then present two extractive approaches to review summarization, where helpfulness ratings are used to either guide review-level filtering or to supervise a topic model for sentence-level content scoring. Experimental results show that helpfulness-guided review summarizers can outperform traditional methods in human and automated evaluations.

[Joint work with Wenting Xiong]

Vasileios Lampos (UCL) 15 May 2014
A User-Centric Model of Voting Intention from Social Media

Social Media contain a multitude of user opinions which can be used to predict real-world phenomena in many domains including politics, finance and health. Most existing methods treat these problems as linear regression, learning to relate word frequencies and other simple features to a known response variable (e.g., voting intention polls or financial indicators). These techniques require very careful filtering of the input texts, as most Social Media posts are irrelevant to the task. We present a novel approach which performs high quality filtering automatically, through modelling not just words but also users, framed as a bilinear model with a sparse regulariser. We also consider the problem of modelling groups of related output variables, using a structured multi-task regularisation method. Our experiments on voting intention prediction demonstrate strong performance over large-scale input from Twitter on two distinct case studies, outperforming competitive baselines.

[Joint work with Daniel Preotiuc-Pietro and Trevor Cohn]

Antonio Sanfilippo (Qatar Foundation R&D) 11 April 2014
Violent Intent Modeling

A speaker's language is the window to their intent. Diverging attitudes toward the use of violence by groups or individuals who otherwise share the same ideological goals are reflected in the language communication strategies that these groups/individuals adopt in advocating the pursuit of their cause. The identification of language use factors that are highly correlated with a speaker's attitude to violence enables the development of computational models of violent intent. These models can help recognize violent intent in written or spoken language and forecast the likely occurrence of violent events and help to identify changing communication outreach characteristics across radical groups. The goal of this talk is to review recent approaches to modeling violent intent based on content analysis and demonstrate how the ensuing models can be used to assess and forecast ebbs and flows of sociopolitical contention in the public dialogue.

Sussex NLP staff and students 13 March 2014
NLP@Sussex Workshop

This is an all-day workshop, consisting of 10 short talks presented by Sussex NLP staff and students.

9.30-9.45 Tea/Coffee/Cakes
9.45-10.15 Aleksander Savkov Unlocking the Hidden Information in Free-Text Clinical Notes
10.15-10.45 Simon Wibberley Method51 for Mining Insight from Social Media Datasets
10.45-11.15 Andy Robertson Exploiting Dependency Features in DUALIST
11.15-11.30 Tea/Coffee/Cakes
11.30-12.00 Thomas Kober Scaling Semi-supervised Multinomial Naïve Bayes
12.00-12.30 Matti Lyra Classification for Relevance on Online Data Streams
12.30-14.00 Lunch
14.00-14.30 Daoud Clarke/Bill Keller Probabilistic Semantics for Natural Language
14.30-15.00 David Weir A Framework for Distributional Composition
15.00-15.15 Tea/Coffee/Cakes
15.15-15.45 Miro Batchkarov Evaluating the Distributional Compositionality of Noun Phrases for Document Classification
15.45-16.15 Julie Weeds Noun Phrase Composition
16.15-16.45 Jeremy Reffin From Strings to Things

Danushka Bollegala (Liverpool) 14 November 2013
Distributional Semantics beyond a Single Corpus

Representing the semantics of a word using the distribution of its contextual neighbours is a popular and effective technique that has been used in numerous tasks in natural language processing such as similarity measurement, word clustering, query expansion, and sentiment classification. However, the meaning of a word often varies from one text corpus to another. If we can somehow predict the distributional representation of a word in a text corpus (target domain), given its distributional representation in a different text corpus (source domain), we can easily adapt the NLP tools that we have already developed for the source domain without any further customisation for the target domain. We present an unsupervised method to learn a distribution prediction model that can accurately predict the distributional representation of a word in a target domain, given its distributional representation in a source domain. We use cross-domain sentiment classification as a specific example to demonstrate the accuracy of the proposed distribution prediction method. We also discuss other possible applications of cross-domain distribution prediction, such as cross-domain part-of-speech prediction, and cross-domain/cross-lingual distributional compositional semantics models.

[Joint work with David Weir and John Carroll]

Fadi Zaraket (American University of Beirut) 22 August 2013
NLP at the American University of Beirut

I will describe our work on using cross-document analysis and morphology-based features to extract entities and relations from two sets of documents. The method assumes that the two sets of documents contain similar entities. It uses relations extracted from one set of documents to enhance entity and relation extraction from the second set, and it uses graph coloring algorithms to identify cross-references to similar entities in multiple documents. We applied the method to sets of Arabic documents such as hadith and biography books, biblical text, and newspapers and event programs. I will also present other ongoing NLP projects at AUB.

Simon Wibberley (Sussex) 1 August 2013
Language Technology for Agile Social Media Science

I will present an extension of the DUALIST tool that enables social scientists to engage directly with large Twitter datasets. The approach supports collaborative construction of classifiers and associated gold standard data sets. The tool can be used to build classifier cascades that decompose tweet streams and provide analysis of targeted conversations. A central concern is to provide an environment in which social science researchers can rapidly develop an informed sense of what the datasets look like. The intent is that they develop not only an informed view of how the data could be fruitfully analysed, but also a sense of how feasible it is to analyse it in that way.

[Joint work with David Weir and Jeremy Reffin. Practice talk for LaTeCH 2013.]

Mehrnoosh Sadrzadeh (Oxford) 18 April 2013
The Logical and Vector Space Structure of Relative Pronouns

Relative pronouns are often treated as 'noise' in distributional models of meaning. As a result, they are not taken into account when building vector representations for the clauses containing them. However, they provide vital information about how the information in different parts of a sentence is related, for instance how certain words in a sentence are described by other words.

I will present a logical semantics for these pronouns in the categorical compositional distributional model of meaning. This semantics is based on an algebraic structure on vector spaces, referred to as Frobenius Algebras, whereby the interactions between the words are depicted using information-flow diagrams. I will then show how these logical operations are interpreted in the usual Boolean semantics of language and also in the vector models. Finally, I will hint at the concrete applicability of the setting by providing some preliminary sample data from a large-scale corpus.

[Joint work with Steve Clark and Bob Coecke]

Peter Hines (York) 24 January 2013
Logic, Meaning and Grammar

In this talk I describe recent work on the relationship between meaning and grammar in natural language, using the basic principle that grammar should be thought of as a formal type system for meaning. Given that type systems themselves have a natural logical interpretation, I investigate the relationship between the 'logic' provided by the grammatical structure, and the 'logic' provided by the meaning of a sentence.

Using tools from categorical logic, I describe this interaction for various 'toy examples', and discuss how or whether the meaning of sentences with distinct grammatical structure may be compared.

Matti Lyra (Sussex) 29 November 2012
Challenges in Applying Machine Learning to Media Monitoring

The Gorkana Group provides high-quality media monitoring services to its clients. In this talk I will describe an ongoing project aimed at increasing the amount of automation in Gorkana Group's workflow through the application of machine learning and language processing technologies. It is important that Gorkana Group's clients should have a very high level of confidence that, if an article relevant to one of their briefs has been published, they will be shown that article. However, delivering this high-quality media monitoring service means that humans have to read through very large quantities of data, only a small portion of which is typically deemed relevant. The challenge being addressed by this work is how to efficiently achieve such high-quality media monitoring in the face of huge increases in the amount of data that needs to be monitored. I will discuss some of the findings that have emerged during the early stages of the project. I will show that, while machine learning can be applied successfully to this real-world business problem, the distinctive constraints of the task give rise to a number of interesting challenges.

[Joint work with Daoud Clarke, Hamish Morgan, Jeremy Reffin, and David Weir]

Daoud Clarke (Sussex) 22 November 2012
Semantic Composition with Quotient Algebras

I will describe an approach to compositionality in vector based semantics which resolves some problems with previous approaches: strings of different lengths are comparable, yet meanings of constituents are retained in the composed representation.

The tensor product has been proposed as a method of composition, but has the undesirable property that strings of different length are incomparable. I will consider how a quotient algebra of the tensor algebra can allow such comparisons to be made, offering the possibility of data-driven models of semantic composition.

Suresh Manandhar (York) 4 October 2012
Unsupervised Learning in Natural Language Processing

Unsupervised learning is an emerging area within NLP that poses interesting and challenging problems. The primary advantage of unsupervised and minimally supervised methods is that annotated data is not required or required only in small quantities. In this talk, I will present our current work on word sense induction, morphology learning and compositional distributional semantics. Sense induction is the task of discovering all the senses of a given word from raw unannotated data. Our collocational graph based method achieves high evaluation scores while overcoming some of the limitations of existing methods. Furthermore, senses can be grouped into a hierarchy by inferring random trees over graphs. In a similar vein, we show that hierarchical morphological paradigms can be learnt from unlabelled data within a Dirichlet process based learning framework. Finally, within the emerging area of compositional distributional semantics we show how sense information can be exploited in computing compositional vectors.

Martha Palmer (University of Colorado at Boulder) 29 March 2012
Beyond Shallow Semantics

Shallow semantic analyzers, such as semantic role labelers and sense taggers, are increasing in accuracy and becoming commonplace. However, they only provide limited and local representations of words and individual predicate-argument structures.

This talk will address some of the current opportunities and challenges in producing deeper, richer representations of coherent eventualities. Available resources, such as VerbNet, that can assist in this process will also be discussed, as well as some of their limitations.

Enrique Alfonseca (Google Research, Zurich) 15 December 2011
Natural Language Understanding in Google Research Zurich

In this talk I will describe the aims, scope and results of the Natural Language Understanding team in Google Research Zurich. Areas where we have experience include distributional similarities, open information extraction (learning attributes, values and relations for populating knowledge bases) and automatic text summarization. I will discuss real-world commercial applications, how we plan to put all these lines of work together around the topic of text summarization, and my vision of the future of the field.

Rodger Kibble (Goldsmiths) 8 December 2011
Nominalisation and Discourse Relations

Discourse relations such as Cause, Sequence, Condition and so on have been standardly treated as holding between adjacent clauses, or text spans consisting of an integral sequence of clauses. Kibble (1999) and Danlos (2006) independently observed that rhetorical relations may also be realised as verbs which take nominalised propositions as arguments, in contrast to conventional analyses which only recognised "discourse connectives" as playing this role. Both studies offered formalisations using Asher and Lascarides' SDRT (2003). Kibble (1999), reporting a small corpus study using Patient Information Leaflets (PILS), hypothesized that clause-internal relations are limited to the "informational" or "semantic" subset of Mann and Thompson's RST repertoire (1987), while "intentional" or "presentational" relations invariably hold between clauses. However, Power (2007) offers constructed examples involving "intentional" relations such as Concession, Restatement and Summary in addition to "informational" relations. This talk will draw on new corpus studies to present evidence that the key factor is the nominalisation of propositional arguments, which enables relations to be realised either as verbs or as prepositions, in contrast to Power and Danlos, who focus on verbs. I will also discuss recent work by my student Susan Lynch on automated recognition of nominalisations.

Nicholas Asher and Alex Lascarides, 2003. Logics of Conversation.
Laurence Danlos, 2006. "Discourse verbs" and discourse periphrastic links, Proceedings of Constraints in Discourse Workshop, ed. Candy Sidner, John Harpur, Anton Benz and Peter Kuehnlein.
Rodger Kibble, 1999. Nominalisation and rhetorical structure, ESSLLI Formal Grammar Workshop, Utrecht.
William Mann and Sandra Thompson, 1987. Rhetorical structure theory: a theory of text organization. In L. Polanyi, ed., The Structure of Discourse.
Richard Power, 2007. Abstract verbs. Proceedings of INLG.

Jonathon Read (University of Oslo) 14 November 2011
Automatically Detecting Negation and Speculation in Biomedical Text

Analysis of negation and speculation is an increasingly important NLP task which can enrich text mining applications by detecting hedged and negated statements. Our approach to this task begins with a maximum entropy classifier that identifies cues of negation/speculation using surface features. We then utilise two methods to detect the scope of such cues. Firstly, we apply a set of heuristics that operate on dependency graph representations of sentences. Secondly, we employ a data-driven method to rank candidate scopes based on constituents in head-driven phrase structure grammar analyses. The effectiveness of the methods is evaluated using the BioScope corpus, a collection of biomedical texts annotated for negation and speculation.

[Joint work with Erik Velldal, Lilja Øvrelid and Stephan Oepen]

Elena Kozerenko (Russian Academy of Sciences, Moscow) 5 September 2011
Representation of Syntactic Transformations in a Hybrid Grammar for Machine Translation

I will focus on the problem of cross-language syntactic transformation modelling for the design and development of transfer-based machine translation systems and for parallel text alignment. The same meaning can be conveyed by different language structures, so the establishment of cross-language matches and inter-structural synonymy is of prime importance. Solutions are proposed on the basis of a hybrid grammar comprising linguistic rules (Cognitive Transfer Grammar) and statistical information about the language structures preferred by particular languages. The approach was employed in an implemented machine translation system for the English-Russian language pair. Possible ways of evaluation will be discussed. At present a multilingual knowledge base including semantically aligned parallel texts is being developed.

Charles Greenbacker (Delaware) 4 August 2011
Generating Abstractive Summaries of Multimodal Documents

Magazines and newspapers often contain information graphics (e.g., bar charts, line graphs) that make a significant contribution to an article's communicative intent, and a summary that ignores the graphical content may miss out on an important part of the overall discourse goal. Studies have shown that the message conveyed by information graphics in popular media is often not repeated in the article text. Thus, extraction-based summarization systems have great difficulty dealing with this graphical content since they lack suitable sentences to extract. We are developing a framework that will facilitate the generation of truly abstractive summaries of multimodal documents by first building a semantic model of the text and graphical content, and then using this model as the basis of the summary generation process. This talk will describe the development of our summarization system and will also address some of the issues involved with the evaluation of natural language generation systems.

[A more detailed abstract can be found at]

Danushka Bollegala (Tokyo) 12 May 2011
Cross-domain Sentiment Classification using a Sentiment Sensitive Thesaurus

I will describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.

John Tait (Information Retrieval Facility, Vienna) 24 March 2011
The Information Retrieval Facility and Professional Search

The IRF is a not-for-profit research institute founded in Vienna in 2007. The mission of the IRF is to bring together academic researchers and those involved with professional search (as users or suppliers) to work on problems of large-scale information retrieval. The talk will review the foundation of the IRF, look at what we offer academic researchers, and give an overview of the activities we have promoted since 2007. In particular the talk will outline the Khresmoi project, a four-year European Union ICT Integrated Project which began in September 2010 and aims to improve access to medical information, including text and images, for various kinds of medical professionals and the general public.

Dan Cristea (University of Iasi, Romania) 17 March 2011
Grounding Coherence Properties of Discourse

The research investigates two fundamental issues related to the production of coherent discourse by intelligent agents: a cohesion property and a fluency property. The cohesion aspects of discourse production relate to the use of pronominal anaphora: whether, and under what conditions, intelligent agents in the incipient phases of language acquisition can use pronouns as a means of referring to recently mentioned entities. The approach follows an evolutionary paradigm of language acquisition. Experiments show that pronouns spontaneously appear in the vocabulary of a community of 10 agents conducting dialogues about a static scene and that, generally, the use of pronouns enhances communication success. Secondly, the processing load experiments address the fluency of discourse, measured in terms of Centering transitions. Contrary to previous findings, this side of discourse coherence seems to be grounded in an innate cognitive mechanism, which is driven by an economy principle. We show experimentally that a model of immediate memory which resembles the stack data structure is optimal in terms of access costs and that, when used as the basis for discourse production, it leads to discourses with fluency patterns similar to those produced by humans.

Alex Clark (Royal Holloway, London) 27 January 2011
Efficient, Correct, Unsupervised Learning of Context-sensitive Languages

A central problem for NLP is grammar induction: the development of unsupervised learning algorithms for syntax. In this talk I will present a lattice-theoretic representation for natural language syntax, called Distributional Lattice Grammars. These representations are objective or empiricist, based on a generalisation of distributional learning, and are capable of representing all regular languages, some but not all context-free languages and some non-context-free languages. I will present a simple algorithm for learning these grammars together with a proof of the correctness and efficiency of the algorithm.

Daoud Clarke (Metrica Ltd/Univ Hertfordshire) 9 December 2010
Sentiment Analysis for Media Evaluation

Metrica is a media analysis company recognised as a world leader in its field, whose clients include some of the top technology companies, charities and government agencies. Their strengths are in manual analysis of traditional (print) media; however, the explosion in social media and its corresponding increase in importance have led them to explore methods of automating this analysis, since there is too much data to analyse manually. In this talk, I will discuss our attempts to apply sentiment analysis to Metrica's client databases, both in traditional and social media, and how we dealt with the problems that arose. In particular, I will talk about the problem of unbalanced data, which is inherent in our datasets, and discuss which feature sets and classifiers we found to work best. I will also talk about active learning for sentiment analysis, and its potential to reduce the amount of manual analysis.

Bernd Bohnet (Stuttgart) 25 November 2010
Fast Accurate Dependency Parsing with a Hash Kernel

Despite a long tradition in linguistics, dependency grammar has played a fairly marginal role in natural language processing. Recently, we have seen an increasing interest in dependency-based representations, especially in natural language parsing. This might be motivated by the usefulness of the representation and by the advances in dependency parsers. In this talk, we first provide an overview of graph-based dependency parsing, which is currently one of the most successful approaches to dependency parsing. In addition to high accuracy, short parsing and training times are the most important properties a parser needs in order to be useful for applications. Therefore, we will present approaches to improve the parsing speed and approaches to improve the accuracy over state-of-the-art results. Among these improvements, we will highlight one that has an impact on both accuracy and speed, which seems to be a contradiction since usually one of these properties can be improved only at the expense of the other. The origin of this is the learning technique, which is based on the perceptron algorithm with a random function. We call the learning approach Hash Kernel. The learning technique is very promising for other NLP tasks as well.
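
To make the idea of combining a perceptron with a hashing function concrete, here is a minimal sketch of feature hashing feeding a binary perceptron. It is a simplification for illustration only and is not the parser's learner, which scores dependency structures rather than making binary decisions. The bucket count, the choice of CRC32 as the hash, the feature strings and all function names are assumptions.

```python
import zlib

NUM_BUCKETS = 2 ** 16  # assumed size of the hashed weight vector


def bucket(feature):
    """Map a feature string to a weight index; no feature dictionary is stored."""
    return zlib.crc32(feature.encode("utf-8")) % NUM_BUCKETS


def score(weights, features):
    """Sum the weights of the buckets that the active features hash to."""
    return sum(weights[bucket(f)] for f in features)


def train(examples, epochs=5):
    """Standard perceptron updates applied to hashed feature indices."""
    weights = [0.0] * NUM_BUCKETS
    for _ in range(epochs):
        for features, label in examples:  # label is +1 or -1
            if label * score(weights, features) <= 0:
                for f in features:
                    weights[bucket(f)] += label
    return weights


# Hypothetical arc features, invented for this example.
good_arc = ["head=eats", "dep=apple", "dir=right"]
bad_arc = ["head=the", "dep=eats", "dir=left"]
weights = train([(good_arc, +1), (bad_arc, -1)])
```

Because features are mapped straight to indices, lookup cost and memory are fixed in advance; the trade-off is that distinct features can collide in the same bucket.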

Daoud Clarke (Metrica Ltd/Univ Hertfordshire) 8 July 2010
Semantic Composition with Quotient Algebras

We describe an algebraic approach for computing with vector based semantics. The tensor product has been proposed as a method of composition, but has the undesirable property that strings of different length are incomparable. We consider how a quotient algebra of the tensor algebra can allow such comparisons to be made, offering the possibility of data-driven models of semantic composition.

[Practice talk for GEMS Workshop at ACL 2010]

David Milward (Linguamatics Ltd, Cambridge) 1 July 2010
Real-time Social Media Monitoring using Agile Text Mining

This talk will show how agile text mining software based on Natural Language Processing can be applied to social media monitoring to monitor sentiment, provide early warning, and track key opinion leaders. Focussing on Twitter, the talk will show how NLP can be used to filter what would be otherwise noisy data, and how information from the content itself and user profiles can be combined to provide fine-grained segmentation of different populations of users. The talk will conclude with an analysis of Tweets concerning the UK election leader debates, showing how opinion changed over time.

Miles Osborne (Edinburgh) 17 June 2010
What is Happening Now? Finding Events in Massive Message Streams

Social Media (e.g. Twitter, Blogs, Forums, FaceBook) has exploded over the last few years. FaceBook is now the most visited site on the Web, with Blogger being the 7th and Twitter the 13th. These sites contain the aggregated beliefs and opinions of millions of people on an epic range of topics, and in a large number of languages. Twitter in particular is an example of a massive message stream and finding events embedded in it poses hard engineering challenges. I will explain how we use a variant of Locality Sensitive Hashing to find new stories as they break. The approach scales well, easily dealing with the more than 1 million Tweets a day we process and only needing a single processor. For June 2009, the fastest growing stories all concerned deaths of one kind or another.
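
The abstract mentions "a variant of Locality Sensitive Hashing" without giving details. As a rough illustration of the general idea only, the sketch below uses the standard random-hyperplane scheme for cosine similarity: a message is treated as a candidate new story when no earlier message falls in the same hash bucket. The function names, the number of bits and the toy vectors are assumptions, and the variant used in the talk may differ.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def is_new_story(vector, hyperplanes, buckets):
    """Return True if no earlier vector shares this vector's bucket."""
    key = signature(vector, hyperplanes)
    seen_before = key in buckets
    buckets.setdefault(key, []).append(vector)
    return not seen_before


planes = make_hyperplanes(num_bits=8, dim=3)
buckets = {}
first = is_new_story([1.0, 0.0, 2.0], planes, buckets)   # nothing seen yet
repeat = is_new_story([2.0, 0.0, 4.0], planes, buckets)  # same direction as the first
```

Vectors pointing in similar directions tend to share a signature, so each incoming message is compared only against its own bucket rather than against the whole stream.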

Takenobu Tokunaga (Tokyo Institute of Technology) 5 March 2010
Referring Expressions in Situated Dialogues

Referring expressions play an important role in human-human/human-computer interactions, particularly in situated settings. We are developing a Japanese corpus of referring expressions collected from dialogues where two participants collaboratively solve the Tangram puzzle. The corpus records every action by participants and the arrangement of the puzzle pieces in synchronisation with the course of dialogues. This extra-linguistic information, together with the transcribed dialogues, provides a fundamental resource for research on understanding and generating referring expressions. We sketch out the corpus and introduce research on generation and analysis of Japanese referring expressions using this corpus. We also briefly mention two on-going projects: creation of a corpus with participants' gaze information and evaluation of generated referring expressions.

Danushka Bollegala (Tokyo) 21 January 2010
Measuring the Similarity between Implicit Semantic Relations from the Web

Measuring the similarity between semantic relations that hold among entities is an important and necessary step in various Web related tasks such as relation extraction, information retrieval and analogy detection. For example, consider the case in which a person knows a pair of entities (e.g. Google, YouTube), between which a particular relation holds (e.g. acquisition). The person is interested in retrieving other such pairs with similar relations (e.g. Microsoft, Powerset). Existing keyword-based search engines cannot be applied directly in this case because, in keyword-based search, the goal is to retrieve documents that are relevant to the words used in a query - not necessarily to the relations implied by a pair of words. We propose a relational similarity measure, using a Web search engine, to compute the similarity between semantic relations implied by two pairs of words. Our method has three components: representing the various semantic relations that exist between a pair of words using automatically extracted lexical patterns, clustering the extracted lexical patterns to identify the different patterns that express a particular semantic relation, and measuring the similarity between semantic relations using a metric learning approach. We evaluate the proposed method in two tasks: classifying semantic relations between named entities, and solving word-analogy questions. The proposed method outperforms all baselines in a relation classification task with a statistically significant average precision score of 0.74. Moreover, it reduces the time taken by Latent Relational Analysis to process 374 word-analogy questions from 9 days to less than 6 hours, with an SAT score of 51%.

[Joint work with Yutaka Matsuo and Mitsuru Ishizuka]

Anna Jordanous (Sussex) 15 December 2009
Defining Creativity: Finding Keywords for Creativity Using Corpus Linguistics Techniques

A computational system that evaluates creativity needs guidance on what creativity actually is. It is by no means straightforward to provide a computer with a formal definition of creativity; no such definition yet exists and viewpoints in creativity literature vary as to what the key components of creativity are considered to be. This work combines several viewpoints for a more general consensus of how we define creativity, using a corpus linguistics approach. 30 academic papers from various academic disciplines were analysed to extract the most frequently used words and their frequencies in the papers. This data was statistically compared with general word usage in written English. The results form a list of words that are significantly more likely to appear when talking about creativity in academic texts. Such words can be considered keywords for creativity, guiding us in uncovering key sub-components of creativity which can be used for computational assessment of creativity.

Bilal Khaliq (Sussex) 19 November 2009
Particle Language Modelling for Arabic Speech Recognition

Due to the inflectional nature and morphological complexity of the Arabic language, Arabic text data suffers significantly from two key problems for Automatic Speech Recognition: data sparsity and higher out-of-vocabulary (OOV) rates. Data sparsity poses problems for standard N-gram models, reducing the number of instances of many words while at the same time increasing the required number of N-grams. And as Arabic generates many more unique words than English, it has a higher OOV rate, so an Arabic corpus needs to be much larger to achieve an OOV rate similar to that of an English corpus.
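To make the OOV problem concrete, here is a minimal sketch of how an OOV rate can be computed, and of why modelling sub-word units can lower it. The English toy words and their hand-made segmentation are purely hypothetical; they stand in for the statistically derived particles discussed in this talk and are not the actual technique.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training data."""
    if not test_tokens:
        return 0.0
    vocabulary = set(train_tokens)
    unseen = sum(1 for token in test_tokens if token not in vocabulary)
    return unseen / len(test_tokens)

# Whole words: the test word was never seen in training.
train_words = ["walked", "walking", "talked"]
test_words = ["talking"]
print(oov_rate(train_words, test_words))          # 1.0

# Hypothetical hand segmentation into stem and suffix units.
train_units = ["walk", "+ed", "walk", "+ing", "talk", "+ed"]
test_units = ["talk", "+ing"]
print(oov_rate(train_units, test_units))          # 0.0
```

The unseen word "talking" is covered once it is split into units that each appear in training, which is the intuition behind using smaller modelling units for morphologically rich languages.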

To address these two problems, a statistical technique to build sub-words or 'particles' as modelling units was previously developed and successfully applied to Russian. In this talk I will examine the utility of particle language models for Arabic, a language exhibiting similar morphological characteristics to Russian. Further, the models were evaluated using Word Error Rates from speech recognition experiments, which is a more reliable measure of performance than evaluation using Perplexity values, as was done for Russian.

Taras Zagibalov (Sussex) 29 October 2009
Multilingual Opinion Holder and Target Extraction using Knowledge-Poor Techniques

I will describe an approach to multilingual sentiment analysis, in particular opinion holder and opinion target extraction, which requires no annotated data and minimal language-specific input. The approach is based on unsupervised, knowledge-poor techniques which facilitate adaptation to new languages and domains. The system's results are comparable to those of supervised, language-specific systems previously applied to the NTCIR-7 MOAT evaluation data.

[Practice talk for LTC'09. Please note unusual time]

Chris Thornton (Sussex) 15 October 2009
Music-making with Hierarchical Markov Models

In the emerging field of empirical creativity, new artifacts (e.g., drum rhythms) are generated from models culled from one or more existing artifacts. Applied to music, this approach often involves use of Markov models. But these may be insensitive to the global, hierarchical properties that are particularly significant for music. The talk introduces the *hierarchical* Markov model and shows how it enables both large-scale and small-scale properties to be captured in a uniform way. It also presents some examples illustrating application of the approach to melodic sequences.
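For readers unfamiliar with the baseline, the following is a minimal sketch of a flat, first-order Markov model learned from one existing drum pattern and used to generate a new one. It is not the hierarchical model introduced in the talk; the pattern encoding ("x" for a hit, "." for a rest) and the function names are assumptions made for illustration.

```python
import random
from collections import defaultdict

def train_markov(sequence):
    """Record, for each symbol, every symbol that followed it."""
    transitions = defaultdict(list)
    for current, following in zip(sequence, sequence[1:]):
        transitions[current].append(following)
    return transitions

def generate(transitions, start, length, rng):
    """Sample a new sequence by repeatedly drawing an observed successor."""
    output = [start]
    while len(output) < length:
        choices = transitions.get(output[-1])
        if not choices:  # no observed successor: stop early
            break
        output.append(rng.choice(choices))
    return output

# One bar of a toy drum pattern: "x" is a hit, "." is a rest.
pattern = list("x.x.xx..")
model = train_markov(pattern)
new_bar = generate(model, start="x", length=8, rng=random.Random(0))
print("".join(new_bar))
```

Each generated step depends only on the previous symbol, so every adjacent pair is locally plausible, but nothing enforces structure at the level of the bar or phrase. That is the kind of insensitivity to global properties that the abstract describes.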

Tim Baldwin (University of Melbourne, Australia) 15 July 2009
To Search, Perchance to Find: Enhanced Information Access over Troubleshooting-oriented Web User Forum Data

The ILIAD (Improved Linux Information Access by Data Mining) Project is an attempt to apply language technology to the task of Linux troubleshooting by analysing the underlying information structure of a multi-document text discourse and improving information delivery through a combination of filtering, term identification and information extraction techniques. In this talk, I will outline the overall project design and present results for a variety of thread-level filtering tasks.

Eva Banik (Open University) 2 July 2009
Extending a Surface Realizer to Generate Coherent Discourse

The ultimate aim of research on natural language generation is to develop large-scale, domain independent NLG systems, which are able to generate high quality, fluent and well-formatted texts. In order to produce high quality, coherent text, generators need to be able to model referential coherence and pronominalization, insert appropriate discourse connectives using appropriate constructions (e.g. preposed, postposed or interposed subordinate clauses) and provide a way for the user to specify which bits of information should be emphasized in the text.

Many NLG systems use a pipeline architecture where linguistic information is distributed across several system modules. These systems typically introduce additional modules (e.g. an aggregation or revision module) in order to model the above phenomena, resulting in more complex systems with limited flexibility. Using this approach, the research challenges in NLG become system engineering tasks, limited to questions such as: what modules should a system have, how should these modules be ordered, and how should the interactions between modules be handled.

In this talk I would like to present a slightly different perspective, where some of the research challenges in NLG are reformulated as grammar engineering tasks. I will argue that when linguistic resources in an NLG system are centralized we can model constraints on discourse coherence by simply incorporating more linguistic information into the grammar of a surface realizer. This approach improves the flexibility of the system (i.e. produces more paraphrases for the same input) and makes it possible to generate coherent text without introducing additional modules.

Musa Alkhalifa (UPC Barcelona) 11 June 2009
Building a WordNet for Arabic: Methodology and Challenges

Arabic WordNet (AWN) can be considered one of the most important freely available lexical resources developed so far for the Arabic language. AWN has been built following the design of Princeton WordNet and adopting the EuroWordNet methodology of manually encoding a set of base concepts while maximizing compatibility across wordnets. As a result, there is a straightforward mapping from Arabic WordNet onto Princeton WordNet 2.0 and many other wordnets. The Suggested Upper Merged Ontology (SUMO) is mapped by hand to all synsets of Princeton WordNet and has been extended with a number of concepts that correspond to words that are lexicalized in Arabic but not in English, providing an interlingua which is not limited by the lexicalization of any particular human language and underlying the development of semantics-based computational tools for multilingual NLP. In this talk I will present the methodologies used and the challenges faced while constructing a WordNet for Arabic and highlight some experiments we conducted (to exploit Arabic lexical and morphological rules) to reduce human effort and extend AWN (semi-)automatically. I will conclude by showing the interfaces we developed for lexicographers and users of AWN, the downloadable AWN browser, and an online demo of Arabic Word Spotter, which identifies the words in an Arabic web page that are covered in AWN and provides their translations.

David Weir (Sussex) 21 May 2009
Optimal Reduction of Rule Length in Linear Context-Free Rewriting Systems

Linear Context-free Rewriting Systems (LCFRS) is an expressive grammar formalism with applications in syntax-based machine translation. The parsing complexity of an LCFRS is exponential in both the rank of a production, defined as the number of nonterminals on its right-hand side, and a measure for the discontinuity of a phrase, called fan-out. We present an algorithm that transforms an LCFRS into a strongly equivalent form in which all productions have rank at most 2, and which has minimal fan-out. Our results generalize previous work on Synchronous Context-Free Grammar, and are particularly relevant for machine translation from or to languages that require syntactic analyses with discontinuous constituents.

[Joint work with Carlos Gomez-Rodriguez, Marco Kuhlmann and Giorgio Satta. Practice talk for NAACL09.]

Diana McCarthy (Sussex) 14 May 2009
Alternative Annotations of Word Usage

Right from Senseval's inception there have been questions over the choice of sense inventory for word sense disambiguation. While researchers usually acknowledge the issues with predefined listings produced by lexicographers, such lexical resources have been a major catalyst to work on annotating words with meaning. As well as the heavy reliance on manually produced sense inventories, the work on word sense disambiguation has focused on the task of selecting the single best sense from the predefined inventory for each given token instance. There is little evidence that the state-of-the-art level of success is sufficient to benefit applications. We also have no evidence that the systems we build are interpreting words in context in the way that humans do. One direction that has been explored for practical reasons is that of finding a level of granularity where annotators and systems can do the task with a high level of agreement. In this talk I will discuss some alternative annotations using synonyms, translations and WordNet senses with graded judgments which are not proposed as a panacea to the issue of semantic representation but will allow us to look at word usages in a more graded fashion and which are arguably better placed to reflect the phenomena we wish to capture than the 'winner takes all' strategy.

Sivaji Bandyopadhyay (Jadavpur University, India) 7 May 2009
Question Generation using VerbNet

We describe our work on question generation from simple sentences in English, mostly collected from the example sentences in VerbNet. A named entity recognizer, a part-of-speech tagger and a chunker have been applied to each of these sentences. The frame and the syntax information for the verb in each sentence are identified using VerbNet. Verbs can take any of a set of general, adjunct-like arguments. Each verb argument is assigned one (usually unique) thematic role within the class. We associate the thematic roles identified for the verb arguments from VerbNet with the appropriate chunks. Question templates for each primary and secondary frame of a verb class in VerbNet are stored in a knowledge base. Currently only concept completion questions are being handled. This knowledge base is used to generate appropriate questions from the input sentence. Questions involving named entities in thematic roles assigned to the verb arguments are assigned more importance than other questions. The question generation system is now being expanded to include various verb classes and other question types, as well as to handle generating questions at the document level.

Rob Koeling (Sussex) 23 April 2009
Finding Paraphrases for Dialogue Utterances Using a Multilingual Parallel Movie Subtitle Corpus

I will describe a method for finding paraphrases for common dialogue utterances using a multilingual corpus of movie subtitles. Paraphrases are found by 1) finding (potential) translations of an utterance in the corpus and 2) subsequently translating these translations back into the original language. The results of the second step are the potential paraphrases. Even though the basic model produces nice results, we show that a few simple constraints on the basic model reduce the probability of most of the noisy candidates to such an extent that a simple threshold becomes very effective in removing noisy candidates, while retaining a wide variety of good paraphrases. Similar methods have been proposed in the Machine Translation literature, but we improve on those methods by exploiting the multilingual nature of the corpus. Cross-checking over languages allows us to formulate consistency constraints, which prove to be very effective.

The method is characterized by the fact that it is:

* Unsupervised: we don't need manually annotated data in order to train a model to produce the results.

* Applicable to any common dialogue utterance: even though we focused on the queries that were used in a previous stage of the project, we showed that with minimal effort additional queries can be handled as well.

* Not restricted to English: this subtitle corpus makes it possible to formulate queries in different languages and produce paraphrases for these languages.

* Potentially applicable to a wider range of paraphrasing problems: it might therefore well be of interest to researchers outside the dialogue field (e.g. machine translation).
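The two-step procedure described in this abstract (translate an utterance into other languages, then translate back) can be sketched as follows. The translation tables, the probabilities, and in particular the form of the consistency constraint (a candidate must be reachable through at least `min_languages` pivot languages) are assumptions made for illustration; the abstract does not specify the actual constraints.

```python
from collections import defaultdict

def pivot_paraphrases(utterance, forward, backward,
                      min_languages=2, threshold=0.1):
    """Translate into each pivot language and back, then filter candidates.

    forward[lang][source] maps a source utterance to {translation: prob};
    backward[lang][translation] maps back to {candidate: prob}.
    """
    scores = defaultdict(float)
    support = defaultdict(set)
    for lang, table in forward.items():
        for translation, p_forward in table.get(utterance, {}).items():
            back_table = backward[lang].get(translation, {})
            for candidate, p_back in back_table.items():
                if candidate == utterance:
                    continue
                scores[candidate] += p_forward * p_back
                support[candidate].add(lang)
    n_languages = len(forward)
    return sorted(
        candidate for candidate in scores
        # Assumed consistency constraint: agreement across pivot languages.
        if len(support[candidate]) >= min_languages
        and scores[candidate] / n_languages >= threshold
    )

# Toy translation tables (invented probabilities).
forward = {
    "de": {"thank you": {"danke": 0.9, "bitte": 0.1}},
    "fr": {"thank you": {"merci": 1.0}},
}
backward = {
    "de": {"danke": {"thank you": 0.6, "thanks": 0.4},
           "bitte": {"please": 1.0}},
    "fr": {"merci": {"thank you": 0.5, "thanks": 0.5}},
}
print(pivot_paraphrases("thank you", forward, backward))  # ['thanks']
```

Here the noisy candidate "please" is reachable through only one pivot language and is dropped, while "thanks" is supported by both.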

Rob Gaizauskas (Sheffield) 21 April 2009
Mining Information from Clinical Records: Information Extraction in the CLEF Project

The Clinical E-Science Framework (CLEF) project was a 5-year MRC-funded e-science project whose technical objective was to explore how advanced information technologies could be used to capture, integrate and present electronic patient information in the domain of cancer treatment within a secure and ethical framework, so as to support clinical research and improve patient care. One important strand of work within CLEF was the automatic extraction of the rich and relatively untapped clinical information in the textual component of the patient record: from radiology reports, histopathology reports and the clinical narratives that are recorded following every patient-doctor consultation. To address this subtask we applied information extraction (IE) technologies. In this talk I give a general overview of the approach to IE taken in the project, addressing: the creation of a rich, semantically annotated corpus of clinical documents; the implementation and evaluation of supervised learning techniques for entity and relation extraction for a range of clinical entity and relation types; initial steps towards using the temporal information in clinical texts to assist in aligning the clinical events mentioned within those texts with mentions of the same events in the structured data component of the patient record. Taken in sum, the CLEF IE activities represent perhaps the most ambitious clinical text mining effort to date.

Stephen Clark (Cambridge) 19 March 2009
Parser Adaptation and Evaluation

Accuracy scores of over 92% are now being reported for statistical parsers trained and tested on the WSJ sections of the Penn Treebank (PTB). However, it is well known that the performance of PTB parsers degrades significantly when applied to other domains, e.g. biomedical research papers. In addition, the use of the Parseval metrics as a general measure of parser accuracy has been called into question.

In this talk I will investigate both questions of parser adaptation and evaluation, in the context of a statistical parser based on Combinatory Categorial Grammar (CCG). For the adaptation case, I will show that a simple technique of retraining parser components at lower levels of representation -- in this case POS tags and CCG supertags -- leads to a surprisingly accurate parser for biomedical text. For the evaluation case, I will describe a new test suite consisting of manually annotated unbounded dependencies, for a variety of grammatical constructions. The CCG parser's performance on recovering such dependencies compares well with other off-the-shelf parsers, but still leaves much room for improvement. I will motivate the need for such an evaluation despite the relatively low frequencies of unbounded dependencies in naturally occurring text.

Eric Kow (Brighton) 12 March 2009
System Building Cost vs. Output Quality in Data-To-Text Generation

Data-to-text generation systems tend to be knowledge-based and manually built, which limits their reusability and makes them time- and cost-intensive to create and maintain. Methods for automating (part of) the system building process exist, but do such methods risk a loss in output quality? In this talk, we address the cost/quality trade-off in generation system building. We compare four new data-to-text systems which were created by predominantly automatic techniques against six existing systems for the same domain which were created by predominantly manual techniques. We evaluate the ten systems using intrinsic automatic metrics and human quality ratings. We find that increasing the degree to which system building is automated does not necessarily result in a reduction in output quality. We find furthermore that standard automatic evaluation metrics under-estimate the quality of handcrafted systems and over-estimate the quality of automatically created systems.

[Practice talk for ENLG 2009, with Anja Belz]

Benat Zapirain (Basque Country University) 11 December 2008
An Introduction to Semantic Role Labeling

Semantic Role Labeling (SRL) is the problem of analyzing clause predicates in open text by identifying arguments and tagging them with semantic labels indicating the role they play with respect to the verb. Such sentence-level semantic analysis makes it possible to determine "who" did "what" to "whom", "when" and "where", and thus to characterize the participants and properties of the events established by the predicates. This kind of semantic analysis is useful for a broad spectrum of NLP applications (information extraction, summarization, question answering, machine translation, etc.), and has attracted a lot of interest from the NLP community in recent years. In this seminar, I will present an introduction to the task and current challenges.

Taras Zagibalov (Sussex) 9 December 2008
Almost-Unsupervised Cross-Language Opinion Analysis at NTCIR-7

I will describe the Sussex NLCL System entered in the NTCIR-7 Multilingual Opinion Analysis Task (MOAT). The main focus of this work is on the problem of portability of natural language processing systems across languages. The system was the only one entered for all four of the MOAT languages, Japanese, English, and Simplified and Traditional Chinese. The system uses an almost-unsupervised approach applied to two of the sub-tasks: opinionated sentence detection and topic relevance detection.

Theresa Wilson (Edinburgh) 6 November 2008
Fine-Grained Sentiment Analysis in Text and Multi-Party Conversation

The past several years have seen a huge growth in research on identifying and characterizing opinions and sentiments in text. While much of this work has focused on classifying the sentiment of documents, a more fine-grained analysis at the sentence level and below is needed for any application that seeks detailed opinion information, e.g., opinion question answering. In this talk, I will present work on fine-grained sentiment analysis in both text and multi-party conversation. The approaches taken are quite different in terms of the features explored. For text, a wide range of linguistically motivated features are employed for determining when instances of polarity terms are indeed being used to express positive and negative sentiments in context. The results of this disambiguation are then used in determining sentence polarity. For sentiment analysis in speech, the focus is on exploiting very shallow linguistic features, such as n-grams of characters and phonemes, for classifying the subjectivity and sentiment of utterances.

Joakim Nivre (Uppsala University, Sweden) 2 October 2008
Sorting Out Dependency Parsing

The first part of the talk introduces the transition-based approach to data-driven dependency parsing, where inference is performed as a greedy best-first search over a non-deterministic transition system, while learning is reduced to the simple classification problem of mapping each parser state to the correct transition out of that state. The second part of the talk explores the idea that non-projective dependency parsing can be conceived as the outcome of two interleaved processes, one that sorts the words of a sentence into a canonical order, and one that performs strictly projective dependency parsing on the sorted input. Based on this idea, a parsing algorithm is constructed by combining an online sorting algorithm with a transition system for projective dependency parsing.
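The transition-based approach described in the first part of this abstract can be sketched with a small arc-standard transition system. In a real parser a trained classifier would pick each transition from the current state; here a scripted sequence stands in for that classifier, and all names are assumptions made for illustration. The sketch covers only projective parsing and does not include the sorting component from the second part of the talk.

```python
SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"

def parse(words, choose_transition):
    """Greedy arc-standard parsing: one transition per state, no backtracking.

    Word positions are numbered from 1; 0 is an artificial root.
    Returns a list of (head, dependent) arcs.
    """
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []
    while buffer or len(stack) > 1:
        transition = choose_transition(stack, buffer, arcs)
        if transition == SHIFT and buffer:
            stack.append(buffer.pop(0))
        elif transition == LEFT_ARC and len(stack) > 2:
            dependent = stack.pop(-2)          # second-from-top gets a head
            arcs.append((stack[-1], dependent))
        elif transition == RIGHT_ARC and len(stack) > 1:
            dependent = stack.pop()            # top of stack gets a head
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"illegal transition {transition!r}")
    return arcs

def scripted(transitions):
    """Stand-in for a classifier: replay a fixed transition sequence."""
    iterator = iter(transitions)
    return lambda stack, buffer, arcs: next(iterator)

words = ["She", "reads", "books"]
oracle = scripted([SHIFT, SHIFT, LEFT_ARC, SHIFT, RIGHT_ARC, RIGHT_ARC])
print(parse(words, oracle))   # [(2, 1), (2, 3), (0, 2)]
```

The result says that "reads" (position 2) heads both "She" and "books", and that the artificial root heads "reads". Replacing `scripted` with a function that maps each parser state to a predicted transition gives the classification setting described above.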

Sanaz Jabbari (Sheffield) 18 September 2008
A Probabilistic Model of Word Usage Applied to the Lexical Substitution Task

The talk addresses the English Lexical Substitution Task: given a target word and the sentence in which it appears, the task is to find candidate word(s) which could be used in place of the target word without altering the sentence's meaning. I will present a method for performing this task, which incorporates primitive notions both of whether the candidate word matches the target word's semantics, and also whether the candidate word is grammatically coherent in the target word's place. Whilst approaches to this task thus far have largely concentrated on one or another of these concerns, the method we present shows that using a full probability model defined over random variables designed to represent both aspects of the sentence is both achievable and demonstrates significant improvement over either of the aspects alone. I suggest that, coupled with an optimal strategy for determining the set of candidate words, the technique would achieve state-of-the-art performance.

Mona Diab (Columbia University, USA) 26 August 2008
Arabic Semantic Role Labeling Using Kernel Methods

There is a widely held belief in the natural language and computational linguistics communities that identifying and defining the roles of predicate arguments in a sentence has a lot of potential, and that Semantic Role Labeling (SRL) is a significant step toward improving important applications, e.g. question answering and information extraction. Although SRL systems have been studied extensively for English, much remains to be done to design a satisfactory system for Arabic. In this talk, I will present an SRL system for Modern Standard Arabic that exploits many aspects of the rich morphological features of the language. The experiments on the pilot Arabic Propbank data show that our system, based on Support Vector Machines and Kernel Methods, yields a global SRL F1 score of 82.17, which improves on the current state of the art in Arabic SRL. In the process I will introduce features of the Arabic language that are relevant for automatic processing in general and to the task of SRL in particular. I will also describe the Arabic Propbank, highlighting how it differs from the English Propbank.

Taras Zagibalov (Sussex) 31 July 2008
Automatic Seed Word Selection for Unsupervised Sentiment Classification of Chinese Text

I will describe a new method of automatic seed word selection for unsupervised sentiment classification of product reviews in Chinese. The whole method is unsupervised and does not require any annotated training data; it only requires information about commonly occurring negations and adverbials. Unsupervised techniques are promising for this task since they avoid problems of domain-dependency typically associated with supervised methods. The results obtained are close to those of supervised classifiers and sometimes better, up to an F1 of 92%.

[Practice talk for COLING 2008]

David Hardcastle (Open University) 5 June 2008
Inferring Semantic Collocates From Corpora

The background to this research is a system, called ENIGMA, which generates cryptic crossword clues based on wordplay puzzles. A clue is typically a short clause, with some ellipsis, which presents a wordplay puzzle to the reader, such as an anagram, when interpreted symbolically following a set of conventions. The symbolic reading is disguised by the fact that the clue appears to be, in the words of Azed, "a piece of English prose". A given puzzle for a particular word can be rendered symbolically in many ways, usually between 10^7 and 10^14, and only a very small fraction of these renderings will also happen to appear to be meaningful fragments of English. ENIGMA explores this search space using syntactic and semantic constraints as heuristics and returns the renderings which, it is hoped, will appear to be grammatical and meaningful.

Building the data sources behind the semantic constraints raises challenging research questions. Existing data sets describing selectional constraints, such as VerbNet, only contain a small number of very broad semantic classes, whereas manually curated resources are narrow and are time-consuming to construct. In this talk I describe the process of extracting and evaluating two data sources from the British National Corpus. The first determines the strength and character of the thematic association between pairs of words; the second defines the domain and/or range of a small set of syntactic dependencies. The thematic association algorithm is based on word distance in the corpus measured over a set of concentric windows ranging from ±1 to ±1000 words in size. For a given pair the system returns a boolean result indicating whether or not a thematic association is implied by the data in the corpus, and if an association is implied whether the context is most commonly at word boundary, at phrase or at document level. The second data set was constructed by running a statistical parser over the BNC and generalizing the domain/range of each dependency over WordNet. The raw output from the corpus was disambiguated using the WordNet lexicographer file numbers as stand-in domains and generalized using a cautious mixture of arc distance and coverage.
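As a rough sketch of the raw measurements behind such a thematic association measure, the code below counts how often two words co-occur within each of a set of concentric windows. Only the outer range of ±1 to ±1000 comes from the abstract; the intermediate window sizes and the function name are assumptions, and the step that turns these counts into a boolean decision is not described in the abstract and is not attempted here.

```python
def window_cooccurrence(tokens, word_a, word_b, radii=(1, 5, 10, 100, 1000)):
    """Count pairs of (word_a, word_b) occurrences within each window radius.

    Windows are concentric, so a pair counted at a small radius is also
    counted at every larger radius.
    """
    positions_a = [i for i, token in enumerate(tokens) if token == word_a]
    positions_b = [i for i, token in enumerate(tokens) if token == word_b]
    counts = {radius: 0 for radius in radii}
    for i in positions_a:
        for j in positions_b:
            distance = abs(i - j)
            if distance == 0:  # same token position (word_a == word_b)
                continue
            for radius in radii:
                if distance <= radius:
                    counts[radius] += 1
    return counts

tokens = "the doctor saw the nurse and the doctor left".split()
print(window_cooccurrence(tokens, "doctor", "nurse", radii=(1, 5)))
# {1: 0, 5: 2}
```

Comparing how the counts grow from the narrowest to the widest window gives an indication of whether two words tend to meet at adjacent positions, within a phrase, or only somewhere in the same document.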

Both data sources are designed to provide information about plausible rather than prototypical collocations, and this poses some awkward problems for evaluation. The process of generalizing the syntactic dependency data also highlighted some common difficulties in working with corpus data, such as polysemy, figurative language and synecdoche, and illustrated the shortcomings of using a static one-dimensional hierarchy such as WordNet to explore a wide range of different interactions between words when each interaction is based on a different subset of the features of the underlying concept.

Carlos Gomez (Corunna, Spain) 22 May 2008
A Deductive Approach to Dependency Parsing

I will define a new formalism, based on Sikkel's parsing schemata for constituency parsers, that can be used to describe, analyze and compare dependency parsing algorithms. This abstraction allows us to establish clear relations between several existing projective dependency parsers, explore their formal properties and automatically derive efficient implementations for them.

[Practice talk for ACL 2008]

Montse Cuadros (UPC, Barcelona) 1 May 2008
KnowNet: Building a Large Net of Knowledge from the Web

I will present a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage and accurate knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web. The resulting knowledge-base which connects large sets of semantically-related concepts is a major step beyond WordNet: in quantity, KnowNet is several times larger than WordNet and in quality, the knowledge contained in KnowNet outperforms any other automatically derived semantic resource when empirically evaluated in a common framework.

Adam Kilgarriff (Lexicography MasterClass Ltd) 24 April 2008
Semi-automatic Dictionary Drafting, Good Dictionary Examples, Evaluating Word Sketches: Some Recent Developments in the Sketch Engine

The Sketch Engine is a general-purpose corpus query tool for linguists, lexicographers and language technologists. Its distinguishing feature is "word sketches", one-page, corpus-driven summaries of a word's grammatical and collocational behaviour. The Sketch Engine web service gives access to very large corpora for many of the world's major languages.

The talk will take the above as given, and will discuss work we have been doing recently on three fronts:

* Semi-automatic dictionary-drafting (SADD): we have implemented the idea behind the WASPS project - integrating lexicography and WSD through assigning collocates to word senses - in the Sketch Engine, and hope to use the method shortly at production scale.

* GDEX (Good dictionary examples): we can now score corpus sentences so that the best candidates for dictionary examples get the highest scores. The work builds on a long history of work on 'readability', and has already been applied in two commercial dictionaries. It also has implications for the use of corpora in language teaching.

* Evaluating word sketches: word sketches are now nine years old, and they generally get very positive reviews, but there have not to date been any objective evaluations. We are now setting this right, with a developer perspective for German, and with a user perspective across six languages.

David Milward (Linguamatics Ltd., Cambridge) 7 February 2008
Finding Facts and Relationships from Life Science Literature

Interactive Information Extraction brings together search and information extraction to provide fast, interactive text mining over large volumes of text such as Medline abstracts, full text scientific articles, patents etc. As well as covering the two ends of the spectrum: keyword search over documents, and detailed linguistic patterns within sentences, Linguamatics' I2E also covers the points in between such as keywords within the same sentence, or co-occurrence of biological entities within sentences or documents. In this talk I will describe how I2E is being used in the life sciences, the use of ontologies within the system, and how statistical and linguistic processing can be combined to provide high quality results. I will also show how information discovered in different documents can be combined to discover new, long-distance relationships.

Khalil Simaan (Amsterdam) 17 January 2008
Unsupervised Bidirectional Estimation for Noisy-Channel Models

Shannon's Noisy-Channel model, which describes how a corrupted message might be reconstructed, has been the cornerstone for much work in statistical language and speech processing. The model factors into two components: a language model to characterize the original message and a channel model to describe the channel's corruptive process. The data for training this model consists of a pair of corpora, one consisting of messages and the other of observations (in parallel corpora, the messages and observations are aligned pair-wise).
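The two-component factorisation can be illustrated with a minimal decoder that, given an observation o, picks the message m maximising P(m) · P(o | m). The spelling-correction setting, the probability values and the function name below are invented for illustration and are not taken from the talk.

```python
def decode(observation, language_model, channel_model):
    """Return the message m maximising P(m) * P(observation | m).

    language_model: {message: P(message)}
    channel_model:  {(observation, message): P(observation | message)}
    """
    return max(
        language_model,
        key=lambda message: language_model[message]
        * channel_model.get((observation, message), 0.0),
    )

# Toy models (invented numbers).
language_model = {"the": 0.7, "ten": 0.3}
channel_model = {
    ("teh", "the"): 0.2, ("teh", "ten"): 0.05,
    ("tem", "the"): 0.01, ("tem", "ten"): 0.3,
}
print(decode("teh", language_model, channel_model))  # the
print(decode("tem", language_model, channel_model))  # ten
```

In the second call the channel model outweighs the language model's preference for "the", which shows how the two components trade off against each other.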

The standard approach for estimating the parameters of the channel model is unsupervised Maximum-Likelihood estimation on the observation data, usually approximated using the Expectation-Maximization (EM) algorithm. Under the EM algorithm the model parameters are fitted only to data from one side of the channel: the language model parameters depend solely on data from the message side, and the channel model parameters are chosen to maximize the likelihood of the data from the observable side of the channel alone. However, the Noisy-Channel model can be formulated in two directions, whereby each time one side of the data serves as the message side, whereas the other side serves as the observation side. Because of weak language models, asymmetric channel models and sparse data, the estimation of these two directional models using EM often leads to suboptimal estimates. In this work we show that it is better to maximize the likelihood of the total data *at both ends of the noisy-channel* under a single set of parameters that governs both directional models. We derive a corresponding bi-directional EM algorithm and show that it gives better performance than standard EM on three tasks: (1) word-based translation by estimating a probabilistic lexicon trained on non-parallel corpora, (2) adaptation of a part-of-speech tagger between related languages, and (3) last-minute results on word alignment under the IBM models and the commonly used HMM model (Giza++).

Eric Kow (Brighton) 10 January 2008
Surface Realisation: Ambiguity and Determinism

Surface realisation is a subtask of natural language generation. It may be viewed as the inverse of parsing, that is, given a grammar and a representation of meaning, the surface realiser produces a natural language string that is associated by the grammar to the input meaning. Here, we present GenI, a surface realiser for Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG) and three major extensions.

The first extension improves the efficiency of the realiser with respect to lexical ambiguity. It is an adaptation from parsing of the "electrostatic tagging" optimisation, in which lexical items are associated with a set of polarities, and combinations of those items with non-neutral polarities are filtered out.

The second extension deals with the number of outputs returned by the realiser. Normally, the GenI algorithm returns all of the sentences associated with the input logical form. Whilst these outputs can be seen as having the same core meaning, they often convey subtle distinctions in emphasis or style. It is important for generation systems to be able to control these extra factors. Here, we show how the input specification can be augmented with annotations that provide for the fine-grained control that is required. The extension builds on the fact that the FB-LTAG grammar used by the generator was constructed from a "metagrammar", explicitly putting to use the linguistic generalisations that are encoded within.

The final extension provides a means for the realiser to act as a metagrammar-debugging environment. Mistakes in the metagrammar can have widespread consequences for the grammar. Since the realiser can output all strings associated with a semantic input, it can be used to find out what these mistakes are, and crucially, their precise location in the metagrammar.

Christian Pietsch (Open University) 13 December 2007
Recognising Textual Entailment and Paraphrases

In recent years textual entailment has been proposed as a framework for applied semantics that can be used for a wide range of real-world natural language processing applications. A series of PASCAL Challenges (RTE 1, 2, and 3) produced prototype systems that typically used a variety of (sometimes multiple) lexical semantic resources, many of which are available only for the English language.

I will give an overview of existing approaches and then describe in more detail the system devised by Gaston Burek and myself at the Open University. Our RTE-3 challenge entry differed from most other systems in that it relied exclusively on a shallowly parsed corpus as its sole resource. From this corpus, a fully automatic process based on Latent Semantic Analysis (LSA) extracted not only lexical semantic knowledge but also a certain amount of world knowledge -- something that is beyond the scope of most lexical knowledge bases, hand-crafted or automatically acquired. We expect that our approach could be a valuable component of many systems for recognising textual entailment -- even more so when systems are built for under-resourced languages.

To our knowledge, this is the first time that Latent Semantic Analysis does not operate on 'bags of words' but on semi-structured (subject-verb-object) representations. Thus, in contrast to other LSA-based systems, our system will not claim that 'Peter loves Mary' has the same meaning as 'Mary loves Peter'.
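The contrast between the two representations can be illustrated with a minimal Python sketch. It is not the authors' system: the function names `bag_of_words` and `svo_triple` are hypothetical, and the toy "parser" simply assumes a three-token sentence in subject-verb-object order rather than using a shallow parser or LSA.

```python
from collections import Counter


def bag_of_words(sentence):
    """Order-insensitive representation: a multiset of lowercased tokens."""
    return Counter(sentence.lower().split())


def svo_triple(sentence):
    """Toy subject-verb-object reading.

    Assumption for illustration only: the sentence has exactly three
    whitespace-separated tokens, already in S-V-O order.
    """
    subject, verb, obj = sentence.lower().split()
    return (subject, verb, obj)


a = "Peter loves Mary"
b = "Mary loves Peter"

# The two sentences are indistinguishable as bags of words...
same_bag = bag_of_words(a) == bag_of_words(b)
# ...but differ once the subject and object roles are kept apart.
same_svo = svo_triple(a) == svo_triple(b)
```

Here `same_bag` is true while `same_svo` is false, which is the distinction the abstract draws.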

Taras Zagibalov (Sussex) 29 November 2007
Unsupervised Classification of Sentiment and Objectivity in Chinese Text

We address the problem of sentiment and objectivity classification of product reviews in Chinese. Our approach is distinctive in that it treats both positive / negative sentiment and subjectivity / objectivity not as distinct classes but rather as a continuum; we argue that this is desirable from the perspective of would-be customers who read the reviews. We use novel unsupervised techniques, including a one-word 'seed' vocabulary and iterative retraining for sentiment processing, and a criterion of 'sentiment density' for determining the extent to which a document is opinionated. The classifier achieves up to 87% F-measure for sentiment polarity detection.

[Practice talk for IJCNLP 2008]

Massimiliano Ciaramita (Yahoo! Research, Barcelona) 22 November 2007
Domain Adaptation in Named Entity Recognition

Is natural language technology adequate for applications, e.g., in Web technology? Notwithstanding periodic surges in expectations, there is still no clear evidence supporting such claims. On the other hand, it is easy to verify that even the best NLP tools make many mistakes when deployed on real-world tasks. Domain adaptation deals with the problem of adapting existing systems (parsers, taggers, etc.) to new domains in the absence of (manually) annotated data in the new domain. Research in this area might be crucial to help NLP improve robustness and quality. In this talk I will first present an overview of recent findings in domain adaptation. Then I will discuss our own ongoing research, mainly in the task of named entity recognition, involving both machine learning and knowledge based approaches.

Carlos Gomez (Corunna, Spain) 11 October 2007
Prototyping Parsers by Compiling Parsing Schemata

The parsing schemata formalism allows us to describe a wide variety of parsing algorithms in a simple, declarative way, by capturing their fundamental semantics while abstracting low-level detail. This talk will introduce the formalism and present a system that can be used to automatically transform parsing schemata into efficient implementations of their corresponding parsers. This system can be employed to test the relative performance of different parsing strategies in a particular grammar or domain without worrying about implementation details. The system has been used to analyze and compare the performance of different parsers for context-free grammars and tree-adjoining grammars. Additionally, the presentation will discuss the possibility of using parsing schemata to represent dependency parsers.

Chris Biemann (Leipzig, Germany) 5 September 2007
Unsupervised and Knowledge-Free Natural Language Processing in the Structural Discovery Paradigm

In the past, language processing has predominantly been performed by using either explicit rule-based knowledge or implicit knowledge via learning from annotations. In contrast to this, I introduce the Structure Discovery paradigm. This is a framework for learning structural regularities from large samples of text data, and for making these regularities explicit by introducing them in the data via self-annotation.

Working in this paradigm means to set up discovery procedures that operate on raw language material and iteratively enrich the data by using the annotations of previously applied Structure Discovery processes.

Since graph representations are an intuitive way of encoding linguistic entities and their relations in nodes and edges, I will talk about some graph characteristics typically found in representations of language data. To perform the abstractions and generalisations needed for Structure Discovery, I introduce the Chinese Whispers Graph Clustering algorithm. This algorithm is very efficient and allows graphs with millions of nodes to be partitioned in a short time.

Then I will present some practical applications following the Structure Discovery paradigm: A solution for language separation, an unsupervised PoS tagger and a word sense induction system.

If time allows, I will talk about possible further work, especially regarding emergent language generation models that reproduce characteristics found by Structure Discovery processes.
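As a rough illustration of the Chinese Whispers Graph Clustering algorithm named in this abstract, here is a minimal Python sketch of the label-propagation idea as it is commonly described: every node starts in its own class, and nodes repeatedly adopt the class with the highest total edge weight among their neighbours. The function name, the `(node_a, node_b, weight)` edge format, the iteration cap and the deterministic tie-breaking are assumptions made for this example, not details taken from the talk.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph by label propagation.

    edges: iterable of (node_a, node_b, weight) triples (assumed format).
    Returns a dict mapping each node to a cluster label.
    """
    rng = random.Random(seed)
    neighbours = defaultdict(dict)
    for a, b, w in edges:
        neighbours[a][b] = w
        neighbours[b][a] = w

    # Every node starts in its own class.
    labels = {node: node for node in neighbours}
    nodes = sorted(neighbours)

    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order on each pass
        changed = False
        for node in nodes:
            # Sum the edge weights towards each class in the neighbourhood.
            scores = defaultdict(float)
            for other, w in neighbours[node].items():
                scores[labels[other]] += w
            # Ties are broken by label name here to keep the sketch
            # deterministic; random tie-breaking is the other usual choice.
            best = max(scores, key=lambda lab: (scores[lab], str(lab)))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # stop early once a full pass changes nothing
    return labels


# Two tightly connected triangles joined by one weak edge.
example_edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 0.1),
]
clusters = chinese_whispers(example_edges)
```

On this toy graph the two triangles end up with different labels, because the weak bridging edge never outweighs the edges inside a triangle. Each pass touches every edge a constant number of times, which is consistent with the efficiency claim in the abstract.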

Jonathon Read (Sussex) 21 June 2007
Annotating Expressions of Appraisal in English

The Appraisal framework is a theory of the language of evaluation, developed within the tradition of systemic functional linguistics. The framework describes a taxonomy of the types of language used to convey evaluation and position oneself with respect to the evaluations of other people. Accurate automatic recognition of these types of language can inform an analysis of document sentiment. This paper describes the preparation of test data for algorithms for automatic Appraisal analysis. The difficulty of the task is assessed by way of an inter-annotator agreement study, based on measures analogous to those used in the MUC-7 evaluation.

[Joint work with David Hope and John Carroll. Practice for ACL Linguistic Annotation Workshop Talk]

Ryu Iida (Nara Institute of Science and Technology, Japan) 14 June 2007
Combining Linguistic Knowledge and Machine Learning for Anaphora Resolution

Anaphora resolution, which is the process of identifying whether or not an expression refers to another expression, is an important process for various NLP applications. In contrast to rule-based approaches, empirical or corpus-based approaches to this problem have been shown to be a cost-efficient solution achieving a performance that is comparable to the best performing rule-based systems. Anaphora resolution can be decomposed into two subtasks: antecedent identification, which is the process of identifying an antecedent for a given anaphor, and anaphoricity determination, which is the process of judging whether or not a candidate anaphor is anaphoric.

In the first half of the talk, I will present an antecedent identification model, named the 'tournament model', which captures contextual information that is more sophisticated than what is offered in Centering Theory (Grosz et al., 95). Our experiments show that this model significantly outperforms earlier machine learning-based approaches, such as Soon et al. (2001).

In the second half of the talk, I will present an anaphoricity determination model, named the 'selection-then-classification model', a process that reverses the order of the steps in the classification-then-search model proposed by Ng and Cardie (2002), inheriting all the advantages of that model. I conducted experiments on resolving noun phrase anaphora in Japanese. The results show that with the selection-then-classification based modifications, the proposed model outperforms earlier learning-based approaches.

Anja Belz (Brighton) 12 June 2007
ENLG 07 Practice Talks

We have some practice talks scheduled for 12th June at 11:30am in the Chichester 1 NLP meeting room (Ch1 011). Anja Belz is giving two practice talks for ENLG 07, see the abstracts below. There will hopefully also be a bonus ACL demo feature from Adam Kilgarriff. We anticipate that the 3 talks will each last approximately 20 minutes, plus time for questions.

1. Anja Belz, Albert Gatt, Ehud Reiter and Jette Viethen: The First NLG Shared-Task Challenge: Attribute Selection for Referring Expressions Generation, ENLG'07, 18/06

The field of Natural Language Generation (NLG) has strong evaluation traditions, in particular in user-based evaluation of applied systems. However, while in most other NLP fields shared-task evaluation now plays an important role, there are few results of this kind in NLG. The Shared Task Evaluation Campaign (STEC) in Generation of Referring Expressions (GRE) is intended to be a first step in the direction of exploring what is required for shared-task evaluation in NLG. Under the umbrella of this GRE STEC, we are planning to organise a series of evaluation events, involving, over time, a wide range of GRE task definitions, data resources and evaluation methods.

As a first step, and in order to gauge community interest, we are setting up a pilot evaluation in the spirit of a feasibility test: the Attribute Selection for Referring Expressions Generation Challenge. This Challenge will be presented and discussed at this year's UCNLG+MT Workshop in Copenhagen, on 11 September, at MT Summit XI. If successful, we plan to organise a larger-scale event in 2008, extending the remit to cover aspects of GRE beyond attribute selection as well as more data resources and evaluation methods.

This talk will give a brief overview of the shared task itself, organisation of the event and evaluation of the results.

2. Anja Belz and Sebastian Varges: Generation of Repeated References to Discourse Entities, ENLG'07, 18/06

Generation of Referring Expressions is a thriving subfield of Natural Language Generation which has traditionally focused on the task of selecting a set of attributes that unambiguously identify a given referent. In this paper, we address the complementary problem of generating repeated, potentially different referential expressions that refer to the same entity in the context of a piece of discourse longer than a sentence. We describe a corpus of short encyclopaedic texts we have compiled and annotated for reference to the main subject of the text, and report results for our experiments in which we set human subjects and automatic methods the task of selecting a referential expression from a wide range of choices in a full-text context. We find that our human subjects agree on choice of expression to a considerable degree, with three identical expressions selected in 50% of cases. We tested automatic selection strategies based on most frequent choice heuristics, involving different combinations of information about syntactic MSR type and domain type. We find that more information generally produces better results, achieving a best overall test set accuracy of 53.9% when both syntactic MSR type and domain type are known.

Ben Medlock (Cambridge) 31 May 2007
Weakly Supervised Learning for Hedge Classification in Scientific Literature

We investigate automatic classification of speculative language, or 'hedging', in scientific literature from the biomedical domain using weakly-supervised learning. We discuss the task from both a human annotation and machine learning perspective and focus on aspects of the problem that set it apart from previous weakly-supervised ML research. We show how the problem can be tackled with a probabilistic formulation of the self-training paradigm, and present a theoretical and practical evaluation of our learning and classification models.

Diana McCarthy (Sussex) 21 May 2007
Evaluating Automatic Approaches for Word Meaning Discovery and Disambiguation using Lexical Substitution

There has been a surge of interest in Computational Linguistics in word sense disambiguation (WSD). A major catalyst has been the SENSEVAL evaluation exercises which have provided standard datasets for the field over the past decade. Whilst researchers believe that WSD will ultimately prove useful for applications which need some degree of semantic interpretation, the jury is still out on this point. One significant problem is that there is no clear choice of inventory for any given task, other than the use of a parallel corpus for a specific language pair for a machine translation application. Most of the datasets produced, certainly in English, have used WordNet. Whilst WordNet is a wonderful resource it would be beneficial if systems using other inventories could enter the WSD arena without the need for mappings between the inventories which may mask results. As well as the work in disambiguation, there is a growing interest in automatic acquisition of inventories of word meaning. It would be useful to investigate the merits of predefined inventories themselves, aside from their use for disambiguation, and compare automatic methods of acquiring inventories. In this talk I will discuss these issues and some results in the context of the English Lexical Substitution Task, organised by myself and Roberto Navigli (University of Rome, "La Sapienza") earlier this year under the auspices of SEMEVAL.

[Practice for invited talk at NODALIDA 2007]

Rada Mihalcea (North Texas) 17 May 2007
Measures of Text Semantic Similarity

Measures of text similarity have been used for a long time in a variety of applications, including information retrieval, text classification, word sense disambiguation, extractive summarization, and more recently in automatic evaluation of machine translation and text summarization. With a few exceptions, the typical approach to finding the similarity between two text segments is to use a simple lexical matching method, and produce a similarity score based on the number of lexical units that occur in both input segments (usually referred to as the 'vectorial model'). While successful to a certain degree, these lexical similarity methods cannot always identify the semantic similarity of texts. For instance, there is an obvious similarity between the text segments "I own a dog" and "I have an animal", but most of the current text similarity metrics will fail to identify any kind of connection between these texts.

In this talk, I will describe our work in developing methods for measuring the semantic similarity of texts using corpus-based and knowledge-based measures of similarity. Given that a large fraction of the information available today, on the Web or elsewhere, consists of short text snippets (e.g. abstracts of scientific documents, image captions, product descriptions), in this work we focus on measuring the semantic similarity of short texts. Through experiments performed on a paraphrase data set, we show that the semantic similarity method outperforms methods based on simple lexical matching, resulting in significant error rate reductions with respect to the traditional vector-based similarity metric.
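The lexical matching baseline described in this abstract can be sketched in a few lines of Python. This is only an illustration of the general idea (cosine similarity over raw word counts); the function name `lexical_cosine` and the whitespace tokenisation are assumptions, and the speaker's semantic measures are not reproduced here.

```python
import math
from collections import Counter


def lexical_cosine(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The only shared token is the pronoun "i", so the score stays low.
score = lexical_cosine("I own a dog", "I have an animal")
# With the pronoun removed, the two segments share no words at all.
score_without_pronoun = lexical_cosine("own a dog", "have an animal")
```

For the example pair from the abstract, the only overlap is the pronoun, and without it the score drops to zero even though "dog" and "animal" are clearly related. That gap is what the corpus-based and knowledge-based measures in the talk aim to close.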

Mark Rogers (Market Sentinel) 26 April 2007
Technology for Blog and Web Monitoring Services

Market Sentinel was founded in September 2004 using the technology behind the popular website Its first customers were in the technology sectors; it now operates in the UK, US and Europe, with customers in the automotive, pharmaceutical, internet, telecommunications and financial sectors. Mark is the CEO of Market Sentinel. He will talk about

  • measuring online conversations: methodology and business models
  • new developments in automating this: their strengths and weaknesses

and then open the floor for discussion.

Diana McCarthy (Sussex) 17 April 2007
Using Selectional Preferences to Detect Non-Compositionality of Verb-Object Combinations

Automatic methods to detect the non-compositionality of multiwords have attracted attention in recent years because of the importance of this for semantic interpretation. There have been various approaches to capturing non-compositionality, many using distributional similarity to contrast a target multiword with its constituents. We will describe our work exploring the use of selectional preferences for detecting non-compositional verb-object multiwords. To characterise the arguments in a given grammatical relationship we experiment with three models of selectional preference. Two use WordNet and one uses the entries from a distributional thesaurus as classes for representation. In previous work on selectional preference acquisition, the classes used for representation are selected according to the coverage of argument tokens. For both the distributional thesaurus model and one of the WordNet models we select classes for representing the preferences by virtue of the number of argument types that they cover, rather than the number of tokens. Then, only tokens under the classes which are representative of the argument head data are used to estimate the probability distribution for the selectional preference model. We demonstrate a highly significant correlation between measures which use these 'type-based' selectional preferences and compositionality judgements from a data set used in previous research. The type-based models perform better than the models which use tokens for selecting the classes. Furthermore, the models which use the automatically acquired thesaurus entries produced the best results. The correlation for the thesaurus models is stronger than any of the individual features used in previous research on the same dataset.

[Practice talk for a seminar in Groningen]

Rob Koeling (Sussex) 15 February 2007
Text Categorization for Improved Priors of Word Meaning

Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems which back off to the predominant (most frequent) sense of a word when contextual clues are not strong enough. The topic domain of a document has a strong influence on the sense distribution of words. Unfortunately, it is not feasible to produce large manually sense-annotated corpora for every domain of interest. Previous experiments have shown that unsupervised estimation of the predominant sense of certain words using corpora whose domain has been determined by hand outperforms estimates based on domain-independent text for a subset of words and even outperforms the estimates based on counting occurrences in an annotated corpus. In this paper we address the question of whether we can automatically produce domain-specific corpora which could be used to acquire predominant senses appropriate for specific domains. We collect the corpora by automatically classifying documents from a very large corpus of newswire text. Using these corpora we estimate the predominant sense of words for each domain. Encouraged by initial results of doing this we start exploring using text categorization for WSD by evaluating on a standard data set (documents from the SENSEVAL-2 and 3 English all-word tasks). We show that for these documents and using domain-specific predominant senses, we are able to improve on the results that we obtained with predominant senses estimated using general, non domain-specific text. We also show that the confidence of the text classifier is a good indication whether it is worthwhile using the domain-specific predominant sense or not.

[Dry run for CICLing talk]

Francis Chantree (Open University) 1 February 2007
Nocuous Ambiguities in Requirements Specifications

In this talk we present research that addresses problems caused by ambiguity in requirements. Ambiguity is pervasive in natural language. In requirements engineering it is often considered together with concepts such as incompleteness and inconsistency as a major concern. This is because it can lead to misunderstandings, and therefore to incorrect implementations. Despite the development of formal specification methods, most requirements documents are still written in everyday natural language. Misunderstandings can therefore occur between stakeholders in the requirements engineering process. The worst scenario is when they interpret a requirement in one way but do not realise that other interpretations are possible. Misunderstandings are therefore carried forward into other stages of the software lifecycle.

We aim to determine which ambiguities may lead to misunderstandings - we term these "nocuous ambiguities". Authors and readers of requirements should be notified about these, and ideally they should be rewritten in a less ambiguous form. Other ambiguities can be allowed to remain in text. We consider that human perception is the only way ambiguity can be judged. It is this that determines whether an ambiguity is nocuous. We have therefore surveyed human perceptions about a set of ambiguities taken from requirements documents. We then attempt to automatically predict these perceptions, using a set of heuristics we have developed based on various types of linguistic data. These determine which ambiguities are nocuous. They achieve varying degrees of success, and can be used in combination to give good precision and recall. Our model of ambiguity classification, together with our method of implementing it, is a novel approach to the problem.

Craig McMillan (Trampoline Systems) 11 January 2007
Language Processing and Social Network Analysis in SONAR, an Enterprise Information Management Application

Large archives are difficult to navigate, and the easily extractable navigation dimensions often don't mesh very well with how we would like to navigate. Natural language processing and social network analysis can be used to provide more natural means of navigating an archive, and in the context of a growing archive, of providing notifications of interesting additions to the archive. SONAR is an application to realise these techniques for email and document archives.

Xinglong Wang (Edinburgh) 7 December 2006
Rule-based Protein Term Identification with Help from Automatic Species Tagging

In biomedical articles, protein mentions often refer to different protein entities. For example, an arbitrary occurrence of the term p53 might denote thousands of proteins across a number of species. A human annotator is able to resolve this ambiguity relatively easily, by looking at its context and if necessary, by searching an appropriate protein database. However, this phenomenon may cause much trouble to a text mining system, which does not understand human languages and hence cannot identify the correct protein that the term refers to. In this paper, we present a Term Identification system which automatically assigns unique identifiers, as found in a protein database, to ambiguous protein mentions in texts. Unlike other solutions reported in the literature, which work on gene/protein mentions in a specific model organism, our system is able to tackle protein mentions across many species, by integrating a machine-learning based species tagger. We have compared the performance of our automatic system to that of human annotators, with very promising results.

Roberto Navigli (Rome) 30 November 2006
Dealing with the Complexities of Sense Granularity: Fine-grained Validation and Sense Clustering

Word Sense Disambiguation (WSD) is the task of computationally determining the appropriate meaning of words in context. Most approaches to WSD adopt WordNet (Fellbaum, 1998) as a reference sense inventory. Unfortunately, WordNet encodes very fine-grained sense distinctions which make it hard for WSD systems to exceed 65% accuracy in an all-words setting. In this talk, we propose two different approaches to this problem. First, we accept the fineness of the WordNet sense inventory and deal with it directly. We propose a method to adjudicate the disagreements between sense annotators based on the exploitation of the lexicon structure as a justification of the final sense choices. The method allows both manual and automatic disagreements to be adjudicated and attains 68.5% accuracy in the validation of the 3 best-ranking systems in the Senseval-3 all-words task. Second, we present an approach to the clustering of WordNet word senses via a mapping to coarser sense distinctions from a machine-readable edition of the Oxford Dictionary of English (ODE). We show that the resulting clustering is reliable and that state-of-the-art systems achieve up to 78% accuracy when adopting the resulting sense clustering in the Senseval-3 all-words setting.

Serge Sharoff (Leeds) 10 November 2006
"Irrefragable Answers" to Translation Problems: Researching and Teaching Translation using Comparable Corpora

Words are used in a variety of contexts and cannot be translated in a word-for-word fashion. Frequently a translator understands the source text, but cannot find an expression suitable for rendering it in the target language, for example, "daunting" in "Hospital admission can prove a particularly daunting experience". In my talk I will discuss the nature of problems with such examples, and present computational solutions for these problems offered by comparable corpora. The model is based on detecting frequent multi-word expressions (MWEs) in the source and target languages and mapping between MWEs by means of finding similarities of their contexts of use.

Alex Clark (Royal Holloway, London) 2 November 2006
Languages as Hyperplanes: Grammatical Inference with String Kernels

Using string kernels, languages can be represented as hyperplanes in a high dimensional feature space. We present a new family of grammatical inference algorithms based on this idea. We demonstrate that some mildly context sensitive languages can be represented in this way and it is possible to efficiently learn these using kernel PCA. We present some experiments demonstrating the effectiveness of this approach on some standard examples of context sensitive languages using small synthetic data sets.

[Joint work with Christophe Costa Florencio and Chris Watkins: prize for most Innovative Contribution at ECML 2006]

Diana McCarthy (Sussex) 26 October 2006
Automatic Methods for Detecting the Compositionality of Multiwords

Semantic Compositionality, or rather the lack of it, has received some attention in recent years because of the impact it has on semantic interpretation and thus any NLP system that relies on this. I will survey previous methods for detecting compositionality and describe the various approaches to evaluation. I will discuss the extent to which the techniques for both extraction and evaluation are actually addressing compositionality rather than institutionalisation. I will also describe some of the work done by myself and colleagues at Sussex University in this area and compare results on datasets used by others in the field.

[This talk is in preparation for 'Collocations and Idioms 2006', Nov 2--4 Berlin]

Chris Mellish (Aberdeen) 21 September 2006
Natural Language Directed Inference in the Presentation of Ontologies

Knowledge engineers, domain experts and also casual users need better ways to compare and understand ontologies. As ontologies get more logically complex, the idea of presenting parts of them in natural language is becoming increasingly attractive.

This work takes as its starting point the task of answering in natural language a question "What is A?", where A (the target) is an atomic concept mentioned in some given OWL DL ontology. Rather than considering detailed linguistic aspects of this task, however, we focus on how to support it with appropriate reasoning.

A first attempt at the task of answering "What is A?" might somehow render in natural language the set of ontology axioms that mention A. However, we show that the raw axioms may well not package information in a way appropriate for natural language presentation. It seems that what is actually required is to present certain logical consequences of the axioms, where these are selected using principles of natural language presentability. Unfortunately, existing description logic reasoning services are not flexible enough for this.

We introduce a method of generating subsumers of a target concept which exploits standard DL reasoning and uses natural language principles to prune the search space. This gives an initial implementation of the idea of "natural language directed inference". It also introduces a number of interesting cases that shed light in logical terms on the nature of natural language presentability.

[Joint work with Jeff Pan and Xiantang Sun]

Jonathon Read & David Hope (Sussex) 22 June 2006
An Appraisal Of Appraisal Theory

Appraisal Theory is an extension of M.A.K. Halliday's Systemic Functional Linguistics (SFL) framework. The theory has emerged over a period of almost fifteen years as a result of work conducted by a body of researchers led by Professor James Martin of the Linguistics Department of the University of Sydney.

Appraisal Theory analyses the language of evaluation. Its aim is to classify the linguistic resources -- as applied by interlocutors dialogically -- which are used to express, negotiate, or naturalise particular 'positions'. Appraisal Theory thus centres on language that emits an evaluation, attitude, or emotion. It seeks to explain how such language implies a level of engagement that an interlocutor holds with respect to the propositions expounded within internal and/or external dialogues. By varying the particular lexis used, an interlocutor may construe gradable levels of engagement with any particular proposition ('stance') put forth in a text (including their own). This allows an interlocutor to adopt particular value positions; positions which enable them to 'negotiate' with actual or potential respondents of the dialogue.

Appraisal Theory delineates, and so subcategorises, three particular types of evaluation: 'Affect' is centred on the 'emotive'; 'Judgement' upon the 'moral', with 'Appreciation' concerning 'aesthetics': effectively, one has an 'Ontology of Evaluation' (of sorts). By capturing -- computationally -- the lexicogrammar used for each particular type of appraisal, one may be able to classify stretches of language as being of one of these particular types. Thus, one may begin to ask questions of the events / objects apparent in texts from different perspectives, e.g. how objects / events affect interlocutors; the morals / ethics of the participants involved in the dialogue, and the aesthetic value of entities as portrayed in texts, viz. to know the value of an object without recourse to the implications of its 'ethical price'.

Daoud Clarke (Sussex) 8 June 2006
Meanings Live in a Vector Lattice; Words are Positive Operators

Techniques such as latent semantic analysis and measures of distributional similarity are based around the idea that meanings of words may be represented as elements of a vector space. Such representations often provide clear advantages in many applications, but many questions relating to these representations remain unanswered. How do such representations relate to ontological representations of meaning, and is it possible to combine the two? How are representations of words to be combined into representations of phrases and sentences, and what role may syntax play in this? How can notions such as entailment and contradiction be described between such representations?

In this talk I will motivate the use of algebra as a means towards answering these questions. Several observations point towards a particular area of mathematics that has been neglected in the computational linguistics literature -- that of vector lattices -- structures that combine a vector nature with a lattice structure. Our thesis is that meanings may be represented as 'positive' elements of such a structure, and words may then be considered as positive operators on a vector lattice.
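As a concrete illustration of the vocabulary used above, the following minimal sketch treats ordinary real vectors with the componentwise order as a vector lattice. This is only the simplest finite-dimensional example, chosen here for illustration; the function names are hypothetical and the sketch does not reproduce the construction presented in the talk. A 'positive' element is a vector with no negative components, and a matrix with no negative entries is one example of a positive operator, since it maps positive elements to positive elements.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def join(u: Vector, v: Vector) -> Vector:
    """Least upper bound under the componentwise order."""
    return [max(a, b) for a, b in zip(u, v)]

def meet(u: Vector, v: Vector) -> Vector:
    """Greatest lower bound under the componentwise order."""
    return [min(a, b) for a, b in zip(u, v)]

def is_positive(u: Vector) -> bool:
    """An element is positive if it lies above the zero vector."""
    return all(a >= 0 for a in u)

def apply(matrix: Matrix, u: Vector) -> Vector:
    """Apply a linear operator, given as a list of rows, to a vector."""
    return [sum(m * a for m, a in zip(row, u)) for row in matrix]

# Toy 'meanings' as positive vectors (the numbers are invented).
u = [1.0, 0.0]
v = [0.0, 2.0]

# A matrix with non-negative entries acts as a positive operator.
word = [[1.0, 1.0],
        [0.0, 2.0]]

print(join(u, v), meet(u, v), apply(word, join(u, v)))
```

In any vector lattice the identity u + v = join(u, v) + meet(u, v) holds, which gives a quick sanity check on the two operations.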

Sabine Schulte im Walde (Saarland University, Germany) 18 May 2006
Can Human Verb Associations Help Identify Salient Features for Semantic Verb Classes?

Different frameworks of semantic verb classes depend on different instantiations of semantic similarity, e.g. Levin relies on verb similarity referring to syntax-semantic alternation behaviour, WordNet uses synonymy, and FrameNet relies on situation-based agreement as defined in Fillmore's frame semantics (Fillmore, 1982). As an alternative to resource-intensive manual classifications, automatic methods such as classification and clustering are applied to induce verb classes from corpus data, e.g. Schulte im Walde (2000), Merlo and Stevenson (2001), Joanis and Stevenson (2003), Korhonen, Krymolowski and Marx (2003), Stevenson and Joanis (2003), Schulte im Walde (2003).

The verb feature selection on which an automatic classification relies should model the similarity of interest. However, in larger-scale classifications which model verb classes with similarity at the syntax-semantics interface, it is not clear which features are the most salient. The verb features need to relate to a behavioural component (modelling the syntax-semantics interplay), but the set of features which potentially influence the behaviour is large, ranging from structural syntactic descriptions and argument role fillers to adverbial adjuncts. In addition, it is not clear how fine-grained the features should be; for example, how much information is covered by low-level window co-occurrence vs. higher-order syntactic frame fillers?

In this talk, I investigate whether human associations to verbs as collected in a web experiment can help us to identify salient verb features for semantic verb classes. Assuming that the associations model aspects of verb meaning, we apply an unsupervised clustering to the verbs, as based on the associations, and validate the resulting verb classes against standard approaches to semantic verb classes, i.e. GermaNet and FrameNet. Then, various clusterings of the same verbs are performed on the basis of standard corpus-based types, and evaluated against the association-based clustering as well as GermaNet and FrameNet classes. We hypothesise that the corpus-based clusterings are better if the instantiations of the feature types show more overlap with the verb associations, and that the associations therefore help to identify salient feature types.

Sabine Buchholz (Toshiba Research Europe, Cambridge) 12 May 2006
The CoNLL-X Shared Task on Multi-lingual Dependency Parsing

Each year the Conference on Computational Natural Language Learning (CoNLL) features a 'shared task', in which participants train and test machine learning systems on exactly the same data sets, in order to better compare the systems. The topic for the shared task in this year's CoNLL (the tenth such conference, hence CoNLL-X) is multi-lingual dependency parsing.

Dependency structure is an alternative to constituent structure for representing syntactic analyses of sentences and is said to be particularly suited for freer word order languages. During the last decade, much progress has been made not only in constituent but also in dependency parsing and with the emergence of treebanks for various languages, both types of parsers have increasingly been applied to languages other than English. The shared task continues this line of research.

I am one of the organizers of the shared task and will describe how we converted treebanks for 13 different languages (Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and Turkish) into a common data format, what parsing approaches have been taken by participants, how parser performance is evaluated, what results were achieved and what we can learn from them about the approaches and the problem of multi-lingual dependency parsing itself.

More information about the shared task is available at the CoNLL-X Shared Task website.

Klara Chvatalova (Charles University, Prague) 27 April 2006
Syntactic-Semantic Analysis of Selected Prepositional Groups

We selected a prepositional group in Czech consisting of the preposition za and of a noun in an accusative form as an example to deal with the task of distinguishing between different adverbial syntactical functions expressed by a morphologically homonymous form. First, we demonstrated the multifunctionality of the analysed form on some motivating examples and summed up the description of the form in the most substantial syntaxes of the Czech language and in other related works. We realised that the typology of relevant adverbials is inconsistent in many points. Secondly, we acquired a data set of 1591 occurrences of the selected form from the Prague Dependency Treebank and we used 524 of them for our detailed analysis. We compared our observations with the information collected from the related works and we described twelve adverbial types with different meanings presented by the form za + accusative. Providing as accurate definitions as possible, we made an effort to eliminate existing inconsistency in the adverbial typology. We suggested establishing new adverbial types in some cases -- adverbial of representation, adverbial of countervalue -- or separating a new subtype -- adverbial of compromise. We also suggested some modifications of the temporal adverbial subtypes system. Finally, we formulated automatic, semi-automatic and manual criteria for recognition of all the described adverbial types and subtypes, excluding the adverbial of extent and the adverbial of regard, which are represented in our data set only by a small number of occurrences. Applying the criteria, we were able to recognise the correct adverbial type for 92.18% of 524 occurrences in total in the analysed data set and for 86.76% of 524 occurrences in total in our training data set.

Roberto Navigli (Rome) 9 March 2006
Structural Semantic Interconnections: a Knowledge-Based Approach to WSD

In this talk we will present an algorithm, called Structural Semantic Interconnections (SSI), which creates structural specifications of the possible senses for each word in a context, and selects the best hypothesis according to a pattern grammar, describing relations between sense specifications. Sense specifications are created in the form of semantic graphs from several available lexical resources, that we integrated in part manually, in part with the help of automatic procedures. We present a fast implementation of SSI, about to be released online, and we discuss the experiments performed on different semantic disambiguation problems, like automatic ontology population, disambiguation of sentences in generic texts, disambiguation of words in glossary definitions. Finally we show the potential of semantic graphs for the validation of manual and automatic sense annotations.

Mirella Lapata (Edinburgh) 19 January 2006
Constructing Semantic Space Models from Parsed Corpora

Traditional vector-based models use word co-occurrence counts from large corpora to represent lexical meaning. In this talk we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalisation for this class of models which allows linguistic knowledge to guide the construction of semantic spaces. We evaluate our framework on tasks that are relevant for cognitive science and NLP: semantic priming, lexicon acquisition and word sense disambiguation. In all cases we show that our framework obtains results that are comparable or superior to the state of the art.

Adam Kilgarriff (Lexicography MasterClass Ltd) 15 December 2005
Computational Corpus-driven Linguistics: a Research Programme

This is an exciting time for our understanding of language. Linguists are becoming familiar with corpora, and so the possibilities they offer are now beginning to open up. Language-processing tools like part-of-speech taggers are also now reaching a level of maturity, so we can work with corpora that handle lemmas and grammar, and potentially more, as well as simple word forms. In this talk I will sketch out the empiricist programme, illustrating it with 'word sketches', one-page summaries of a word's grammatical and collocational behaviour, and distributional thesauruses.

Gerold Schneider (Zurich, Switzerland) 10 November 2005
Low-complexity, Deep-linguistic Parsing

I will present a robust, deep-linguistic parser that combines a hand-written grammar with statistical disambiguation and that keeps search spaces very small. This is achieved by treating most long-distance dependencies in a context-free way, by implementing mild context-sensitivity, by integrating tagging and chunking, and by using functional Dependency Grammar. I will also discuss the close similarities to Lexical-Functional Grammar and Tree Adjoining Grammar. An evaluation of the parser shows that its performance is competitive with the state of the art.

Muntsa Padro (UPC Barcelona) 13 October 2005
Applying Causal-State Splitting Reconstruction Algorithm to Natural Language Processing Tasks

The Causal-State Splitting Reconstruction algorithm learns a finite state automaton from data sequences. In the work I will present, this algorithm is applied to NLP tasks, namely Named Entity Recognition and Chunking. The obtained results are slightly below those of the best state-of-the-art systems, but can be considered competitive, and given the simplicity of our approach, they are really promising.

Once the viability of using this algorithm for these NLP tasks is established, we plan to improve the results obtained at NER and Chunking by using more features, and also to study more sophisticated ways to use this algorithm in NLP tasks of this kind.

Rob Koeling (Sussex) 30 September 2005
Domain-Specific Sense Distributions and Predominant Sense Acquisition

Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems which back off to the predominant sense of a word when contextual clues are not strong enough. The domain of a document has a strong influence on the sense distribution of words, but it is not feasible to produce large manually annotated corpora for every domain of interest. In this paper we describe the construction of three sense annotated corpora in different domains for a sample of English words. We apply an existing method for acquiring predominant sense information automatically from raw text, and for our sample demonstrate that (1) acquiring such information automatically from a mixed-domain corpus is more accurate than deriving it from SemCor, and (2) acquiring it automatically from text in the same domain as the target domain performs best by a large margin. We also show that for an all words WSD task this automatic method is best focussed on words that are salient to the domain, and on words with a different acquired predominant sense in that domain compared to that acquired from a balanced corpus.

Rebecca Watson (Cambridge) 29 September 2005
Efficient Extraction of Grammatical Relations

I will present a novel approach for applying the Inside-Outside Algorithm to a packed parse forest produced by a unification-based parser. The approach allows a node in the forest to be assigned multiple inside and outside probabilities, enabling a set of `weighted GRs' to be computed directly from the forest. The approach improves on previous work which either loses efficiency by unpacking the parse forest before extracting weighted GRs, or places extra constraints on which nodes can be packed, leading to less compact forests. Experiments demonstrate significant increases in parser accuracy and throughput for weighted GR output.

Yuji Matsumoto (Nara Institute of Science and Technology) 19 July 2005
Machine Learning-based Language Analyzers and Corpus Management Tools

Corpus-based approaches to natural language processing systems have now attained very good performance for basic NL analysis. Producing highly accurate NL analyzers requires robust and effective machine learning models, selection of useful features and accurately annotated training data. In this talk, I will first introduce the use of Support Vector Machines and the models for POS tagging, base phrase chunking and word dependency parsing. I will then talk about feature selection, especially for speeding up the process. Finally, I will introduce our recent development of tools for managing annotated corpora and the dictionary, which provide flexible search and error-correction of annotated corpora.

David Martinez (Basque Country University) 8 July 2005
All-words WSD: Knowledge Acquisition Bottleneck and Effect of Domain

The last edition of the Senseval evaluation track (Senseval-3, 2004) for WSD showed that all-words systems were still far from the performance that was achieved by lexical-sample systems. The main problems that affect the scalability of WSD algorithms to all words are the lack of training data, and the domain dependency of the systems (methods trained on a corpus usually perform worse when tested on a different one).

In this talk I will describe an approach to obtain training examples automatically, based on (Leacock, 1998), and its application to the all-words disambiguation task. I will also discuss the importance of the domain of the target corpus, and briefly introduce different lines that are being explored in order to address this problem by relying on unlabeled data.

Diana McCarthy (Sussex) 23 June 2005
Relating WordNet Senses for Word Sense Disambiguation

The granularity of word senses in current general purpose sense inventories is often too fine-grained, with narrow sense distinctions that are irrelevant for many NLP applications. This has particularly been a problem with WordNet which is widely used for word sense disambiguation (WSD). There have been several attempts to group WordNet senses given a number of different information sources. We propose a new method to relate word senses using distributionally similar words and apply this to nouns in the Senseval-2 English lexical sample. Our automatic groupings and ranked lists make the WSD task easier and thus improve the accuracy of using first sense heuristics. Furthermore, we demonstrate that accuracy of these WSD heuristics is higher with our automatic methods of relating senses compared to groupings produced manually for Senseval-2. We advocate the use of ranked lists to permit a softer notion of relationships between senses compared to hard groupings, and so that granularity can be varied according to the needs of the application. We have experimented with nouns, although we hope to extend our method to other parts-of-speech in the future.

Various (Sussex) 20 June 2005
ACL Conference/Workshop Practice Talks

The Distributional Similarity of Sub-Parses
Julie Weeds, David Weir and Bill Keller

In this work we explore computing distributional similarity between sub-parses, i.e., fragments of a parse tree. In the same way that lexical distributional similarity is used to estimate lexical semantic similarity, we propose using distributional similarity between sub-parses to estimate the semantic similarity of phrases. Such a technique will allow us to identify paraphrases where the component words are not semantically similar. We demonstrate the potential of the method by applying it to a small number of examples and showing that the paraphrases are more similar than the non-paraphrases.

Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification
Jonathon Read

Sentiment Classification seeks to identify a piece of text according to its author's general feeling toward their subject, be it positive or negative. Traditional machine learning techniques have been applied to this problem with reasonable success, but they have been shown to work well only when there is a good match between the training and test data with respect to topic. This presentation demonstrates that match with respect to domain and time is also important, and presents preliminary experiments with training data labeled with emoticons, which has the potential of being independent of domain, topic and time.

Empirically-based Control of Natural Language Generation
Daniel S. Paiva and Roger Evans

We present a new approach to controlling the behaviour of a natural language generation system by correlating internal decisions taken during free generation of a wide range of texts with the surface stylistic characteristics of the resulting outputs, and using the correlation to control the generator. This contrasts with the generate-and-test architecture adopted by most previous empirically-based generation approaches, offering a more efficient, generic and holistic method of generator control. We illustrate the approach by describing a system in which stylistic variation (in the sense of Biber (1988)) can be effectively controlled during the generation of short medical information texts.

Jonathon Read (Sussex) 26 May 2005
The Co-occurrence Retrieval Framework applied to Text Classification

Julie Weeds and David Weir have described a co-occurrence retrieval framework for measuring the distributional similarity between pairs of words. This framework draws on the notions of precision and recall from the field of document retrieval.

The framework is centred around functions on pairs of feature vectors, which output a score measuring the degree of similarity between the two vectors. It therefore seems plausible that the framework could be applied to other problems that can be described using feature vectors. This presentation will demonstrate the framework's application to the problem of text classification, and discuss the merits and drawbacks of applying the framework to this task.
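The idea of scoring a pair of feature vectors with precision and recall can be sketched as follows. This is a minimal sketch of one simple additive reading, in which precision is the share of one vector's total feature weight that falls on features the other vector also has; the function names and the toy weights are assumptions made for illustration, not the exact formulation used by Weeds and Weir or in the talk.

```python
from typing import Dict

Vector = Dict[str, float]  # feature name -> non-negative weight

def precision(a: Vector, b: Vector) -> float:
    """Share of a's total weight carried by features that b also has."""
    total = sum(a.values())
    if total == 0:
        return 0.0
    shared = sum(weight for feature, weight in a.items() if feature in b)
    return shared / total

def recall(a: Vector, b: Vector) -> float:
    """Share of b's total weight carried by features that a also has."""
    return precision(b, a)

def f_score(a: Vector, b: Vector) -> float:
    """Harmonic mean of precision and recall, used as a similarity score."""
    p, r = precision(a, b), recall(a, b)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Toy document vectors (hypothetical term weights).
doc = {"parser": 1.0, "corpus": 1.0}
category = {"parser": 2.0}

print(precision(doc, category), recall(doc, category), f_score(doc, category))
```

For text classification, one simple use of such a score is to assign a document to the category whose feature vector it matches best.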

Ido Dagan (Bar Ilan University, Israel) 14 April 2005
Unsupervised Learning of Textual Entailment Relations

We suggest that recognizing semantic entailment between texts may become a generic NLP task, which generalizes somewhat scattered efforts to cope with semantic variability in various application areas. Towards this goal, we present the result of ongoing research on unsupervised learning of basic entailment relations from corpora and the Web. Two different approaches will be presented, for learning the lexical entailment relationship between words, such as 'company-firm' and 'company-bank', and between more complex lexical-syntactic templates, such as 'X prevent Y -- X lowers the risk of Y'. We will focus on understanding the different types of evidence that indicate the entailment relationship, including particular configurations of distributional features as well as fact-specific feature combinations, which we term anchor sets. Time permitting, we may discuss aspects of the definition of lexical entailment.

David Weir (Sussex) 17 March 2005
Staying in Control of your Pervasive Computing Environment

As the trend towards ubiquitous computing technology gathers pace, and the potential benefits of the technology begin to emerge, there is a growing need to make configuration of pervasive environments accessible to non-technical users. If we are all to maintain a sense of being in control of the technology around us, and are to find pervasive computing enhancing, we need to be able to tailor the behaviour of our environments in a straightforward way in order that it suits our particular personal needs.

In this talk I will report on an interdisciplinary research project that brings together researchers in the areas of distributed systems and natural language processing with the aim of exploring the extent to which an approach centred around the use of natural language processing technology can support non-technical users in the task of configuring their pervasive environments.

Alex Clark (Royal Holloway, London) 17 February 2005
Learning Some Deterministic Context Free Languages: Theory and Practice

I will discuss some algorithms for learning an interesting subclass of context free grammars -- the Non-terminally separated or NTS languages which have deep connections to the theory of reduction systems. These algorithms combine a number of heuristics that have been proposed over the years, including substitutability, mutual information and frequency. The combination of these provides a powerful and efficient algorithm that I used to win the Omphalos competition, a CFG learning competition held in 2004. These approaches can also be used to learn grammars that violate the NTS property slightly, but with much higher computational complexity.

These results are based on synthetic data, but I shall argue that NTS grammars, while clearly too weak to serve directly as models of natural language, nonetheless do capture an important property of natural language.

I shall also discuss an important theoretical point: the question of what the appropriate formal definition of learnability for the problem of First Language Acquisition is. I shall propose that a modified form of Valiant's PAC-learnability is the correct analysis. Under this criterion, all deterministic regular languages are learnable and I conjecture that NTS languages are also learnable. This (putative) learnability, together with a linear parsing algorithm suggests that NTS grammars may be a useful starting point for research, particularly since related string rewriting systems have a generative capacity that goes beyond the context free languages.

Stephan Oepen (Oslo and Stanford) 30 November 2004
Grammar-Based Generation for High-Quality MT

In the Norwegian research initiative LOGON, researchers from Oslo, Bergen, and Trondheim Universities are building an MT system to translate tourism information from Norwegian into English. The consortium capitalizes on high-quality translation and, thus, has adopted a hybrid approach, combining a symbolic, linguistic backbone, semantic transfer, and stochastic processes for ambiguity management.

English realization in LOGON builds on the HPSG English Resource Grammar (ERG) and chart generation from underspecified MRS meaning representations. Research challenges in this part of the MT system include generator efficiency, underspecification of generator input, and selecting among competing (`semantically equivalent') generator outputs. In this talk, I will review recent progress in all three areas, but focus on the realization ranking task (i.e. the process that assigns probabilities to multiple outputs). I will present experimental results for two alternate ranking schemes, one surface-oriented, the other a conditional model taking into account aspects of syntactic structure; as work in progress, I will discuss the use of the structural model for `on-line' ranking, i.e. using it inside the generator to avoid exhaustive enumeration of all candidates.

Henry Li (Sussex) 25 November 2004
Text Planning for Intelligent Gaming Agents

In state-of-the-art computer war games, the player is often playing along with some intelligent gaming agents which can use natural language to inform or remind the player about the situation of the player. Currently this kind of NLG is realised by predefined scenarios and canned text. It would be, of course, a nice idea if NLG with automatically acquired knowledge could be applied. In this talk I will introduce and summarise my internship project on this topic at Microsoft Research Laboratory. The talk will focus on the application of reinforcement learning to the most difficult part of this NLG process, viz. content planning. It will be shown that reinforcement learning is both effective and efficient in learning what to talk about for simple games, but it encounters many more difficulties when the game environment becomes more complicated.

James Henderson (Edinburgh) 18 November 2004
Neural Network Probability Estimation for Broad Coverage Parsing

Broad coverage parsing is a challenging problem for statistical and machine learning approaches because there can be arbitrarily many phrase structure trees to choose between. History-based statistical parsing models address this problem by representing the choice between phrase structure trees as a sequence of choices between simple features of the trees. But, because this sequence of decisions can be arbitrarily long, there can be an arbitrarily large number of previous decisions to take into consideration when making the next decision.

In this talk, I will present a method for using neural networks to automatically learn fixed-length representations of the arbitrarily long history of parser decisions. The neural network induces these representations while learning to estimate the probability of the next parser decision conditioned on the parse history. A statistical parser then uses these estimates to search for the most probable complete parse. The resulting parser achieves state-of-the-art performance on the benchmark dataset (90.1% F-measure on constituents in the Penn Treebank WSJ dataset, currently second best in the world).

The main reason for this success is the use of soft biases instead of hard independence assumptions in learning a representation of the parse history. Crucially, the neural network architecture we use allows these soft biases to be defined in terms of locality in the phrase structure tree, not just locality in the parse history. In addition, the best performing neural network is trained with a novel combination of a generative probability model and a discriminative optimization criterion. These results demonstrate for the first time how neural networks can be effectively exploited in tasks where structured representations are central.

Jonathon Read (Sussex) 11 November 2004
Recognising Affect in Text using Pointwise Mutual Information

Recognising affect in text is essentially a text classification endeavour. Techniques for text classification have been applied, with varying degrees of success, to classifying documents in terms of author, genre, sentiment and topic (among others).

However, when considering affect recognition, it should be clear that the boundaries between the classes are far less distinct than in other text classification problems; all users of language understand that not only can an emotion be represented in several ways, but also that one representation can be interpreted as several kinds of emotion states. This makes consideration of many personal perspectives essential for classifying text by emotion type.

This talk presents experiments conducted to evaluate a semi-supervised algorithm that classifies text by affect type. The algorithm attempts to calculate an affective orientation using distributional information from a very large corpus, in an attempt to derive a multi-person perspective of emotion-bearing words.
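To make the measure named in the title concrete, the sketch below computes pointwise mutual information from co-occurrence counts, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), and then compares a word's association with two seed words. The seed-word comparison is an assumption borrowed from the general family of PMI-based orientation methods, not a description of the algorithm evaluated in the talk, and all counts and names are invented for illustration.

```python
import math

def pmi(joint: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information, in bits, from raw counts.

    joint   -- number of contexts containing both x and y
    count_x -- number of contexts containing x
    count_y -- number of contexts containing y
    total   -- total number of contexts
    """
    if min(joint, count_x, count_y, total) <= 0:
        # A real system would smooth zero counts; this sketch rejects them.
        raise ValueError("all counts must be positive")
    return math.log2((joint * total) / (count_x * count_y))

def orientation(word_count: int, with_a: int, with_b: int,
                count_a: int, count_b: int, total: int) -> float:
    """Positive if the word is more strongly associated with seed a than seed b."""
    return (pmi(with_a, word_count, count_a, total)
            - pmi(with_b, word_count, count_b, total))

# Hypothetical counts: a word seen 100 times in 10,000 contexts,
# 10 times with seed a and once with seed b (each seed seen 100 times).
print(orientation(100, 10, 1, 100, 100, 10_000))
```

A score above zero places the word closer to the first seed; with more than two affect types, the same comparison can be repeated against one seed per type.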

Nick Jakobi (Corpora plc) 4 November 2004
Corpora Software -- 'What do you need to know?'

Corpora Software specializes in semantically-driven solutions to the massive information overload problems faced by businesses today. These include automated news analysis, opinion mining and information extraction. This talk will overview current and future research directions within Corpora and the challenges we face in taking cutting edge technologies to market.

James Curran (Sydney) 12 August 2004
From Distributional to Semantic Similarity

Systems that extract synonyms often identify similar words using the `distributional hypothesis' that similar words appear in similar contexts. This approach involves using corpora to examine the contexts each word appears in and then calculating the similarity between context distributions. Different definitions of context can be used, and I will present results examining how different types of extracted context influence similarity.

Distributional similarity is at best an approximation to semantic similarity. I will present improved approximations motivated by the intuition that some events in the context distribution are more indicative of meaning than others. For instance, the object-of-verb context `wear' is far more indicative of a clothing noun than the context 'get'.

However, existing distributional techniques do not effectively utilise this information. The new context-weighted similarity metric I will describe significantly outperforms every distributional similarity metric described in the literature.

I will also talk about approximating the nearest-neighbour similarity algorithm to increase efficiency and some very large experiments using 2 billion words of raw text. Finally, I will present an application of semantic similarity: unsupervised discovery of supersenses (the WordNet lexfile semantic classification) and show how my approach outperforms the existing supervised approach.
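The basic pipeline described above (collect the contexts each word appears in, then compare the context distributions) can be sketched in a few lines. The sketch uses a plain word window and cosine similarity purely for illustration; these are stand-ins, not the extracted contexts or the context-weighted metric presented in the talk, and the toy sentences are invented.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

def context_vectors(sentences: List[List[str]], window: int = 1) -> Dict[str, Counter]:
    """Count, for each word, the words occurring within `window` tokens of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[feature] for feature, count in u.items() if feature in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus: 'hat' and 'coat' share the context 'wear'.
sentences = [["wear", "hat"], ["wear", "coat"], ["get", "hat"]]
vectors = context_vectors(sentences)
print(cosine(vectors["hat"], vectors["coat"]))
```

In this unweighted version the contexts 'wear' and 'get' count equally, which is exactly the limitation that motivates weighting contexts by how indicative they are.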

James McCracken (Oxford University Press) 10 June 2004
Rebuilding the Oxford Dictionary of English as a Semantic Network

I will talk about the work I have been involved in to develop WordNet-like functionality within the context of an existing (human-user) dictionary, the Oxford Dictionary of English.

Bill Teahan (Bangor) 3 June 2004
Text Mining using Compression-based Language Models

Bill Teahan has a long standing interest in the area of text compression and information theory. His research over the last few years has concentrated on applying compression-based language models to various problems in natural language processing such as information retrieval and information extraction.

In this talk he will give an introduction to text compression-based language models, and go on to briefly describe some of their applications.
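One way to see how a compressor can act as a language model is to ask how many extra bytes it needs to encode a new text once it has already seen text from a given category. The sketch below uses Python's general-purpose zlib compressor as a stand-in; it is not the compression scheme used in the speaker's work, and the category names and training strings are invented for illustration.

```python
import zlib
from typing import Dict

def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def extra_bytes(training: str, sample: str) -> int:
    """Additional compressed bytes needed for the sample after the training text."""
    return compressed_size(training + sample) - compressed_size(training)

def classify(sample: str, corpora: Dict[str, str]) -> str:
    """Pick the category whose training text makes the sample cheapest to encode."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], sample))

# Hypothetical two-category training data.
corpora = {
    "parsing": "the parser builds a parse tree for each sentence in the treebank " * 5,
    "retrieval": "the search engine ranks documents for each query in the index " * 5,
}
print(classify("a parse tree for the sentence", corpora))
```

With training texts this small the decision is noisy; the approach relies on having enough training text per category for the compressor to pick up its regularities.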

James Dowdall (Sussex) 20 May 2004
Everything You Ever Wanted to Know About Terminology but Were Afraid to Ask!

Technical domains represent specialist knowledge gained through training or experience. Such specialisation uses the foundation of general knowledge to build a level of expertise within a given domain, necessitating an expansion in vocabulary to include domain specific objects and concepts which the inexpert perceive as 'jargon'. BioMedical research and computer science are good examples of technical fields that require knowledge specialisation. A tangible result of this is the terminology used to bridge the gap between 'general knowledge' and 'expert knowledge', the 'everyday' and the 'specialised'.

This makes terminology a high value information source for many Natural Language Processing (NLP) applications that operate over technical domains. Unfortunately, without dedicated analysis terminology becomes, at best, a computational burden and, at worst, an effective barrier against accessing the knowledge of the domain.

This talk explores these two problems within the context of a Question Answering system.

Paola Merlo (Geneva) 6 May 2004
Issues in Multilingual Automatic Verb Classification

A major problem in the processing of language, by human or machine, lies in the proper treatment of new or unanticipated words. How can a system know all the necessary information to deal with them? Of particular concern are verbs, crucial elements for interpretation. Our approach to this problem is to automatically group verbs in classes, which will provide the necessary predictive information for new verbs.

We have developed a novel methodology, based on a corpus-based statistical 'summary' of the thematic properties of a verb, which forms the input to an automatic classification system. The methodology has so far been successfully applied to some difficult classification problems for English verbs using supervised learning (Merlo-Stevenson 2001). Recently, we have extended this basic methodology to a multi-lingual setting to expand both the applicability and the performance of the approach. Our multi-lingual research incorporates two inter-related threads. In one, we exploit the similarities in the cross-linguistic classification of verbs, to extend work on English verb classification to new languages (Italian), and to new classes within that language (Merlo et al. 2002). This extension shows that the method is portable across languages and across different kinds of verb types and meanings. Our second strand of research exploits the differences across languages in the surface expression of meaning, to show that complementary information about English verbs can be extracted from their translations in a second language (Chinese or Italian), improving the classification performance of the English verbs (Tsang et al. 2002). Altogether, this work demonstrates the benefits of a multi-lingual approach to automatic lexical semantic verb classification based on statistical analysis of corpora in multiple languages.

Toshihiro Wakita (Toyota Central R&D Labs. Inc., Japan) 19 March 2004
Research in Speech and Language Processing at Toyota

The presentation will cover the topics:

  1. current in-vehicle information equipment products;
  2. our research activities (speech recognition, dialogue systems, adaptive interfaces); and
  3. our expectations of the NLP research field.

Dr. Wakita is the leader of the research team at Toyota working on speech and language processing.

Kentaro Inui (Nara Institute of Science and Technology) 26 February 2004
Paraphrases: Generation for Text Simplification and Recognition for Question Answering

Paraphrases are alternative ways to convey the same information. Technology for generating and recognizing paraphrases can benefit a broad range of NLP tasks. This talk presents a couple of our ongoing research projects related to paraphrasing: paraphrase generation for reading assistance, and paraphrase recognition for question answering.

In the first half of the talk, I will present our ongoing research project on text simplification for congenitally deaf people. The text simplification we are aiming at is the task of offering a deaf reader a syntactic and lexical paraphrase of a given text to help her/him understand what it means. We discuss the issues we should address to realize text simplification and report on the present results in three different aspects of this task: readability assessment, organization of paraphrasing resources, and post-transfer error detection.

In the second half, I will present the results of our analysis of how paraphrase and coreference affect the performance of question answering. For each sample question and its corresponding sentence containing the correct answer, we made a chain of paraphrases so that they are exactly the same. Based on a taxonomy of the collected paraphrases, we propose a new class of NLP subtask, which can be seen as a way of generalizing the TREC-style information extraction task.

Takaki Makino (Institute of Industrial Science, University of Tokyo) 19 February 2004
In Addition To Syntax: Theoretical Exploration of Neural Mechanism for Language

Human linguistic abilities cannot be explained only by syntax. Although syntax corresponds to a conversion process from brain-internal representation to linguistic expressions (and vice versa), we also need to study the brain mechanism that supplies/processes the internal representation. In other words, syntax stands for 'how we talk'; we need to reveal 'what we talk' (contents = meaning) and 'why we talk' (motivation to communicate) too.

In this talk, I discuss neural codings of meaning in the brain. To represent the meaning of a sentence, the coding has to satisfy several properties. When we choose temporal coding to satisfy compositionality, we run into another problem: providing a neural implementation that converts a sentence (a temporal sequence of words) to meaning (temporal coding). After a theoretical discussion of the mechanism required for the conversion, I introduce one possible implementation of the conversion process.

I also discuss the motivation of communication, based on the self-observation principle (Humphrey, 1978; Wolpert et al., 2003; Makino et al., 2003). To communicate, one needs to be able to estimate another's internal state, and vice versa. After some theoretical discussion on the difficulties in this mutual estimation, I describe why the self-observation principle solves these difficulties.

Olivier Jouve, Tom Khabaza and Rick Adderley (SPSS) 12 February 2004
Text Mining in Predictive Analytics: the Technology and its Applications

This extended seminar will introduce the field of predictive analytics and describe SPSS's text mining technology in this context.

The seminar will be given in four sections:

  1. Introduction to data mining, the Clementine workbench and how text mining fits into this framework;
  2. Overview of Text Mining Applications;
  3. Detailed description of SPSS's text mining technology;
  4. Overview of Text Mining for Clementine plus Demonstration.

Xinglong Wang (Sussex) 29 January 2004
Automatic Acquisition of English Topic Signatures Based on a Second Language

The knowledge acquisition bottleneck is a major problem for supervised machine learning algorithms. Manually produced resources are costly and never seem to satisfy the huge requirements of machine learning systems. In this talk, I will present an approach that attempts to acquire English topic signatures automatically, using very large text collections in a second language (Chinese is used at this moment), either retrieved from the Web, or from available text corpora, and bilingual dictionaries. Topic signatures could be useful for a wide range of applications, such as WSD and text classification. We did an experiment on WSD, where we trained the so-called 'context-group discrimination' algorithm on our topic signatures, and tested the system on hand-tagged corpora, with promising results.

Eva Esteve Ferrer (Sussex) 15 January 2004
Automatic Classification of Spanish Verbs based on Subcategorization Information

I will present the work I have done for my Master's thesis at the University of Geneva, which consists of a series of experiments aimed at automatically classifying Spanish verbs into semantic classes based on subcategorization acquisition. This approach is based on the hypothesis that the semantic properties of verbs are closely related to their syntactic behaviour.

The experiments involve two different well-known problems in Natural Language, which I approach from an empirical perspective. The first is the acquisition of verb subcategorization frames from corpora. The second is the classification of verbs into semantic classes. For the former, I apply well-known techniques that have been developed for the English language to Spanish. This task demands a deep understanding of the methodology used, and also adapting it to the new problems encountered due to the linguistic properties of Spanish. For the latter, I use an unsupervised clustering technique and I base my target classification on an existing theoretical work on Spanish verb classification. In this way, my experiments bring to an empirical context a linguistic study on the properties of Spanish verbs.

I will also present and discuss, as work in progress, possible improvements to the mentioned experiments.

John Campbell (UCL) 27 November 2003
Acquisition of Ontologies in Multi-Agent Computing

Ontology is a popular word in artificial intelligence, and in the study of autonomous computational agents and the design of multi-agent systems (MAS) in particular. In practice an element of a MAS ontology is a term plus a set of predicates whose satisfaction is a necessary and sufficient (more or less) condition for the term to be relevant, active etc. in the situation that the MAS is processing.

Defining an ontology is commonly considered to be an important part of the software-engineering process of designing a MAS. But the increased demand for interoperability of different MASs designed without reference to each other means that autonomous agents will need to discover ontological information when they interact with or migrate to unfamiliar systems. The seminar will cover various ways of making this process of discovery effective. The emphasis will be on ideas from areas as diverse as social anthropology, administration of monasteries and promotion of tourism. Examples will be given at least from agents learning about cricket and planning of frequency allocation in shortwave broadcasting.

The seminar is intended at least partly as a forum for the exchange of ideas: it may be that ontological issues in natural-language processing have something new to teach MAS builders and designers.

Julie Weeds (Sussex) 20 November 2003
A General Framework for Distributional Similarity

We present a general framework for distributional similarity based on the concept of substitutability. We cast the problem as one of co-occurrence retrieval for which we can measure precision and recall by analogy to the way they are measured for document retrieval. We define six co-occurrence retrieval models which can be used to investigate a number of issues in distributional similarity including asymmetry and the relative importance of different types of features. We note that different parameter settings within our framework approximate different existing similarity measures as well as many more which have, until now, been unexplored. We show that high recall measures tend to be better at a semantic task (based on WordNet) and high precision measures tend to be better at a distributional task (pseudo-disambiguation). We then discuss the implications of this on the use of semantic similarity measures in language modelling and on the use of distributional similarity measures in automatic thesaurus generation.
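As an illustration of the precision/recall analogy, here is a minimal sketch that assumes the simplest set-based reading: each word is represented by the set of co-occurrence features it has been seen with, and one word's features are treated as 'retrieved' and the other's as 'relevant'. The function name `cr_precision_recall` and the toy feature sets are hypothetical, and the six models described in the talk are not reproduced here.

```python
def cr_precision_recall(retrieved, relevant):
    """Set-based precision and recall of one word's features against another's.

    Assumption: features are unweighted; weighted variants would replace
    the set sizes with sums of feature weights.
    """
    shared = retrieved & relevant
    precision = len(shared) / len(retrieved) if retrieved else 0.0
    recall = len(shared) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical co-occurrence features for two nouns.
dog = {"obj-of:feed", "obj-of:walk", "subj-of:bark", "mod:big"}
animal = {"obj-of:feed", "mod:big", "subj-of:eat",
          "subj-of:live", "mod:wild", "obj-of:see"}

p, r = cr_precision_recall(dog, animal)
print(p, r)
```

Swapping the two arguments swaps precision and recall, which is one simple way to see the asymmetry mentioned in the abstract.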

Rob Koeling (Sussex) 13 November 2003
Named Entity Recognition

In this talk I will present work on a Named Entity Recognizer (NER) built for the Meaning project. Named entity recognition is a well researched topic. The last two CoNLL shared tasks were dedicated to it and performance of most recognizers is quite good. The one that I discuss here is a fairly standard, well-performing NER, but in order to be a champion, more work needs to be done. The core of this NER is a Maximum Entropy model, but many other machine learning paradigms could be adopted instead.

I will give an overview of other approaches, focussing on what sources of information have the biggest impact on the performance. After presenting the current model and discussing which features matter for this model, I evaluate the results (using the CoNLL data sets). The weaknesses of the current model are identified and several ways to improve it are discussed. Finally, some thoughts on integration with the RASP parser are discussed.

Francis Real (UPC Barcelona) 6 November 2003
Acquisition of Semantic Patterns Using the Relax Algorithm for Disambiguation

In information extraction, if we find sentences like 'The cat eats fish' and get a parse containing:
EAT:v - subject - CAT:n
EAT:v - object - FISH:n
we can say that it is possible to use the verb EAT with the object FISH.

But sometimes this is not enough. In WordNet 1.6 the verb EAT has 6 senses, the word CAT has 7 senses and the word FISH has 4 senses. It would be better if we could say:
eat#v#3 - subject - cat#n#1
eat#v#3 - object - fish#n#1

This kind of information would be more useful in applications, but it is difficult to obtain.

There has been a lot of work on this topic; we present a new approach using relaxation techniques. With the RELAX algorithm we can combine different information sources like WordNet relations, WordNet Domains, SemCor examples, SUMO ontology, dictionary definitions, and so on to choose which is the right sense of the words in a syntactic position.

In this talk I will explain how the RELAX algorithm works, how we can adapt the algorithm for this task, various problems, and present some initial results (for monosemous verbs with a small set of heuristics based on WordNet hyponym distance, Semantic Files and WordNet Domains) and directions for future work.

Naoki Yoshinaga (Tokyo) 23 October 2003
Parsing Comparison Between LTAG and HPSG Using Strongly Equivalent Grammars

This talk presents an approach to empirical comparison between parsers for LTAG and HPSG. The key idea of our approach is to use strongly equivalent grammars, which generate equivalent parse results for the same input, obtained by grammar conversion. We make use of a grammar conversion from LTAG to HPSG-style in order to obtain strongly equivalent grammars. Experimental results using two pairs of LTAG and HPSG parsers with dynamic programming and CFG filtering showed that the performance difference is caused by the different ways of adapting the parsing techniques, and also pointed to a definite way of improving these techniques.

Stephen Clark (Edinburgh) 16 October 2003
Parsing the WSJ using CCG and Log-linear Models

In this talk I will describe log-linear parsing models for Combinatory Categorial Grammar (CCG) and present results on the CCG version of the WSJ Penn Treebank.

Log-linear models have previously been applied to statistical parsing, under the assumption that all possible parses for a sentence can be enumerated. Enumerating all parses is infeasible for large, automatically extracted grammars; however, dynamic programming over a packed chart can be used to efficiently estimate the model parameters. I will describe a parallelised implementation which runs on a Beowulf cluster and allows the complete WSJ Penn Treebank to be used for estimation.

A unique feature of CCG is the existence of non-standard, 'spurious' derivations all leading to the same derived structure. I will describe parsing algorithms and metrics which can account for all derivations and efficiently return the best scoring derived structure. These will be compared with the standard Viterbi algorithm applied to 'normal-form' derivations.

[Joint work with James Curran]

Mary McGee-Wood (Manchester) 25 September 2003
'Mixed Initiative' in Cancer Care Dialogues

The concept of 'initiative', or 'control over the flow of conversation', is well established in the analysis of human-human dialogue and the design of human-machine dialogue. It can be superficially mapped to the distinction between 'open' and 'closed' questions developed by psychiatrists in the analysis of human-human cancer care dialogues. However, the linguistic patterns and mechanisms of initiative found in these dialogues are more complex than in those previously studied in the NLP community. The mappings from utterance type to discourse function are subtle and context-dependent.

In this talk I will discuss -- somewhat anecdotally -- the negotiation of initiative in our corpus of cancer care dialogues, focussing on questions and the sequences around them. There are cautionary implications for the design of human-machine dialogue systems.

David Milward (Linguamatics Ltd., Cambridge) 5 June 2003
Ontology-based Dialogue Systems

Commercial spoken dialogue systems are mostly based on menu or form filling paradigms. However, one of the advantages of speech should be to allow users to take control of an interaction, e.g. to skip to a leaf node of a menu structure, or to ask a clarification question.

This talk will discuss replacing hand crafted dialogue descriptions with task and domain descriptions. In particular it will discuss the role of a domain ontology, both in allowing more flexible dialogue interactions and in enabling interpretation of what people say, possibly in conflict with the grammatical constructions which they have used.

Two case studies will be presented, one for spoken dialogue based cancer referrals (joint work with Martin Beveridge of Cancer Research UK), the other for multi-modal control of networked home appliances.

Mark McLauchlan (Sussex) 29 May 2003
Building a Parsing Model Using Semi-supervised Data

Statistical parsers generally make extensive use of lexical information for disambiguation, and words are crucial for resolving certain kinds of syntactic ambiguity such as prepositional phrases. The RASP parser developed by Briscoe and Carroll is fairly unique in this regard, since it relies mostly on structural parse features. This gives it excellent coverage, but its accuracy on certain structures could be improved. In this talk I will describe my solution: a lexicalised parsing model that reranks the output of this non-lexicalised parser. The underlying model uses fairly standard maximum entropy techniques, but the training data is based on the output from the RASP parser itself. This semi-supervised approach gives us access to much more data from arbitrary domains, but with some loss in accuracy. I will describe my approach which overcomes this problem by training the model on less ambiguous subcategorisation frames extracted automatically from the underlying data. The resulting parsing model has some nice similarities to a traditional lexical knowledge base. I will unveil a (slight) improvement in accuracy and discuss some possible applications.

Lynne Murphy (Sussex) 22 May 2003
Synonymy, Lexical Representation and Psycholinguistic Reality

This talk will review facts about lexical synonymy from a pragmatic and psycholinguistically-informed theoretical perspective. According to this view, synonymy is not represented directly in the mental lexicon, but is derived at a metalinguistic level via a principle of Lexical Contrast. This is at odds with many computational approaches to synonymy today, and since computational linguists currently spend more time and effort on problems concerning synonyms than theoretical linguists or psycholinguists, it raises the question: what can these disciplines teach each other (if anything)? Some answers to this question will be reached by comparing the respective goals and assumptions of these approaches.

Alex Clark (Geneva) 15 May 2003
Pre-processing Very Noisy Text

Existing techniques for tokenisation and sentence boundary identification are extremely accurate when the data is perfectly clean (Mikheev, 2002), and have been applied successfully to corpora of news feeds and other post-edited corpora. Informal written texts are readily available, and with the growth of other informal text modalities (IRC, ICQ, SMS etc.) are becoming an interesting alternative, perhaps better suited as a source for lexical resources and language models for studies of dialogue and spontaneous speech. However, the high degree of spelling errors and irregularities and idiosyncrasies in punctuation, the use of white space and the use of capitalisation require the use of specialised tools. In this talk, we study the design and implementation of a tool for preprocessing and normalisation of noisy corpora. We argue that rather than having separate tools for tokenisation, sentence segmentation and spelling correction organised in a pipeline, a unified tool is appropriate because of certain specific sorts of errors. We describe how a noisy channel model can be used at the character level to perform this. We describe how the sequence of tokens needs to be divided into various types depending on their characteristics, and also how the modelling of white-space needs to be conditioned on the type of the preceding and following tokens. We use trainable stochastic transducers to model typographical errors, and a variety of sequence models for white space and the different types of tokens. We discuss the training of the models and various efficiency issues related to the decoding algorithm, and illustrate this with examples from a 100 million word corpus of Usenet news.

Ewan Klein (Edinburgh) 20 March 2003
Conversations with a Mobile Robot

I will describe joint work (with Johan Bos and Tetsushi Oka) on integrating a spoken dialogue system with a mobile robot. The current implementation allows the user to direct the robot to specific locations, ask for information about the robot's status, and supply information about its environment. The dialogue system is implemented as a collection of agents within SRI's Open Agent Architecture, within which the dialogue manager agent acts as the overall coordinator. The language model used by the Nuance speech recognizer is compiled from a general unification grammar and recognized strings are paired with a DRS. In order to support inference, the DRS representing the dialogue is combined with a DRS containing information about the physical context (i.e., the position of the robot and currently accessible locations) together with background axioms, and translated into first-order logic; after this, inference tasks are sent to both a theorem prover and a model builder. The robot uses an internal map for navigation. The current implementation has a hand-built geometric map which is then converted into a topological map. The latter, supplemented by semantic labels, is used for communicating the robot's orientation and accessible locations to the dialogue system.

Rob Gaizauskas (Sheffield) 13 March 2003
Question Answering: Where Do We Go from Here?

Open-domain question answering (QA) has recently become a hot topic in applied natural language research. Driven by the appearance of Web search engines such as 'Ask Jeeves', apparently offering capabilities beyond those of conventional search engines, and by the appearance of a QA track in the Text REtrieval Conference (TREC) evaluations, QA has moved to center stage in a remarkably short time. Aside from the undoubted practical benefits working QA would bring, QA is also of interest to researchers as it forms the current front line in the contest between those convinced that shallow word-based approaches to text processing/NLP are sufficient for applied systems (the bag-of-words brigade) and those who remain convinced that deeper approaches based on descriptive linguistic knowledge will triumph in the end (the linguistic cavaliers).

In this talk I will discuss the methodology and overall results of the TREC QA track evaluations, to date, and attempt a general characterisation of approaches entrants have used. I will describe in more detail the approaches we have used in Sheffield in our TREC systems and end by presenting our ideas and inviting discussion about where we go from here.

Katya Markert (Edinburgh) 20 February 2003
MASCARA: A Machine Learning Approach to Metonymy Understanding

Metonymy is a type of figurative language where one entity is used to refer to another related entity. A typical example is:

(1) Does he drive a BMW?

In (1), the name of a company 'BMW' is used to refer to a car produced by that company.

Most traditional approaches to metonymy understanding rely on large amounts of world knowledge and are not easily scalable and/or portable. In the MASCARA project we explored supervised statistical classification methods as an alternative approach. We will present the following results:

  1. The regularity and productivity of metonymic readings make it possible to reduce metonymy understanding to a classification task comparable to classic word sense disambiguation for the large majority of metonymic readings.
  2. Grammatical argument-head relations are an appropriate feature for such a classification task and, in particular, more predictive for metonymy recognition than, e.g., collocations.
  3. Data sparseness in argument-head relations can be at least partially overcome by the integration of thesaurus information without increasing the size of the training data.
  4. We also explore the effect that parser accuracy has on the recognition algorithm and compare the difficulty of metonymy recognition to classic word sense disambiguation.

Denis Johnston (formerly of BT) 30 January 2003
Speech and NLP: Back to the Future

For as long as I can remember, computers that could talk, listen and respond in natural language have been just around the corner. Even as recently as 1990, when Bill Gates famously observed "The keyboard will soon be seen as a curiosity of the 20th Century", confidence was high. Until recently it was taken as inevitable that with just a little bit more research, just a little bit more computational power and with just a slightly greater number of people dedicated to the task, the remaining problems would be solved. Before we knew it, computers and humans would be communicating in the most natural way possible.

But despite an abundance of computing power, marginal implementation costs, and many of the technological barriers being overcome, it just hasn't turned out that way. Large-vocabulary, speaker-independent speech recognition over telephone networks is now feasible - but there are hardly any applications. High quality text to speech has also become available - but it is generally avoided. Advanced natural language query systems are ignored in favour of relatively simplistic search engines, and the expected demand for language translation systems has collapsed as simple multi-lingual forms-based systems have proved to be a simple, cheap and accurate solution for almost every commercial application.

It may, of course, be argued that this is a jaundiced view and that Speech and Language has actually been an enormous success. Low bit rate speech coding (used in mobile phones) and text processing techniques (in desktops) provide spectacular instances where Speech and Language technologies have not only changed the world but have provided the very foundations for the basic industries of the 21st Century. But most of us don't really believe that these are genuine Speech and Language Technologies as they do not attempt to exploit any of the meaning in the processed signals.

Which almost makes it worse. For the successes of speech coding and simple text processing have been so great, that in comparison, the impact of the real "Speech and Language Technologies" in the Information Revolution has been conspicuously negligible.

Paradoxically the success of speech coding and text processing has meant that industrial research activity in Speech and Language is now significantly less than it was. With maturity has come standardisation and commoditisation of the basic components and the investment has moved on towards system integration and applications.

There has been a corresponding "standardisation and commoditisation" of some speech and language technologies but the markets have so far proven to be very small. Many of the big players have either failed commercially or been bought out only to have their staff put onto other technologies. Those that still exist are in niche markets or are servicing legacy systems.

So it is in this environment that I want to explore what the future might hold for Speech and Language. Is the current market weakness simply a glitch or has Speech and Language Technology had its day? Have the world wide web and the ubiquitous keyboard/screen/Windows combination rendered speech and language technologies obsolete or have they actually opened up a whole new set of possibilities? Or could it be that the next revolution in technology, the speech and language revolution, is still just around the corner?

Nicola Yuill (Sussex) 23 January 2003
Riddles and Reading: Using Language Ambiguity in Riddles to Investigate Children's Problems in Text Comprehension

Some children of around 7-9 years of age are good at decoding text but poor at understanding it. Studies of these poor comprehenders suggest that three factors contribute to their problems: poor working memory, inability to make appropriate inferences, and poor language awareness (the ability to reflect on language structure as distinct from content). The last of these factors is somewhat ill-defined but one way it has been operationalised is to look at the understanding of language ambiguity in joking riddles. I describe two studies looking at riddle appreciation in such children as a way of investigating the ability to interpret different aspects of language ambiguity, using a categorisation of riddle types. I then discuss some studies using riddles to promote comprehension and present data on how children's discussions about language during training relate to later improvements in their reading comprehension.

Gerald Gazdar (Sussex) 12 December 2002
XSDL for NLP (walking in the footprints of Dr.T.)

To what extent can the XML Schema Definition Language (XSDL) be used to write grammars directly for natural languages and are there any reasons to think this might be a sensible thing to do? Topics addressed in response to this question will include: phrase structure rules and tree fragments, attribute-value matrices, the XSDL type system and hierarchy, macros, substitution groups, path constraints, the limitations of XML parsers, and potential uses for XSDL NL grammars. Some toy fragments of English syntax may make a fleeting appearance on the overhead projector. The talk will presuppose some familiarity with the unification grammar tradition in NLP but will not presuppose any prior exposure to XSDL.

Adam Kilgarriff (Brighton) 5 December 2002
Web as Corpus: Introduction and Survey

Language scientists and technologists are increasingly turning to the web as a source of language data, because other resources are not large enough, because they do not contain the types of language the researcher is interested in, or simply because it is free and instantly available. In this talk, I shall survey work done in this mode, by linguists, language technologists and translators, and shall sketch proposals for a 'Linguistic Search Engine'.

Maria Lapata (Edinburgh) 5 December 2002
Using the Web to Overcome Data Sparseness

In this talk we present a series of experiments that show that the web can provide reliable word co-occurrence frequencies, even in the face of severe data sparseness. We describe a method for retrieving word co-occurrence frequencies from the web by querying a search engine. We evaluate this method by demonstrating that web frequencies (a) correlate with frequencies obtained from a balanced corpus such as the BNC, (b) reliably predict human plausibility judgments, and (c) yield state of the art performance when employed for NLP tasks that typically rely on smoothed probability estimates such as pseudo-disambiguation, the disambiguation of candidate translations, the bracketing of compound nouns, prepositional phrase attachment, and the ordering of prenominal adjectives.

Phil Edmonds (Sharp Labs, Oxford) 21 November 2002
Synonymy, Similarity, and Relatedness

Statistical corpus analysis has been very useful in the study of the lexicon for language technology and lexicography, but how much has it informed research into lexical semantics and the human's ability to use language. Indeed, how much can corpora tell us about human language ability? Or, how much of that ability is encoded in a corpus?

This talk will discuss one aspect of lexical semantics: synonymy and similarity of words. I will show how statistical corpus analysis can be used to acquire synonyms and related words. Then, I will discuss what corpora can and cannot yet tell us about human ability by using 1) an experiment in fine-grained lexical choice that compares system output to corpora, and 2) an experiment that compares automatically generated lists of related words to lists elicited from people.

Julie Weeds (Sussex) 14 November 2002
A Comparison of Lin's Similarity Measure and Lee's Alpha-Skew Divergence Measure

Dekang Lin's and Lillian Lee's measures of distributional similarity are both commonly used by NLP researchers wishing to determine a word's nearest neighbours and have both been cited as the best distributional similarity measure currently in existence. In this talk I will first examine the theoretical differences between the two measures and also discuss computation issues. I will then present the results of an empirical evaluation of the two measures using a pseudo-disambiguation task. I will discuss the effects of word frequency on these results and look at biases present in each similarity measure from both a theoretical and empirical viewpoint.
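
As a rough illustration of one of the two measures, the sketch below computes the alpha-skew divergence in the form in which it is usually stated, s_alpha(q, r) = D(r || alpha*q + (1 - alpha)*r), where D is the Kullback-Leibler divergence. The function names, the dictionary representation of co-occurrence distributions and the default alpha of 0.99 are assumptions made for this example and are not taken from the talk.

```python
import math


def kl_divergence(p, q):
    """D(p || q); q must be non-zero wherever p is non-zero."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def alpha_skew_divergence(q, r, alpha=0.99):
    """s_alpha(q, r) = D(r || alpha*q + (1 - alpha)*r).

    q and r map co-occurring items to probabilities (illustrative format).
    Mixing a little of r into q keeps the divergence finite for alpha < 1.
    """
    vocab = set(q) | set(r)
    mix = {x: alpha * q.get(x, 0.0) + (1 - alpha) * r.get(x, 0.0) for x in vocab}
    return kl_divergence(r, mix)
```

The measure is asymmetric, so `alpha_skew_divergence(q, r)` and `alpha_skew_divergence(r, q)` generally differ; lower values indicate more similar distributions.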

Aline Villavicencio (Cambridge) 6 November 2002
Verb-particle Constructions in a Computational Grammar of English

In this talk I'll discuss verb-particle constructions, talking about their characteristics and the challenges that they present for a computational grammar. The discussion will concentrate on the treatment adopted in the LinGO English Resource Grammar. Given the constantly growing number of verb-particle combinations, possible ways of extending this treatment are investigated. As possible sources of information to extend this treatment, I'll discuss (conventional and electronic) dictionaries and Levin's classes of verbs, and the limitations found in each case. Taking into account the regular patterns found in some productive combinations of verbs and particles, one possible way to try to capture these is by means of lexical rules, and I'll talk about the advantages and disadvantages encountered when adopting such an approach. More specifically, I'll consider the need to restrict the productivity of lexical rules in order to deal with subregularities and exceptions to the patterns found.

Yuval Krymolowski (Bar-Ilan University, Israel) 13 June 2002
Using Training Samples for Error Analysis

One of the most important tasks in developing a statistical NLP system is to analyse its performance. This includes discovering typical errors, as well as strengths and weaknesses of the system. Systems are typically evaluated using a standard choice of training and test data. Examining false positives and negatives after a single run can, however, be misleading due to sampling noise. A false positive or negative may reflect a real weakness of a system, or a random balance of positive and negative indicators in the particular training data. A common solution to this problem is manual classification of errors, which can be time consuming.

We propose a method for speeding up the analysis of errors. The method involves training a statistical NLP system on multiple training samples. This yields a collection of models. The idea is to regard the models in the collection as describing the target linguistic patterns from different viewpoints. A model resulting from training on a single dataset relies on the existence and balance of features in this dataset. Due to the Zipfian nature of natural language data some features reflect typical phenomena while many others reflect specific, more rare cases. Models trained on different samples are therefore likely to agree on most of the typical structures, and disagree on less typical ones.

In our approach, we represent each instance by a detection profile, i.e. a bit vector denoting which models detected it and which did not. The number of detections in a profile provides an indication of the 'easiness' of the instance. We can characterize instances by the way their easiness varies according to sample size, which we vary from 10% of the training set size, to a complete training set. This provides an insight into the process of learning. We also use the detection profiles as input for clustering. The clusters reflect similarity of instances with respect to the features employed by the NLP system.
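
The detection profile described above can be sketched as follows. This is a minimal illustration only: the function names and the representation of each model's output as a set of detected instance identifiers are assumptions, not details from the talk.

```python
def detection_profiles(model_outputs, instances):
    """Map each instance to a tuple of 0/1 flags, one flag per model.

    model_outputs: list of sets, one per trained model, each holding the
    instances that model detected (hypothetical input format).
    """
    return {
        inst: tuple(int(inst in detected) for detected in model_outputs)
        for inst in instances
    }


def easiness(profile):
    """Number of models that detected the instance."""
    return sum(profile)


# Three models trained on different samples, three candidate noun phrases.
outputs = [{"np1", "np2"}, {"np1"}, {"np1", "np3"}]
profiles = detection_profiles(outputs, ["np1", "np2", "np3"])
```

Here `profiles["np1"]` is `(1, 1, 1)`, an instance every model finds, while `np2` and `np3` each have easiness 1. The profiles themselves could then be passed to a clustering routine.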

By examining clusters of harder instances, we can focus on errors that reflect actual weaknesses of a system rather than sampling noise. We demonstrate our method on the task of detecting noun phrases by a memory-based shallow parser, with encouraging preliminary results. As future work, we propose comparing the resulting clusters and easiness figures obtained when using several NLP systems. This should provide a means for evaluating these systems qualitatively, even when their recall and precision figures are similar.

Julie Weeds (Sussex) 6 June 2002
Smoothing Using Nearest Neighbours

In this talk, I will discuss using a word's nearest neighbours to derive estimates of population probabilities for word co-occurrences that are better than the sample probabilities. In particular, I will look at the way information from different neighbours is combined and the number of different neighbours that is considered. I will then introduce work-in-progress on the separation of word senses.

Michael Zock (Limsi-CNRS, Paris) 24 May 2002
Electronic Dictionaries for Men, Machines or for Both?

Dictionaries are a vital component of any natural language processing system (natural or artificial). In their modern form, the electronic dictionary, they have a tremendous potential, provided that they are built in a way that allows for use not only by experts or machines, but also by ordinary language users. Unfortunately, despite the enormous interest in electronic dictionaries in general and thesaurus-like semantic networks (Wordnet) in particular, little attention has been paid to the language USER. And yet, a lexical database is worthless if the data is not (easily) accessible.

There are many possibilities to make a dictionary useful for people in their daily tasks of processing or learning a language. In many cases it would require relatively little effort to make a lexical knowledge base accessible to the language user. For example, a dictionary fully interfaced with a wordprocessor would allow for active reading. In such an environment, clicking on a word would reveal its translation, its definition, its usage (in the current context), the idioms it controls, grammatical information, its spoken form, etc. For the language producer it would be definitely useful to have a tool assisting him in finding or in generating the needed inflected form. Words should be accessible on the basis of meaning (i.e. lexically or conceptually related words), linguistic form (sound, spelling) and perhaps even the surrounding context.

It is true that there are tools for some of these types of operations, yet, typically, existing electronic tools allow only one or a few of these options. It would be nice to have a single tool that allows many different kinds of operations, using the same dictionary. The question of integrating the different kinds of lexical search should be the focus for psycholinguistic as well as cognitive ergonomic research.

The goal of this talk is to explore ways of enhancing electronic dictionaries by adding specific functionality (i.e. cross lingual/intra lingual lexical search) and to discuss the problems that have to be solved in order to build them. For example, if electronic dictionaries were built like mental dictionaries (associative networks, akin to Wordnet, but with many more relations), they could assist people in finding new ideas (brainstorming) or the word on the tip of their tongue/pen. Within this framework, word access amounts to entering the network at a node and to following the links from the source node (the first word that comes to your mind) to the target word (the one you are looking for).
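
Word access as navigation through an associative network can be sketched with a breadth-first search. The network below is a toy dictionary invented for illustration, and `find_path` is a hypothetical helper; the talk does not specify a search strategy.

```python
from collections import deque


def find_path(network, source, target):
    """Follow association links breadth-first from source to target.

    network maps a word to the list of words it is associated with.
    Returns the shortest chain of words, or None if target is unreachable.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in network.get(path[-1], ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Toy associations (assumed, not drawn from any real lexical resource).
network = {
    "coffee": ["cup", "black"],
    "cup": ["saucer"],
    "black": ["espresso"],
}
```

Starting from the first word that comes to mind, `find_path(network, "coffee", "espresso")` yields the chain `["coffee", "black", "espresso"]`.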

Interesting questions that arise in this practical scenario are: What are the links or associations between words? Can we reasonably encode (all or some of) them into a dictionary? Where should we look in order to get a list of associations (Mel'cuk's work)? Should we allow for adding private information (personal associations)? Is it possible to extract this kind of information automatically by parsing an encyclopedia or large amounts of text?

In sum, it seems that builders of electronic dictionaries are sitting on a gold mine that they still largely do not know how to explore and exploit. Yet, there is good reason to believe that there is a market for products integrating more advanced ways of accessing lexical information.

John Carroll (Sussex) 9 May 2002
Proposed Work in the MEANING and DEEP THOUGHT Projects

I will give a short, informal outline of the work that will be carried out in the two EU 5th Framework projects MEANING and DEEP THOUGHT.

In MEANING --- Developing Multilingual Web-scale Language Technologies --- we will acquire lexical information from large corpora and the WWW, in order to augment EuroWordNet with more comprehensive lexical knowledge with the goal of supporting improved word sense disambiguation. DEEP THOUGHT --- Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction --- is concerned with devising methods for combining robust shallow methods for language analysis with deep semantic processing. The approach will be demonstrated in business intelligence, automated email processing and document production support applications.

Stephan Oepen (Stanford University) 1 May 2002
LinGO Redwoods: A Rich and Dynamic Treebank for HPSG

The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. A treebank is a (typically hand-built) collection of natural language utterances and associated linguistic analyses; typical treebanks, such as the widely recognized Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1993), the Prague Dependency Treebank (Hajic, 1998), or the German TiGer Corpus (Skut, Krenn, Brants, 1997), assign syntactic phrase structure or tectogrammatical dependency trees over sentences taken from a naturally-occurring source, often newspaper text. Applications of existing treebanks fall into two broad categories: (i) use of an annotated corpus in empirical linguistics as a source of structured language data and distributional patterns and (ii) use of the treebank for the acquisition (e.g. using stochastic or machine learning approaches) and evaluation of parsing systems.

While several medium- to large-scale treebanks exist for English (and some for other major languages), all pre-existing publicly available resources exhibit the following limitations: (i) the depth of linguistic information recorded in these treebanks is comparatively shallow, (ii) the design and format of linguistic representation in the treebank hard-wires a small, predefined range of ways in which information can be extracted from the treebank, and (iii) representations in existing treebanks are static and over the (often year- or decade-long) evolution of a large-scale treebank tend to fall behind theoretical advances in formal linguistics and grammatical representation.

LinGO Redwoods aims at the development of a novel treebanking methodology, (i) rich in nature and dynamic in both (ii) the ways linguistic data can be retrieved from the treebank in varying granularity and (iii) the constant evolution and regular updating of the treebank itself, synchronized to the development of ideas in syntactic theory. Starting in October 2001, the project is aiming to build the foundations for this new type of treebank, develop a basic set of tools required for treebank construction and maintenance, and construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license. Building a large-scale treebank, disseminating it, and positioning the corpus as a widely-accepted resource is a multi-year effort. The purpose of publication at this early stage is three-fold: (i) to encourage feedback on the Redwoods approach from a broader academic audience, (ii) to facilitate exchange with related work at other sites, and (iii) to invite additional collaborators to contribute to the construction of the Redwoods treebank or start its exploitation as early-access versions become available.

Stephen Clark (Edinburgh) 14 March 2002
Wide-coverage Statistical Parsing using Combinatory Categorial Grammar

In this talk I will describe a wide-coverage statistical parser that uses a Combinatory Categorial Grammar to derive dependency structures. The parser differs from most existing wide-coverage parsers in yielding 'deep' dependency structures, capturing the semantic dependencies inherent in constructions such as coordination, extraction, raising and control. A set of dependency structures used for training and testing the parser is obtained from a treebank of categorial grammar derivations, which have been derived (semi-)automatically from the Penn Treebank. The parser recovers over 80% of labelled dependencies correctly, and around 90% of unlabelled dependencies.

Rudi Lutz (Sussex) 7 March 2002
Learning Hidden Markov Models Using Penalised EM

Discrete Hidden Markov Models are useful in many speech and NLP related applications (e.g. tagging). Traditionally these are trained using the Baum-Welch ('forwards-backwards') algorithm, but this runs the risk of overfitting the data. This talk will describe empirical work Bill Keller and I have done on using penalised EM to train HMMs, using a variety of possible priors (e.g. Dirichlet distributions, entropic priors) on the parameter space of the models to bias the learning. We have found that using symbol frequency information (computed from the data) enables us to generalise better, and largely avoid the over-fitting problem.

Gertjan van Noord (Groningen, The Netherlands) 21 February 2002
Disambiguation and Efficiency in Wide-coverage Computational Analysis of Dutch

Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full, parsing of unrestricted text. Alpino is based on a head-driven lexicalized grammar and a large lexical component, which has been derived from existing resources. Alpino produces dependency structures, as proposed in the CGN (Corpus of Spoken Dutch).

Important aspects of wide-coverage parsing are robustness, efficiency and disambiguation. In the talk we briefly introduce the Alpino system, and then discuss two recent developments. The first development is the integration of a log-linear ('Maximum Entropy') model for disambiguation. It is shown that this model performs well on the task, despite the small size of the training data that is used to train the model.

The second development concerns the implementation of an unsupervised POS-tagger. It is shown that a simple POS-tagger can be used to filter the results of lexical analysis of a wide-coverage computational grammar. The reduction of the number of lexical categories not only greatly improves parsing efficiency, but in our experiments also gave rise to a mild increase in parsing accuracy; in contrast to results reported in earlier work on supervised tagging. The novel aspect of our approach is that the POS-tagger does not require any human-annotated data - but rather uses the parser output obtained on a large training set.

Raphael Salkie (Brighton) 7 February 2002
Working with a Translation Corpus

A translation corpus (sometimes called a parallel corpus) is a collection of texts in one language and their translations in another, stored on a computer. Translation corpora are useful for research into translation and contrastive linguistics, and in the last ten years they have aroused considerable interest. At Brighton we have put together a reasonably diverse corpus of French-English and German-English texts. In this talk I will illustrate some of the ways we can use this type of corpus to:

  1. Teach translation.
  2. Revisit some key concepts in translation theory.
  3. Evaluate a cross-linguistic framework for analysing modal verbs.
  4. Investigate discourse markers in English.
  5. Use one language to investigate 'gaps' in the other: inanimate stressed pronouns in French.
  6. (Maybe) find an automatic way of distinguishing 'translationese' from native-speaker language.

Walter Daelemans (Antwerp and Tilburg) 17 January 2002
Limitations of Current Methodology in Machine Learning of Natural Language

We show that a large part of the conclusions of the published literature on Machine Learning of Natural Language may not be reliable, regardless of methodologically sound procedures being used. In the quest for insight into which algorithm has the right 'bias' for learning specific or generic language tasks, most comparative research restricts experiments to default settings of algorithm parameters and a fixed input representation. We show that (combined) optimization of these two facets of a task may easily overwhelm the effect of 'algorithm bias'. As a case study, I will focus on recent work in our group on memory-based word sense disambiguation.

Caroline Lyon (Hertfordshire) 6 December 2001
A Statistical Method of Fingerprinting Text

In a large collection of independently written documents each text is associated with a fingerprint which should be different from all the others. If fingerprints are too close, then it is suspected that passages of copied or similar text occur in two documents. The method I shall describe (and demonstrate) exploits the characteristic distribution of word trigrams. Measures to determine similarity are based on set theoretic principles. Short similar sections of text can be detected as well as those that are identical.
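
A minimal sketch of trigram-based fingerprinting follows. The abstract says only that similarity is based on set-theoretic principles; the particular measure used here (size of the intersection divided by size of the union of the two trigram sets) and the function names are assumptions made for illustration, not the speaker's method.

```python
def word_trigrams(text):
    """Return the set of word trigrams in a text (lower-cased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def resemblance(text_a, text_b):
    """Shared trigrams as a fraction of all distinct trigrams in both texts."""
    a, b = word_trigrams(text_a), word_trigrams(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For instance, "the cat sat on the mat" and "the cat sat on the rug" share three of their five distinct trigrams, so this measure gives 0.6, while independently written texts would be expected to score close to zero.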

This method is the core of a plagiarism detector, with a graphical interface, used to assess the work of students in very large classes.

The development of this system was a spin off from research in speech recognition. The sparse data problem in language modelling can be turned on its head and put to good use to fingerprint text.

Bill Keller (Sussex) 22 November 2001
Web of Words

The successful application of NL technology to many key tasks (e.g. sense disambiguation, parse selection, information extraction, text retrieval) is crucially dependent on lexical knowledge. Currently, the size of the text corpora used to gather lexical data is on the order of 100 million to 1 billion words (i.e. word tokens). Even with corpora of this size however, there is insufficient data to gather accurate information about the behaviour of anything but the most common words.

This talk will describe current work in progress (jointly with Rudi Lutz and David Weir) on overcoming the sparse data problem through use of the world wide web as a 'virtual corpus'. In this work, the web is being processed in a 'targeted' way, to gather grammatical dependency information for relatively uncommon words. While the idea of mining the web for lexical data is appealing, it throws up many novel problems, of both practical and theoretical interest. Issues to be discussed include, for example, how to get at the data you want, how to avoid data you don't want, how much data you need, and the 'size' of the web as virtual corpus.

Judita Preiss (Cambridge) 20 November 2001
A Detailed Comparison of WSD Systems: An Analysis of the System Answers for the SENSEVAL-2 English All Words Task

We compare the word sense disambiguation systems submitted for the English-all-words task in SENSEVAL-2. We give several performance measures for the systems, and analyse correlations between system accuracy and word features. A decision tree learning algorithm is employed to find the situations in which systems perform particularly well, and the resulting decision tree is examined.

Adam Kilgarriff (Brighton) 15 November 2001
Lexicographers for Word Sense Disambiguation and Vice Versa

We present a novel approach to the task of Word Sense Disambiguation (WSD), which also provides a semi-automatic environment for a lexicographer to compose dictionary entries based on corpus evidence. For WSD, involving lexicographers tackles the twin obstacles to high accuracy: lack of training data and insufficiently explicit dictionaries. For lexicographers, the computational environment meets a long-held desire for a corpus workbench which supports WSD. We present results under simulated lexicographic use on the SENSEVAL test that show performance comparable with the best existing systems, without using laboriously-prepared training data.

Mark McLauchlan (Sussex) 8 November 2001
More Maximum Entropy Models for Prepositional Phrase Attachment

Prepositional phrase attachment is a common source of structural ambiguity, and resolving this ambiguity is a popular task in NLP research. Several different machine learning approaches have reached accuracy rates of around 84.5% on the benchmark dataset (the usual lower bound is 73% and human performance is 88%). But the best published result using maximum entropy (maxent) modelling is a slightly disappointing 83.7%. In this talk I will describe experiments showing that maxent can easily match the accuracy of other techniques. Better results can be obtained by including additional features, in particular features created using Latent Semantic Analysis (LSA). LSA is an unsupervised technique based on simple word co-occurrences for measuring word similarity and perhaps capturing some basic semantics. Results with this expanded feature set exceed all other approaches that use just the benchmark dataset and unsupervised data (that is, without using WordNet).

John Carroll (Sussex) 10 October 2001
High Precision Extraction of Grammatical Relations

Head-dependent relationships (or grammatical relations) have been advocated as a useful level of representation for grammatical structure in a number of different large-scale language-processing tasks and applications.

A parsing system returning analyses in the form of sets of grammatical relations can obtain high precision if it hypothesises a particular relation only when it is certain that the relation is correct. We operationalise this technique - in a statistical parser using a manually-developed wide-coverage grammar of English - by only returning relations that form part of all analyses licensed by the grammar. We observe an increase in precision from 75% to over 90% (at the cost of a reduction in recall) on a test corpus of naturally-occurring text.
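
The idea of returning only relations that occur in every licensed analysis amounts to a set intersection, sketched below. The relation triples and the helper name `certain_relations` are illustrative assumptions; the actual parser and its relation inventory are not described in the abstract.

```python
def certain_relations(analyses):
    """Return the grammatical relations shared by every analysis.

    analyses: a list of collections of (relation, head, dependent) triples,
    one collection per analysis licensed by the grammar.
    """
    if not analyses:
        return set()
    return set.intersection(*(set(a) for a in analyses))


# Two analyses of "I saw the man with the telescope" that differ only in
# where the prepositional phrase attaches (toy relation labels).
analysis_1 = {("subj", "saw", "I"), ("obj", "saw", "man"), ("mod", "saw", "telescope")}
analysis_2 = {("subj", "saw", "I"), ("obj", "saw", "man"), ("mod", "man", "telescope")}
shared = certain_relations([analysis_1, analysis_2])
```

Only the subject and object relations survive in `shared`; the disputed attachment is withheld, which is how precision rises while recall falls.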

[Practice talk for IWPT-2001]

John Carroll (Sussex) 18 September 2001
Statistical Natural Language Processing in COGS

The Natural Language and Computational Linguistics group in COGS, University of Sussex, is one of the largest groups in the UK of researchers focusing on statistical and corpus-based techniques for computerised processing of human language. I will outline some of the work we are carrying out into probabilistic and robust syntactic parsing, automatic acquisition of information about word usage from large text corpora, and empirical foundations of language processing.

Geoffrey K. Pullum & Barbara C. Scholz (University of California, Santa Cruz and San Jose State University) 19 July 2001
Separating Model-theoretic Syntax from Generative Syntax

Two kinds of framework for stating grammars of natural languages emerged during the 20th century. Here we call them generative-enumerative syntax (GES) and model-theoretic syntax (MTS). They are based on very different mathematics. GES developed in the 1950s out of Post's work on the syntactic side of logic. MTS arose somewhat later out of the semantic side of logic. We identify some distinguishing theoretical features of these frameworks, relating to cardinality of the set of expressions, size of individual expressions, and 'transderivational constraints'. We then turn to three kinds of linguistic phenomena: partial grammaticality, the syntactic properties of expression fragments, and the fact that the lexicon of any natural language is in constant flux, and conclude that MTS has some major (even dramatic) advantages for linguistic description that have been almost entirely overlooked. The issue of what natural languages actually are in MTS terms is also briefly addressed, and even more briefly, implications for parsing and acquisition.

Gerald Gazdar (Sussex) 28 June 2001
Applicability of Indexed Grammars to Natural Languages: The Missing Section on Kayardild

The published version of Gazdar (1988) discusses the applicability of indexed grammars to natural languages in the light of data from English, Belgian and Norwedish. However, the crucial Kayardild section is missing from the published paper. In this talk, I will present the contents of this section of the paper. Specifically, I will cover the syntax and inflectional morphology of a hypothetical language ('English with Kayardild characteristics') and consider the relevant formal implications of the existence of such a language.

G. Gazdar (1988) Applicability of indexed grammars to natural languages. In U. Reyle & C. Rohrer, eds. Natural Language Parsing and Linguistic Theories. Dordrecht: Reidel, 69-94.

Steve Whittaker (AT&T Labs-Research, USA) 7 June 2001
ContactMap: Using Social Networks to Support Communication Management

Most of us are now using multiple communication technologies extensively in our professional and personal lives. I will present data detailing the set of problems that people have with communication management, such as remembering contact information for the people we communicate with, tracking the progress of ongoing conversations and 'keeping in touch'. I will describe a novel system ContactMap that extracts social networks from communication records (e.g. email). I will describe how visualisations of one's personal network can be used to address the above problems of contact management, and talk about some early experiments documenting the nature and structure of these extracted networks.

[Hosted jointly with the COGS HCT group].

Geoffrey Sampson (Sussex) 31 May 2001
Structural Data on the Acquisition of Writing Skills

Children arrive at British schools as (in most cases) fluent speakers of English; one of the objects of schooling is to enable them to exploit the structural norms of the written language, which in many respects are rather different. To explore the trajectory by which skilled English speakers become skilled writers of English, we need corpora, structurally annotated on the same system, of (i) conversational spoken English, (ii) 'polished' written English, and (iii) the written output of children. Our current LUCY project and recent CHRISTINE project have been generating such resources. This paper presents some preliminary statistical analyses of structural similarities and differences between these three genres of English in the present-day UK, with tentative inferences about the process of learning to write.

Ann Copestake (Cambridge) 24 May 2001
Multi-word Expressions

Even the best existing formal grammars of natural languages generate a large proportion of utterances which sound stilted, ugly or simply wrong to native speakers. Many of the problems are due to properties of multi-word expressions: that is, phrases which are not entirely predictable on the basis of standard grammar rules and lexical entries. To address this, we are beginning to develop a theoretical and computational account of such expressions, which include idioms, collocations, compound nouns and verb particle constructions as well as a number of irregular phrases which are difficult to classify. The aim is to develop a wide-coverage computational lexicon for English which will be usable with the existing LKB parser/generator and the LinGO English grammar. We expect to make extensive use of corpora both as an aid to development of the formal theory and as a source of lexical information. I will discuss some of the different properties of multi-word expressions, explain why they are a challenge to grammar writers and outline how we intend to represent them formally within this architecture. I'll also describe some ways in which we think studying comparable expressions in other languages will be useful.

This work is a collaboration between researchers at CSLI, NTT and the University of Cambridge Computer Laboratory.

Stephen Clark (Edinburgh) 10 May 2001
Class-Based Probability Estimation using a Semantic Hierarchy

This talk is about the automatic acquisition of a particular kind of lexical knowledge, namely the knowledge of which noun senses can fill the argument slots of predicates. The knowledge is represented using probabilities, which agrees with the intuition that there are no absolute constraints on the arguments of predicates, but that the constraints are satisfied to a certain degree; thus the problem of knowledge acquisition becomes the problem of probability estimation from corpus data. The problem with defining a probability model in terms of senses is that this involves a huge number of parameters, which results in a sparse data problem. The proposal is to define a probability model over senses in a semantic hierarchy, and exploit the fact that senses can be grouped into classes consisting of semantically similar senses.

A novel class-based estimation technique is developed, together with a procedure that determines a suitable class for a sense (given a predicate and argument position). The problem of determining a suitable class can be thought of as finding a suitable level of generalisation in the hierarchy. The generalisation procedure uses a statistical test to locate areas consisting of semantically similar senses, and, as well as being used for probability estimation, is also employed as part of a re-estimation algorithm for estimating sense frequencies from incomplete data.

The estimation technique is compared with two alternatives on a pseudo disambiguation task, with favourable results. One alternative uses MDL to determine a level of generalisation, and the other uses Resnik's association score. I will also discuss a novel result regarding the use of the chi-squared test in corpus based NLP.

Darren Pearce (Sussex) 26 April 2001
Social Butterflies, Dark Horses and Busy Bees: Collocation Extraction using Lexical Substitution Tests

'Social butterfly', 'dark horse' and 'busy bee' are all commonly used phrases where the natural interpretation is metaphorical rather than literal. According to many researchers, such phrases are examples of 'collocations'. The concept of a collocation, however, is not well-defined and varies significantly from one researcher to the next.

There have been several previous attempts to automatically extract collocations from large bodies of text. Knowledge of collocations is useful in a variety of contexts including machine translation, natural language generation and also in the teaching of English as a foreign language.

I will begin with a brief discussion of sample phrases of several types ranging from the semantically compositional to the metaphorical and idiomatic. I will then describe the underlying rationale and formalisation of a new approach to collocation extraction. This new technique is based on the limitations on the possible lexical substitutions that can be made within candidate phrases. For contrast, I will also discuss the relationship between this approach and existing extraction techniques. Finally, I will detail the results and evaluation of an experiment that attempts to extract 'animal collocations' (such as those in the title of the talk).

Tony McEnery (Lancaster) 15 March 2001
EMILLE: Building a corpus of South Asian languages

The talk describes the goals of the EMILLE (Enabling Minority Language Engineering) Project at the Universities of Lancaster and Sheffield. Building on the findings of MILLE (the Minority Languages Engineering project), EMILLE is focusing upon problems of translating 8-bit language data into Unicode, and is working towards a solution based around the LE (language engineering) architecture GATE. A description of ongoing work on constructing the 63 million word EMILLE corpus of spoken and written data will also be given. Our goal is to provide the basic architecture and data required to encourage research into South Asian language engineering. In particular, this will support the development of translation systems and translation tools which will be of direct use to translators dealing with languages such as Bengali, Hindi and Panjabi both in the UK and internationally.

Miles Osborne (Edinburgh) 1 March 2001
Just how good is maximum entropy? An empirical investigation using ensembles of MEMD models for attribute-value grammars

Maximum entropy has been theoretically argued as being the principled way to estimate models that are only partially determined by some set of empirically observed constraints. However, such arguments hinge upon large sample behaviour, and it is unclear how well maximum entropy performs when this assumption is violated by small samples. Within the maximum entropy / minimum divergence (MEMD) framework, and when operating in the domain of parse selection, we estimate lower and upper bounds on the performance of such models. Maximum entropy, even when samples are small, is shown to produce models near the upper bound. In addition to prediction using single models, we also investigate how well maximum entropy compares with ensembles of MEMD models. Maximum entropy is found to be competitive with such ensembles. Since ensemble learning requires substantially more computational resources than single model learning, yet delivers similar results to maximum entropy, this is a useful finding.

John Carroll (Sussex) 15 January 2001
Simplifying Texts for Aphasic Readers

[Falmer Language Group Talk]

Mats Rooth (Cornell) 4 January 2001
Governor Markup in Parse Forests


Ted Briscoe (Cambridge) 8 December 2000
An Evolutionary Approach to (Logistic-like) Language Change

Niyogi and Berwick have developed a deterministic dynamical model of language change from which they analytically derive logistic, S-shaped spread of a linguistic variant through a speech community given certain assumptions about the language learning procedure, the linguistic environment, and so forth. I will demonstrate that the same assumptions embedded in a stochastic model of language change lead to different and sometimes counterintuitive predictions. I will go on to argue that stochastic models are more appropriate and can support greater demographic and (psycho)linguistic realism, leading to more insightful accounts of the (putative) growth rates of attested changes.

Zoe Lock (DERA, Malvern) 7 December 2000
Induction and Evaluation of Semantic Networks Using Inductive Logic Programming

It is perhaps surprising that despite the widespread employment of semantic networks in many fields, there is little discussion about automatically learning them from data sets. This talk will be based around the description of a system that has been designed to induce both monotonic and non-monotonic semantic networks using the methodology of Inductive Logic Programming.

As with all machine learning tasks, some form of evaluation must be used to choose between competing networks. Discussion of this problem in particular is severely lacking in the literature, and it is hoped that some of the important issues here are reflected in the talk.

Alexander Clark (Sussex) 1 December 2000
Pair Hidden Markov Models and Morphology Acquisition

In this talk I present Pair Hidden Markov Models, a statistical model used in bioinformatics, together with associated algorithms. They are a natural extension of Hidden Markov Models and can be thought of as non-deterministic stochastic finite-state transducers. I show how they can be used to model morphological and morphophonological processes, and can acquire morphological rules in a variety of languages including those with non-concatenative morphology such as Arabic. I will give examples from a variety of languages and discuss some further applications and extensions of these models.

Harald Baayen (Max Planck Institute, Nijmegen) 16 November 2000
Applied Word Frequency Statistics: Morphological Mixtures, Year Distributions, and Emerging Vogue-affixes

Word frequency distributions are characterized by the presence of large numbers of rare words. Khmaladze (1987) describes distributions with a Large Number of Rare Events (LNRE distributions) as distributions for which the law of large numbers does not hold. Several statistical models which take the LNRE property into account in a principled way are available. In my presentation, I will begin with a non-technical introduction to LNRE modeling. I will then discuss three case studies in which LNRE models are applied.

The first case study considers the productivity of a Dutch derivational suffix, -heid, a suffix that, like -ness in English, is used to form abstract nouns from adjectives. I will argue that a proper understanding of the productivity of this suffix requires a mixture model with two LNRE component distributions.

The second case study addresses the distribution of references to the past in three newspapers. Frequency distributions of such references are characterized by discontinuities that probably reflect our cultural memory span. These discontinuities pose an interesting problem for the theory of LNRE distributions. LNRE models are generally fitted to the data by means of the numbers of types with the lowest frequencies of occurrence, which raises the question whether LNRE models have anything to say about the highest-frequency words and discontinuities among the higher-frequency words. I will argue that the general downward curvature observable at the head of many rank-frequency distributions can be captured by LNRE models once we allow for discretization of the expected numbers of high-frequency types. At the same time, the discontinuities of year distributions require additional breakpoint analyses.

The third case study discusses three vogue affixes in British English, mock-, cod-, and faux-. A diachronic survey of the use of these affixes in 'The Independent' in the period 1989--1998 suggests that both cod- and faux- became somewhat more productive around 1994. Breakpoint analyses provide a first means for coming to grips with these data. However, for a proper statistical analysis, LNRE models with dynamic rather than static populations will have to be developed.


Harald Baayen (2001). Word frequency distributions. Kluwer Academic Publishers, Dordrecht (to appear).

Estate Khmaladze (1987). The statistical analysis of large number of rare events. Technical Report MS-R8804, Dept. of Mathematical Statistics, CWI. Amsterdam: Center for Mathematics and Computer Science.

Thijs Polman and Harald Baayen (2000). Computing historical consciousness. A quantitative inquiry into the presence of the past in newspaper texts. To appear in Computers and the Humanities.

Julie Weeds (Sussex) 2 November 2000
Semi-Automatic Extraction of a Semantic Hierarchy from a Machine Readable Dictionary for Use in Word Sense Disambiguation Tasks

In this talk I will discuss how a semantic hierarchy of noun senses can be semi-automatically extracted from a machine-readable dictionary. I will also present a word sense disambiguation technique which disambiguates between senses in the machine-readable dictionary CIDE+ (Cambridge International Dictionary of English). The technique uses a semantic hierarchy to measure semantic relatedness of two senses and evaluates potential sense configurations according to this measure of semantic relatedness. I will evaluate the performance of the technique using an existing hierarchy (WordNet) and using a hierarchy extracted from CIDE+.

Roger Evans & Gerald Gazdar (Brighton and Sussex) 26 October 2000
DATR II -- New Wine in an Old Bottle

We will present a generalization of DATR to support the inclusion of probability-like information in DATR definitions. More precisely, we extend DATR to allow multiple equations with the same LHS, where each equation is associated with an element from an algebra, sets of same-LHS equations are subject to an algebraic constraint, and computation of DATR values involves parallel algebraic manipulation of such elements. This generalization leads to a class of DATR-like languages parameterised by choice of algebra.

We will exemplify using the case where the algebra of standard probability theory is chosen. Many standard tools of probabilistic NLP (CF-PSG, HMM, P-FST, etc.) can be very simply and intuitively encoded given this choice of algebra. And 'traditional DATR' can be recaptured by requiring each equation to have probability 1. The addition of an unsurprising notational convention leaves (traditional) DATR exactly as it was -- same syntax, same semantics.

We will also discuss a number of other possible algebras which give rise to some near neighbours and some distant cousins of traditional probabilistic models.

Anne de Roeck (Essex) 12 October 2000
Morphologically Sensitive Clustering for Identifying Arabic Roots

I present a clustering algorithm for Arabic words sharing the same root. Root-based clusters can substitute for dictionaries in indexing for information retrieval. Our two-stage algorithm applies light stemming before calculating word pair similarity coefficients using techniques sensitive to Arabic morphology. Tests show a successful treatment of infixes, and clustering accuracy of up to 94% for unedited Arabic text samples, without the use of dictionaries.

Adam Kilgarriff (Brighton) 6 July 2000
English senseval: Results from senseval-1, Design Decisions for senseval-2

There are now many computer programs for automatically determining which sense a word is being used in. One would like to be able to say which were better, which worse, and also which words, or varieties of language, presented particular problems to which programs. In 1998 a first evaluation exercise, SENSEVAL, took place. The English component of the exercise is described, and results presented. A second SENSEVAL is currently being planned and the talk will discuss where we are now on the planning.

Abdelhadi Soudi (Ecole Nationale de L'Industrie Minerale, Rabat, Morocco) 8 June 2000
Arabic Morphology Generation Using a Concatenative Strategy

Arabic inflectional morphology requires infixation, prefixation and suffixation, giving rise to a large space of morphological variation. In this talk, we describe an approach to reducing the complexity of Arabic morphology generation using discrimination trees and transformational rules. By decoupling the problem of stem changes from that of prefixes and suffixes, we gain a significant reduction in the number of rules required, as much as a factor of three for certain verb types. We focus on hollow verbs but discuss the wider applicability of the approach.

Rob Malouf (Groningen) 2 June 2000
The Order of Prenominal Adjectives in Natural Language Generation

The order of prenominal adjectival modifiers in English is governed by complex and difficult-to-describe constraints which straddle the boundary between competence and performance. I will describe and compare a number of statistical and machine learning techniques for ordering sequences of adjectives in the context of a natural language generation system.

Diana McCarthy (Sussex) 13 April 2000
Using Semantic Preferences to Identify Verbal Participation in Role Switching Alternations

We propose a method for identifying diathesis alternations where a particular argument type is seen in slots which have different grammatical roles in the alternating forms. The method uses selectional preferences acquired as probability distributions over WordNet. Preferences for the target slots are compared using a measure of distributional similarity. The method is evaluated on the causative and conative alternations, but is generally applicable and does not require a priori knowledge specific to the alternation.

Is Hypothesis Testing Useful for Subcategorization Acquisition?

Statistical filtering is often used to remove noise from automatically acquired subcategorization frames. We compare three different approaches to filtering out spurious hypotheses. Two hypothesis tests perform poorly, compared to filtering frames on the basis of relative frequency. We discuss reasons for this and consider directions for future research. (Joint work with Anna Korhonen and Gen Gorrell).
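The relative-frequency filter mentioned above can be illustrated with a minimal sketch. The abstract does not give the threshold, the frame labels, or any implementation details, so the function name `filter_frames`, the 0.05 cut-off, and the toy counts below are all assumptions made purely for illustration.

```python
def filter_frames(frame_counts, threshold=0.05):
    """Keep frames whose relative frequency for a verb meets the threshold.

    frame_counts: hypothetical mapping from frame label to the number of
    times that frame was hypothesised for one verb.
    threshold: illustrative cut-off, not a value taken from the talk.
    """
    total = sum(frame_counts.values())
    if total == 0:
        return {}
    kept = {}
    for frame, count in frame_counts.items():
        rel_freq = count / total
        if rel_freq >= threshold:
            kept[frame] = rel_freq
    return kept


# Toy counts for a single verb (invented for this example).
counts = {"NP": 70, "NP_PP": 28, "S_COMP": 2}
kept = filter_frames(counts)
print(sorted(kept))  # the rare S_COMP hypothesis is filtered out
```

A filter of this kind needs no distributional assumptions beyond the counts themselves, which is one way to read the contrast with the hypothesis tests drawn in the abstract.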

David Tugwell (Brighton) 16 March 2000
Against Syntactic Structure

It is standardly assumed that a generative grammar must incorporate some notion of syntactic structure, based either on syntactic constituency or syntactic dependencies. In this talk I will argue that this assumption is unfounded and indeed leads to unnecessary complication in our grammars. As an alternative I present a 'left-to-right' model of syntax that characterises the way in which each word in a string can add information to an incrementally growing representation of the semantic/conceptual structure of the sentence. I will argue that such a model is well-suited to practical applications, in particular the construction of a statistical language model, and I will also look at a number of syntactic constructions that appear to provide support for it.

Alexander Clark (Sussex) 9 March 2000
Syntactic Categories: Induction by Distributional Clustering

The argument from the poverty of the stimulus is one of the main arguments for the innateness of language hypothesised by Chomsky, Pinker et al. It is subject to direct refutation by exhibiting an algorithm that can learn linguistic structure from small amounts of unannotated data.

Almost all such algorithms will start by inducing a set of syntactic categories: a variety of techniques for doing this have already been proposed. I will examine those of Hinrich Schutze and of Brown, della Pietra, et al. and propose a variation using an iterative technique to cluster words based directly on the divergence of their context distributions, and present some preliminary results.

I will then discuss some possible applications, and directions for future research.

(This talk describes work in progress).

Antoinette Renouf (Liverpool) 2 March 2000
Detecting Meaning and Change in the Language

Automated linguistic study, whether for scholarly descriptive purposes or for commercial IT applications, has until now been primarily concerned with the analysis of text as a static entity. This has been due to a combination of practical and cultural constraints. An exception has been the work of my Unit, which has always viewed the language as a changing phenomenon, and has devoted much of the last ten years to developing systems for analysing text as a diachronic, chronological entity.

In this presentation, I shall outline some of our work in the detection of meaning, new words, new meanings of existing words, changes in semantic relationships, and changes in the structure of the lexicon. Since I am a linguist, the emphasis will be on the facts of the language which underpin our general approach.

Rens Bod (Leeds) 27 January 2000
Parsing with the Shortest Derivation

Common wisdom has it that the bias of stochastic grammars in favor of shorter derivations is harmful and should be redressed. We show that the common wisdom is wrong for stochastic grammars that use elementary trees rather than context-free rewrite rules, such as stochastic tree-substitution grammars used by data-oriented parsing models. For such grammars, a non-probabilistic metric based on the shortest derivation outperforms a probabilistic metric based on the most probable parse for the ATIS, OVIS and Wall Street Journal corpora. We argue that any stochastic grammar which uses elementary trees of flexible size can be turned into an effective non-probabilistic version. Finally, we'll go into some of the conceptual ideas behind this approach and propose a model for sentence parsing as the shortest combination of chunks of previous parses.

Stephen Clark (Sussex) 9 December 1999
A Class-based Probabilistic Approach to Structural Disambiguation

Knowledge of which words are able to fill particular argument slots of a predicate can be used for structural disambiguation. In this talk I will describe a proposal for acquiring such knowledge, and in line with much of the recent work in this area, a probabilistic approach is taken. I have developed a novel way of using a semantic hierarchy to estimate the probabilities, and demonstrate the general approach using a prepositional phrase attachment experiment.

Rudi Lutz (Sussex) 4 November 1999
An Evolutionary Approach to Learning Stochastic Context-Free Grammars, and a Comparison with the Inside/Outside Algorithm


Suresh Manandhar (York) 7 October 1999
Morphology Learning using ILP

In this talk I will show how the evolving paradigm of inductive logic programming (ILP) can be applied to the learning of morphology. ILP is a relatively new machine learning paradigm which allows learning of Horn/Prolog clauses. From an NLP learning perspective, ILP can be viewed as a non-statistical/symbolic method for learning human readable logic programs. We apply the CLOG decision-list learning system for supervised learning of morphological analysis rules. We then show how word segmentation rules can be learnt in an unsupervised setting from a raw list of words using a hybrid genetic algorithm plus ILP combination. We finally show that our word segmentation rules can be successfully employed for tag prediction of unknown words.

Stephan Oepen (Saarbruecken) 15 July 1999
Towards Systematic Competence and Performance Profiling

Contemporary lexicalized constraint-based grammars (e.g. implementations within the HPSG framework) with wide grammatical and lexical coverage exhibit a large conceptual and computational complexity. Since (worst case) complexity theory accounts cannot accurately predict the practical behaviour of parsers and generators based on unification grammars, system developers and grammar writers alike need to rely on empirical data in diagnosis and evaluation.

At the same time, there is little existing methodology (nor available reference data and tools) to facilitate empirical assessment and progress evaluation as part of the regular development cycle. Hence, isolated case studies, introspection, and intuitions still play a crucial role in typical large-scale development efforts. Yet, subtle decisions in the system implementation or unexpected interaction within the grammar can have drastic effects on the overall system performance.

The talk presents a new methodology that makes the precise and systematic empirical study of system competence and performance a focal point in system and grammar development. This approach can be seen as an adaption of the profiling metaphor (known from software development) to constraint-based language processing systems. Based on (i) a set of structured reference data (taken from both existing test suites and corpora), (ii) a uniform data model for test data and processing results, and (iii) a specialized profiling tool, developers are enabled to obtain an accurate snapshot of current system behaviour (a profile) with minimal effort. Profiles can then be analysed and visualized at highly variable granularity, reflecting different aspects of system competence and performance. Since profiles are stored in a database, comparison to earlier versions or among different parameter settings is straightforward.

The profiling methodology and tool were developed in close cooperation with grammar and system development efforts at CSLI Stanford and DFKI Saarbruecken. The presentation will discuss recent results from the comparison among different parsing strategies and hopefully include a live demonstration.

Christy Doran (ITRI, Brighton) 14 June 1999
Evolution of the XTAG English Grammar

The XTAG Project has been ongoing at the University of Pennsylvania in some form or another since 1987. It began with a toy grammar run on LISP machines, and has since developed into a full-scale system with a large English grammar, small grammars for several other languages, a sophisticated X-windows based grammar development environment and numerous satellite tools.

Extending an LTAG grammar requires expansion of two different types of data: associations of lexical items with syntactic frames and the trees encoding those syntactic frames. This talk will present some of the approaches we have used to both of these tasks in creating the current English Grammar, and will also touch on some of the tools we have developed to make this work more manageable.

John Carroll (Sussex) 3 June 1999
Parsing with an Extended Domain of Locality

One of the claimed benefits of Tree Adjoining Grammars is that they have an extended domain of locality (EDOL). We consider how it can be exploited to limit the need for feature structure unification during parsing. We compare two wide-coverage lexicalized grammars of English, LEXSYS and XTAG, finding that the two grammars exploit the EDOL in different ways. (Joint work with David Weir, Nicolas Nicolov, Olga Shaumyan, and Martine Smets).

Geoff Sampson (Sussex) 27 May 1999
Extending Grammar Annotation Standards to Spontaneous Speech

I examine the problems that arise in extending an explicit, rigorous scheme of grammatical annotation standards for written English into the domain of spontaneous speech. Problems of principle occur in connexion with part-of-speech tagging; the annotation of speech repairs and structurally incoherent speech; logical distinctions dependent on the orthography of written language (the direct/indirect speech distinction); differentiating between nonstandard usage and performance errors; and integrating inaudible wording into analyses of otherwise-clear passages. Perhaps because speech has contributed little in the past to the tradition of philological analysis, it proves difficult in this domain to devise annotation guidelines which permit the analyst to express what is true without forcing him to go beyond the evidence.

Stephen Clark (Sussex) 20 May 1999
An Iterative Approach to Estimating Frequencies over a Semantic Hierarchy

Estimating the frequency with which a word sense appears as a given argument of a verb is problematic given the absence of sense-disambiguated data. The standard approach is to split the count for any noun appearing in the data equally among the alternative senses of the noun. This can lead to inaccurate estimates. We describe a re-estimation process which uses a semantic hierarchy and the accumulated counts of hypernyms of the alternative senses in order to redistribute the count. In order to choose a hypernym for each alternative sense, we employ a novel technique which uses a chi-squared test to measure the homogeneity of sets of concepts in the hierarchy.
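The standard equal-split baseline described above can be sketched as follows. This shows only the baseline, not the re-estimation process of the talk; the function name `split_counts_equally`, the sense labels, and the toy counts are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def split_counts_equally(noun_counts, senses_of):
    """Divide each noun's count evenly among its alternative senses.

    noun_counts: hypothetical mapping from noun to the number of times it
    appears in a given argument slot of a verb.
    senses_of: hypothetical mapping from noun to its list of sense labels.
    """
    sense_counts = defaultdict(float)
    for noun, count in noun_counts.items():
        senses = senses_of.get(noun, [])
        if not senses:
            continue  # nouns with no known senses contribute nothing
        share = count / len(senses)
        for sense in senses:
            sense_counts[sense] += share
    return dict(sense_counts)


# Toy data (invented for this example).
noun_counts = {"bank": 4, "river": 2}
senses_of = {
    "bank": ["bank.institution", "bank.riverside"],
    "river": ["river.stream"],
}
print(split_counts_equally(noun_counts, senses_of))
```

Each sense of an ambiguous noun receives the same share regardless of context, which is the source of the inaccurate estimates that the re-estimation process is meant to address.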

Patrice Lopez (LORIA, Nancy, France) 13 May 1999
Connection Driven Parsing for LTAG: Interest and Application

I will present a parsing technique dedicated to LTAG which uses a new parsing invariant: in classical parsing algorithms for LTAG an item represents a well-recognized sub-tree, whereas here an item represents a well-recognized part of the left-to-right elementary tree traversal. I will show that this kind of parsing focused on islands brings more extended partial parsing compared with classical algorithms based on subtree recognition. These partial results are then combined thanks to additional monotonic repairing rules in order to cover ungrammatical phenomena such as spoken disfluencies. Finally I will present the implementation of these propositions in a workbench based on Java and XML to design and use LTAG grammars in a spoken dialogue system.

Richard Power (ITRI, Brighton) 10 December 1998
Knowledge Editing for Natural Language Generation

For some years, NLG research at the ITRI has focussed on the problem of developing a 'symbolic authoring' system: that is, a system that allows an author (not a knowledge engineer) to define the desired content of a document in a language-independent way, so that subsequently a program can generate versions in many languages, including ones that the author does not know. A major difficulty in building such a system has been to find a way in which authors can formalize the desired meaning without having to learn a logical formalism. As a possible solution we have developed a technique called WYSIWYM (What You See Is What You Meant), in which the author defines the meaning by interacting with a 'feedback text' generated by the program. The feedback text presents the current state of the knowledge, however incomplete, and also indicates, through pop-up menus, the options for adding or removing knowledge. Early WYSIWYM systems had limitations that made them unsuitable for the domain we are currently working on (patient information leaflets): in particular, they allowed the author no control over logical form. In the talk, I will give an introduction to WYSIWYM (including a demonstration), and explain our latest ideas for overcoming these limitations.

Jamie Henderson (Exeter) 19 November 1998
Learning Syntactic Parsing with Simple Synchrony Networks

The ability of connectionist networks to learn about finite patterns is well established and they have been fairly successful at learning about sequences, but connectionist networks have had great difficulty with learning about structures. This has prevented their successful application to some important cognitive processes, such as syntactic parsing. Recent neuroscientific and computational work on representing entities using the synchrony of neural oscillations has gone some way towards addressing the representational aspects of this problem, but it has not addressed the learning issues. Here we apply this representation of entities to a standard connectionist architecture (Simple Recurrent Networks), and demonstrate that the resulting architecture (Simple Synchrony Networks) can learn syntactic parsing. On naturally occurring text we get results approaching those of current statistical methods. In addition, the input-output format for these networks is very similar to some current symbolic grammatical representations, making more integrated connectionist-symbolic hybrid approaches possible.

Geoff Sampson (Sussex) 17 November 1998
Demographic Correlates of Complexity in British Speech

My CHRISTINE project has been producing detailed structural annotations of spontaneous spoken English, including a subset of the demographically sampled material in the spoken section of the British National Corpus. This enables us to investigate whether real-life U.K. speech in the 1990s includes any statistically-significant correlations between structural properties of language, and social variables such as sex, age, region, or social class. I have looked at figures for grammatical 'complexity', in the traditional sense of incidence of clause subordination. There are some effects; they do not altogether correspond to what one might have predicted in advance, and they have relevance for psychological theories of language acquisition as well as social interest.

Jim Blevins (Cambridge) 5 November 1998
Constituent Sharing in Nonconstituent Coordination

This talk treats constituent and nonconstituent coordination as cases of constituent coordination in which conjuncts share a peripheral head. Constituent sharing directly captures the fact that the individual conjuncts in both constructions are independently wellformed phrases, conforming to general constituent structure and order constraints. A subsumption-based treatment of valence demands permits a shared head to combine independently with its arguments in each conjunct and avoids the feature conflicts which, as Ingria 1990 shows, arise on accounts that use unification to regulate valence demands.

Gerald Gazdar (Sussex) 22 October 1998
The Fractal Lexicon

Modern lexicons for NLP are full of numbers (typically probabilities). These numbers are standardly gleaned from corpora or treebanks by some combination of counting and estimation. The counting is laborious, expensive, and sometimes very difficult. The estimation is usually very weak (in that it makes few assumptions about the structure of the set of numbers being estimated).

This is not a happy state of affairs. It arises because, although NLP researchers have few qualms about adopting strong theories about the objects being counted (as in a taxonomy of parts of speech, for example), they rarely subscribe to any theory about the count itself. The field needs at least one such theory. Then (strong) estimation could do most of the work of delivering the numbers and research could shift to the problem of finding optimal values for the parameters of the theory, given a corpus. I shall argue that we already have such a theory available to us and give some evidence for its plausibility, applicability, and generality.

Geoffrey Nunberg (Xerox PARC and Stanford University) 7 October 1998
Does Cyberspace have Boundaries?

With the advent of the Web and wide-scale Internet use, discussions of the future of electronic discourse have moved from the purely speculative 'just imagines' of a few years ago to more pointed critical questions about how electronic communication will mediate public discourse. In particular, there has been a sudden increase of concern about the difficulties of determining the center and margins of electronic discourse -- a concern, that is, about the uncontrolled circulation of 'bad information,' whether in the form of stock-market rumors, racist tracts, pornography, misleading advertising, or just plain hokum. With this comes a sense of the Net as a chaotic domain fragmented into an indeterminate number of subdiscourses, where it is impossible to separate private from public, official from unofficial, reliable from unreliable, and so forth.

Anxieties like these aren't new: they're situated in a history of complaints about the fragmentation of public discourse that runs from Pope to Johnson to Carlyle to Dewey, among many others. But the sense of decentering of electronic discourse does have a distinct material basis: cyberspace doesn't lend itself to the same kinds of spatializations that have shaped the way we organize print discourse. The question is, how much of the spatial order of print will be reproduced in cyberspace, and what kinds of developments -- social, economic, and technological -- will determine the shape of the new civic discourse?

[Distinguished Lecture in AI and Cognitive Science jointly sponsored by the University of Brighton (ITRI) and the University of Sussex (COGS)].

Dafydd Gibbon (Bielefeld) 21 September 1998
Prosodic Inheritance and Phonetic Interpretation: Lexical Tone

We describe a compositional inheritance based approach to the phonetic interpretation of lexical tone, using tone sandhi automata. The automata have several advantages over conventional representations in mainstream phonology, both in providing clear configurational explications of notions such as 'tone terrace', 'downstep', 'downdrift', 'upsweep', and in providing an obvious basis for clean computational models and efficient implementations. We present pitch interpretation automata for two Kwa languages and one Gur language, and a generalised automaton for typological comparisons, and show, with reference to Gbe (Ewe) how the model relates to the Liberman and Pierrehumbert model of pitch interpretation for intonation. Further, we show that the weakly equivalent regular expression oriented notations for regular languages, which have become popular in computational phonology, are quite unperspicuous in this domain. The pitch generator automata are implemented in DATR.

Anja Belz (Sussex) 21 May 1998
Learning Phonotactic Constraints with a Genetic Algorithm

In this talk I will present a formal approach to the automatic construction of phonotactic descriptions for given data sets, and a genetic-search method for automatically discovering finite-state automata that encode such descriptions. The method is equally suitable for discovering infinite and finite regular languages, either from complete or incomplete presentation of positive data. The degree of generalisation from a given incomplete data set to a superset can be controlled by a set of parameters. I will present results for data sets derived from existing formal descriptions of phonological phenomena and syllable phonotactics, and for data sets of Russian nouns.

Thorsten Gerdsmeier (Essex) 30 April 1998
Semantics for DATR

In my diploma dissertation I tried out three semantics for DATR: Natural Semantics (developed by Kahn), Denotational Semantics and Action Semantics. The aim was to see how well the formal semantic descriptions of DATR could achieve certain goals. Four of these goals were completeness, correctness, conciseness and intuitive plausibility. Conciseness and plausibility can be judged from the paper descriptions, but judgements of completeness and correctness are best made if a semantics definition can be implemented and test programs can thereby be interpreted. I used the theorem prover Isabelle to check heuristically the completeness and correctness of a Natural Semantics of DATR. I shall describe mostly the Natural Semantics and its implementation.

Mary McGee-Wood (Manchester) 6 March 1998
Redundancy Rules in Categorial Grammar: Theory and Practice

Categorial Grammars have always included both binary rules (such as function application and function composition) and unary (type-shifting) rules, and indeed the interactions between these two rule types have been involved in many debates within CG. The unary rules, however, have been restricted to those which preserved algebraic identity (rewriting NP as S/(S\NP), for example, does not in itself affect the descriptive power of the grammar). I will offer an argument in principle for the adoption of non-algebra-preserving unary rules in a CG, comparable to "lexical redundancy rules", and support this by looking in detail at the compaction which can be achieved in the CG "Large Lexicon" developed in the Institute for Research in Cognitive Science at the University of Pennsylvania. I might conclude with a few words on the roles of purism and eclecticism in the development of linguistic theories and of NLP systems.

Guido Minnen (Sussex) 12 February 1998
A Computational Treatment of HPSG Lexical Rules as Systematic Covariation in Lexical Entries

I propose a computational treatment of HPSG lexical rules which was developed together with Detmar Meurers (University of Tuebingen). A compiler is described which translates a set of lexical rules and their interaction into a definite clause encoding, which is called by the base lexical entries in the lexicon. This way, the disjunctive possibilities arising from lexical rule application are encoded as systematic covariation in the specification of lexical entries. The compiler ensures the automatic transfer of properties not changed by a lexical rule. Program transformation techniques are used to advance the encoding. The final output of the compiler constitutes an efficient computational counterpart of the linguistic generalizations captured by lexical rules and allows 'on the fly' application of lexical rules.

David Elworthy (Canon Research Centre Europe, UK) 5 February 1998
CRE's Analyser: A Parser for Dependency Grammars

In this talk, I will give an overview of the analyser developed at Canon Research Centre Europe for parsing dependency grammars. The analyser is intended to work incrementally and to be tolerant of ungrammatical input. Throughout its development, we have tried to avoid getting caught up in the coils of linguistic theory - the motivation of the work is providing practical solutions to technological problems. I will first describe the older version of the analyser, and then look at our new version, which attempts to overcome some problems with the earlier version. The current analyser is implemented in Haskell, a lazy functional language, and I will suggest that using laziness carefully means a reduction in the importance of theoretical complexity results.

John Carroll (Sussex) 29 January 1998
PSET - Practical Simplification of English Newspaper Text for Aphasic Readers

Aphasia is a disability of language processing often suffered by people as a result of a stroke or head injury. The PSET project is a recently-started collaborative project between COGS and the University of Sunderland to build and evaluate a computer system to simplify newspaper text for aphasic readers. The system under construction will use an extensive grammar, a robust parser and large computerised dictionaries and thesauri to remove passives, unusual vocabulary and other constructs and forms with which aphasics often have difficulty. The talk will focus on underlying linguistic assumptions, architectural issues, and a review of the lessons learned to date.

Mark Stevenson (Sheffield) 15 January 1998
Multiple Knowledge Sources for Sense Tagging

Many different knowledge sources, or disambiguators, have been used for word sense disambiguation (collocations, dictionary definitions, thesaural hierarchies etc.). These knowledge sources can be categorised into several types: syntactic, semantic and pragmatic. However, there has been little research carried out regarding the relative degree of success of each of these knowledge types. It also seems intuitively obvious that word sense disambiguation could be more effective if algorithms have access to several of these knowledge types. We present a sense tagger which makes use of several knowledge sources and optimises their combination using the machine learning technique of decision lists. The results from the tagger are analysed to determine the effectiveness of the various knowledge sources.

Roger Evans and David Weir (Brighton and Sussex) 12 December 1997
Automaton-based Parsing for Lexicalised Tree Adjoining Grammars - More Developments

This talk will present joint work with David on optimised parsing of large tree-adjoining grammars. This is a topic that David touched on briefly in his ITRI presentation last term, and which we reported more fully at the International Workshop on Parsing Technologies in Boston in September. I shall start by giving most of that IWPT talk (which only John Carroll has heard before!) and then describe some of our more recent thoughts in this area (the latter being still work in progress, aiming for an ACL paper).

So the main points of the talk will be: introducing automaton-based parsing; showing how the automaton view allows for parser optimisation; illustrating this by determinising automata for a typical LTAG tree family; looking briefly at alternative control strategies for parsing; exploring the additional problems of trying to minimise the automata; looking at some possible complexity questions (but not answering them very well yet).

Shimon Edelman (Sussex) 11 December 1997
Similarity-based Word Sense Disambiguation

We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.

[Joint work with Yael Karov.]

Henry Thompson (Edinburgh) 30 October 1997
DTDs, Schemata and Object-Oriented Document Grammar: Bringing SGML into the 20th century

Two recent proposals for meta-applications of XML (XML-Data and MCF) have included DTD fragments for describing document structure, sometimes called 'schemata'. In this seminar I will introduce the XML-Data schemata proposal, concentrating on the motivation for and nature of the provision of an element-type hierarchy, in which element types can inherit attribute declarations and positions in content models from ancestors in the hierarchy. I will draw a perhaps contentious parallel with the use of meta-rules in grammar formalisms to argue that this is a GOOD THING, and should be preferred to ad-hoc approaches to inheritance using parameter entities.

Note that although this talk will assume some familiarity with SGML, NO prior knowledge of XML or XML-Data is required.

John Carroll and David Weir (Sussex) 17 October 1997
Encoding Frequency Information in Lexicalized Grammars

We address the issue of how to associate frequency information with lexicalized grammar formalisms, using Lexicalized Tree Adjoining Grammar as a representative framework. We consider systematically a number of alternative probabilistic frameworks, evaluating their adequacy from both a theoretical and empirical perspective using data from existing large treebanks. We also propose three orthogonal approaches for backing off probability estimates to cope with the large number of parameters involved. [Talk first given at IWPT'97]

Anja Belz (Sussex) 9 October 1997
Learning Phonological and Morphological Regularities with Neural Network Techniques

Since Rumelhart and McClelland's seminal 1986 paper on learning the past tense of English verbs much research has been devoted to acquiring phonological and morphological knowledge with neural networks.

Many promising results as well as many wildly exaggerated claims later, one thing is still missing from the literature - a clear idea of the types of NL tasks that can / cannot be learnt by neural network methods.

In this talk (which is going to be very much a research in progress kind of affair) I want to discuss the limits of learning NL tasks with neural network techniques, briefly outlining the theoretical limitations of neural network learning and focussing on the factors that in a practical context determine the degree to which NL tasks are learnable by such techniques. (I will use Arabic, German and Russian plural formation, the English past tense, as well as some toy language data as examples.)

Guido Minnen (Tuebingen) 18 August 1997
Goal-directed Bottom-up HPSG Parsing using Magic

I describe an HPSG parser that combines the advantages of dynamic bottom-up interpretation using magic transformation and top-down interpretation. In a grammar preprocessing step, user-specified type constraints are transformed using magic to enable their goal-directed bottom-up interpretation. The remaining constraints in the grammar are dealt with using an advanced top-down interpreter. Preliminary efficiency results are provided on the basis of the implementation of the described parser as part of the grammar development system ConTroll.

K. Vijay-Shanker (Delaware) 15 July 1997
Using Domination Statements in TAG Parsing


Gabriel Illouz (Sussex) 19 June 1997
Lexical Analysis of a Chronological Corpus

In this talk, I will present my work in progress for my DPhil. My main working hypothesis is that, when looking at the distribution of the lexicon over time, one can separate the lexicon into two main categories: the well-distributed words, constituting the common lexicon (that varies little with time), and the underdispersed words, the contextual lexicon (that does vary with the passage of time). Within the latter, one could then extract clusters of words sharing the same pattern.

According to this hypothesis, I will look at some empirical results from the data, and some recent relevant work in the computational linguistic field.

The particularities of the HDF93 corpus of financial news (everything in capital letters, broad use of abbreviations, ...) require special treatment. I will present how I deal with these particularities.

[Reference: Baayen R.H., The Effect of Lexical Specialization on the Growth Curve of the Vocabulary, Computational Linguistics, Vol 22/4, 1996]

Robert Gaizauskas (Sheffield) 6 June 1997
Acquiring Grammars from the Penn Treebank

In this talk I shall describe two related projects that have been carried out at Sheffield with the general aim of deriving a grammar from the Penn Treebank II (PTB-II) that would be useful for parsing texts. The first project was largely investigative. A set of programs was developed to extract the grammar underlying the bracketing in the PTB-II and to investigate its properties. In particular we were interested in such properties of the grammar as the number of rules occurring in it, the frequency distribution of rules both individually and by lefthand side category, and the lengths of the righthand sides of the rules. We also investigated the accession rate of grammar rules across the corpus -- how rapidly new rules were being encountered as more text was processed -- with a view to determining how close to 'complete' the grammar in the PTB-II is. Finally, we explored some of the properties of the corpus which are revealed by the grammar, such as the depth of the trees and the relation between sentence length and tree depth.

One of the major discoveries of this work was that the grammar implicit in the PTB-II was very large -- approximately 17,500 rules. As this proved unfeasible for use by any parser we had to hand, the second stage of our work concentrated on techniques for compacting the grammar. I shall discuss several of these, including a straightforward thresholding technique and a more interesting technique based on the iterative removal of rules whose righthand sides can be parsed by the remaining rules. (Joint work with Alex Krotov, Mark Hepple, Yorick Wilks).
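
The second compaction idea, removing a rule when its righthand side can already be parsed by the remaining rules, can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the Sheffield implementation: the function names `derives` and `compact`, the representation of a rule as a `(lhs, rhs_tuple)` pair, and the toy rules are all invented here, and the result may depend on the order in which rules are tried.

```python
def derives(rules, target, seq):
    """Return True if `target` can derive the symbol sequence `seq`
    using only `rules` (each rule is a (lhs, rhs_tuple) pair)."""
    n = len(seq)
    # chart[(i, j)] holds the symbols known to derive seq[i:j];
    # every symbol in the sequence trivially derives itself.
    chart = {(i, j): set() for i in range(n) for j in range(i + 1, n + 1)}
    for i in range(n):
        chart[(i, i + 1)].add(seq[i])

    def matches(rhs, i, j):
        # Can the symbols of rhs cover seq[i:j], each over a non-empty span?
        if not rhs:
            return i == j
        head, rest = rhs[0], rhs[1:]
        for k in range(i + 1, j - len(rest) + 1):
            if head in chart[(i, k)] and matches(rest, k, j):
                return True
        return False

    changed = True
    while changed:  # iterate to a fixpoint so unary chains are handled
        changed = False
        for lhs, rhs in rules:
            for (i, j), symbols in chart.items():
                if lhs not in symbols and matches(rhs, i, j):
                    symbols.add(lhs)
                    changed = True
    return target in chart[(0, n)]


def compact(rules):
    """Repeatedly drop any rule whose righthand side the other rules can
    already parse as its lefthand side. Assumes the rules are unique."""
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for rule in list(rules):
            lhs, rhs = rule
            remaining = [r for r in rules if r != rule]
            if derives(remaining, lhs, rhs):
                rules = remaining
                changed = True
    return rules


# Invented toy grammar: the flat rule NP -> DT NN PP is redundant given
# NP -> DT NN and NP -> NP PP.
toy_rules = [
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("NP", ("DT", "NN", "PP")),
    ("PP", ("IN", "NP")),
]
print(compact(toy_rules))
```

Note that a compaction of this kind keeps the set of strings the grammar accepts but changes the trees assigned to them, since flat structures are replaced by nested ones.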

Ruth Kempson and Wilfred Meyer Viol (SOAS and Imperial) 22 May 1997
Parsing as Tree Construction in LDS

In the talk we will describe the data structures and the dynamics of a formal parser which incrementally creates an interpretation while traversing an NL string in a left-to-right word-by-word fashion. This parser does not work against the background of some grammar describing a set of acceptable trees; instead, it incorporates grammatical knowledge in the transition rules themselves. The parsing process is formalized within an LDS framework in which labels guide the parser in a goal-directed process of constructing a binary tree decorated with labelled formulas. In the course of the construction process underspecification, connected for instance with scope or pronominals, has to be resolved. As an example, we will discuss the formalisation of long-distance dependencies in terms of underspecified tree structure, and we will argue that, by adopting this dynamic perspective, a family of related phenomena become explicable in a unified way.

John Coleman (Oxford) 15 May 1997
Stochastic Phonological Parsing and Acceptability

In foundational works of generative phonology it is claimed that subjects can reliably discriminate between a) actual English words (e.g. /brik/), b) possible but non-occurring words (e.g. /blik/) and c) words that could not be English (e.g. /bnik/). Discrimination between b) and c) is attributed to grammatical knowledge, e.g. morpheme structure conditions. My group conducted two experiments to determine subjects' ratings of the acceptability of English-like and un-English non-words. In one study, which I shall describe in the first part of the seminar, nonsense words were constructed which either respected or violated a documented phonotactic constraint, and subjects indicated whether or not the nonsense word could be a possible English word. The total number of "votes" against each word gave a scale of 0 (good) to 12 (bad). The judgements collected in this way were at odds with the predictions of mainstream generative phonology, declarative phonology and Optimality Theory alike in several interesting ways.

In the second part of the seminar I shall report on recent work in collaboration with Janet Pierrehumbert in which we examine the use of a probabilistic phonological parser for words, based on a context free grammar, to model the acceptability data. After obtaining onset and rime frequencies from a machine-readable dictionary, we parsed the nonsense words from the acceptability experiment and compared various methods of scoring the goodness of the parse as a predictor of acceptability. We found that the probability of the worst part is not the best score of acceptability, indicating that classical generative phonology and Optimality Theory are correct at a coarse level, but incorrect in detail, as these approaches do not recognise a mechanism by which the well-formed subparts of an otherwise ill-formed word may redeem the ill-formed parts. We argue that phonotactic constraints are probabilistic descriptions of the lexicon, and that probabilistic generative grammars are demonstrably a more psychologically realistic model of phonological competence than standard generative phonology or Optimality Theory.

Nicolas Nicolov (Sussex) 13 March 1997
Approximate Chart Generation from Non-Hierarchical Representations

In this talk I will present a technique for sentence generation. I argue that the input to generators should have a non-hierarchical nature. This allows one to investigate a more general version of the sentence generation problem where one is not pre-committed to a choice of the syntactically prominent elements in the initial semantics. I also consider that a generator can happen to convey more (or less) information than is originally specified in its semantic input. In order to constrain this approximate matching of the input I impose additional restrictions on the semantics of the generated sentence. The technique provides flexibility to address cases where the entire input cannot be precisely expressed in a single sentence. Thus the generator does not rely on the strategic component having linguistic knowledge. The semantic structure is declaratively related to linguistically motivated syntactic representation. I will also discuss a memoing technique based on a semantically indexed chart. I will try to put the generation system that we have developed (PROTECTOR) in the context of the space of other generators.

Time permitting I might say a few words about the new project that we have started recently in which we try to build a wide coverage parsing system using the same grammar formalism.

Geoffrey Sampson (Sussex) 6 March 1997
Yngve's Depth Hypothesis Revisited: An Informal Seminar

Victor Yngve, more than 30 years ago, classically pointed out that English grammar has a strong bias towards right-branching rather than left-branching structures. This is relevant in a computational context because it might imply that CF approaches cannot be the whole truth. Yngve expressed his generalization in terms of sharp limits on speakers' memory, but others later argued that the situation is more complicated (not all languages have the same biases, for one thing). I re-examined the issue in statistical terms using corpus material, and I now believe I know what the true generalization for English is -- and it is quite optimistic from a computational point of view.

[See article in the March 1997 issue of Journal of Linguistics]

Gerald Gazdar (Sussex) 27 February 1997
Intermediate DATR: A Tutorial Seminar

This event is intended for people in COGS and ITRI who already know some DATR but have a need to know more (e.g., because they are using it for a research project or for PhD work). I'd like it to be more like a seminar than a lecture and hope that some of those who attend will bring particular problems of DATR coding that they would like to discuss. In the absence of such material, I am likely to improvise on topics like the evils of variables; relations between nodes and atoms; the use of FSTs; intuitions about quoted nodes; DAGs in the node, attribute and value domains; the magic of Qnode; and so forth. But I really would prefer it if the agenda of the event was largely driven by those who come, rather than by me (indeed, if you plan to come, and have a topic or example in mind to raise, then feel free to give me advance warning by email).

Diana McCarthy (Sussex) 20 February 1997
Acquiring Selectional Preferences from Corpora using the WordNet Thesaurus

The preferences that verbal predicates have for their arguments provide useful information for a variety of NLP tasks including word-sense and structural disambiguation and anaphora resolution. Additionally they are helpful when examining relationships on the semantics-syntax border. The talk will describe work in progress to reimplement and augment an algorithm devised by Li and Abe which produces selectional preferences for a target verb represented by a "tree cut model" in a given thesaurus. A tree cut model is a list of disjoint classes from the thesaurus along with associated scores representing a degree of preference between the class and the verb. Acquisition of these models is performed on corpus data preprocessed by a shallow parser. The Minimum Description Length Principle is used to select the best model given the data. This is a principle from information theory that makes a compromise between selecting a model which most closely fits the data and selecting as simple a model as possible. Results from the data available so far indicate the need to make some modifications to the thesaurus in addition to putting more work into automatic sense tagging of the data.
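
The trade-off that the Minimum Description Length Principle makes can be illustrated with a small sketch. It assumes one common formulation of the description length of a tree cut model (a model cost of k/2 · log2 N for k free parameters and N observations, plus a data cost in which each class's probability is shared uniformly among its nouns); whether this matches the reimplementation described in the talk is an assumption, and the function name `description_length` and the toy counts are invented.

```python
import math


def description_length(cut, counts):
    """Description length in bits of a tree cut, under the assumptions above.

    cut    -- dict mapping each class on the cut to the list of nouns it covers
    counts -- dict mapping each noun to its observed frequency with the verb
    """
    sample_size = sum(counts.values())
    free_params = len(cut) - 1  # class probabilities must sum to one
    model_cost = (free_params / 2) * math.log2(sample_size)

    data_cost = 0.0
    for members in cut.values():
        class_count = sum(counts.get(noun, 0) for noun in members)
        if class_count == 0:
            continue  # unobserved classes contribute nothing to the data cost
        # Class probability spread evenly over the nouns in the class.
        p_noun = (class_count / sample_size) / len(members)
        data_cost -= class_count * math.log2(p_noun)
    return model_cost + data_cost


# Invented toy data: two nouns seen equally often as the verb's argument.
counts = {"a": 2, "b": 2}
coarse = {"AB": ["a", "b"]}          # one general class
fine = {"A": ["a"], "B": ["b"]}      # two specific classes
print(description_length(coarse, counts), description_length(fine, counts))
```

With these toy counts the finer cut fits the data no better, so its extra parameter only adds cost and the coarser cut is preferred.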

Ruslan Mitkov (Wolverhampton) 13 February 1997
Attacking Anaphora on all Fronts

Anaphor resolution is a complicated problem in Natural Language Processing and has attracted the attention of many researchers. Most of the approaches developed so far have been traditional linguistic ones with the exception of a few projects where statistical, machine learning or uncertainty reasoning methods have been proposed. The approaches offered - even if we restrict our attention to pronominal anaphora as we shall do throughout this talk - from the purely syntactic to the highly semantic and pragmatic (or the alternative) provide only a partial treatment of the problem. Given this situation and with a view to achieving greater efficiency, it would be worthwhile to develop a framework which combines various methods to be used selectively depending on the situation.

The talk will outline the approaches to anaphor resolution developed by the speaker. First, he will present an integrated architecture which makes use of traditional linguistic methods (constraints and preferences) and which is supplemented by a Bayesian engine for center tracking to increase the accuracy of resolution: special attention will be paid to the new method for center tracking which he developed to this end. Secondly, an uncertainty reasoning approach will be discussed: the idea behind such an underlying AI strategy is that (i) in Natural Language Understanding the program is likely to propose the antecedent of an anaphor on the basis of incomplete information and (ii) since the initial constraint and preference scores are subjective, they should be regarded as uncertain facts. Thirdly, the talk will focus on a two-engine approach which was developed with a view to improving performance: the first engine searches for the antecedent using the integrated approach, whereas the second engine performs uncertainty reasoning to rate the candidates for antecedents. Fourthly, a recently developed practical approach which is knowledge-independent and which does not need parsing and semantic knowledge will be outlined.

In the last part of his talk, R. Mitkov will explain why Machine Translation adds a further dimension to the problem of anaphor resolution. He will also report on the results from two projects which he initiated and which deal with anaphor resolution in English-to-Korean and English-to-German Machine Translation.

Attacking anaphora on all fronts is a worthwhile strategy: performance is enhanced when all available means are enlisted (i.e. the two-engine approach), or a trade-off is possible between more expensive, time-consuming approaches (the integrated, uncertainty-reasoning and two-engine approaches) and a more economical, but slightly less powerful approach (the practical "knowledge-independent" approach).

Martine Smets (Sussex) 12 December 1996
A Morphological Account of French Clitics

This talk presents a language to express morphological processes encountered in languages of the world. This language supports an organization of the lexicon into three components: a lexicon of stems, a type hierarchy and a morphological component.

An illustration of how the language expresses morphological relations is given through a morphological account of French clitics, which is in agreement with the syntactic account of French clitics developed in HPSG (Miller and Sag (1996), "French clitic movement without clitics or movement").

Sue Atkins (Word Trade Centre Ltd) 14 November 1996
Bilingual Dictionaries: Past, Present and Future

The past is the print dictionary; the present is the same dictionary in electronic form with souped up access facilities; the future must be truly electronic dictionaries, compiled for the new medium. I'll take a quick look at the anatomy of the current bilingual dictionary entry as a launch pad for progress, show how a frame semantics approach to corpus analysis can be used to create new types of information, and demo in PowerPoint the hypertext entry of the bilingual dictionary of tomorrow.

[Sue Atkins is Lexicographical Consultant to Oxford University Press and was General Editor, Collins-Robert English-French Dictionary 1978 - 1987]

Bonnie Webber (University of Pennsylvania) 12 November 1996
Planning and Plan Recognition in Support of Multiple Trauma Management

The TraumAID system has been designed to provide decision support in the initial definitive management of injured patients. One of the main issues in trauma management is what to do next. To support this decision, as information becomes available about a patient (wound types and locations, signs and symptoms, test results), TraumAID uses (1) rules keyed on this information to identify all management goals that are currently relevant; (2) a multi-goal planning algorithm to produce an efficient and effective plan (partially ordered sequence of actions) for realizing those goals.

While retrospective evaluation has shown TraumAID's plans to be preferred over actual management plans, there is an important question of how to use such plans for management support, to improve the conduct and quality of care. In this talk, I will describe TraumAID and how we have sought to use its plans to critique physician orders. This has required development of (1) an approach to plan recognition that admits multiple goals; (2) an approach to plan evaluation that can recognize clinically significant differences between plans; and (3) an approach to text planning designed to realize multiple related and unrelated communicative goals.

Ann Copestake (CSLI, Stanford University) 5 November 1996
Applying Natural Language Processing Techniques to Speech Prostheses

I will discuss the application of Natural Language Processing (NLP) techniques to improving speech prostheses for people with severe motor disabilities. Many people who are unable to speak because of physical disability utilize text-to-speech generators as prosthetic devices. However, users of speech prostheses very often have more general loss of motor control and, despite aids such as word prediction, inputting the text is slow and difficult. For typical users, current speech prostheses have output rates which are less than a tenth of the speed of normal speech. We are exploring various techniques which could improve rates, without sacrificing flexibility of content. I will describe the statistical word prediction techniques used in a communicator developed at CSLI and some experiments on improving prediction performance. I'll then discuss the limitations of prediction on free text, and outline work which is in progress on utilizing constrained NL generation to make more natural interactions possible.

Christof Rumpf (University of Duesseldorf) 10 October 1996
Aspectual Composition and Nonmonotonicity

The talk is about a model of aspectual composition in Japanese. The Japanese data involve aspectual verb classes and aspectual compositions on the morphological and the phrasal level. The formal framework for the model is based on typed feature structures, and it is shown that some composition problems can be solved with monotonic type inference and some cannot. The latter problems are solved by the technique of "relational type composition", which can be applied with the use of phrase structure rules and lexical rules. No knowledge of aspectual composition is presupposed, but some knowledge of unification-based formalisms is.