Members of the group have led the following projects (listed in reverse chronological order).
SWAT (Semantic Web Authoring Tool)
The project aims to open up the Semantic Web to a wide audience through novel techniques that allow Semantic Web representation languages to be viewed and edited in ordinary natural language, as opposed to the methods currently used, such as source coding or graphical interfaces, which require significant training. The project is developing tools that allow subject-matter experts to edit and query metadata on the Semantic Web through a reliable natural language interface.
PI (Sussex): Donia Scott
Ranking Word Senses for Disambiguation: Models and Applications
The most accurate techniques for word sense disambiguation to date are those which are trained on text in which each word has been manually annotated with its intended meaning. A major shortcoming of these methods, though, is that accuracy is strongly correlated with the quantity of training data available, and this is in short supply because its production is very labour-intensive. In this project we developed novel ways of estimating the frequency distributions of senses of words from raw (unannotated) text. This was a joint project with Informatics, University of Edinburgh.
PIs (Sussex): Diana McCarthy, John Carroll
Research Fellow: Rob Koeling
PI (Edinburgh): Mirella Lapata
COGENT: Controlled Generation of Text
With current NLP technology, embedding natural language generation into applications involves hand-crafting and special-purpose tuning by experts, which is non-portable, non-scaleable, time-consuming and expensive. In this project, we investigated reflective techniques for controlling wide-coverage generation effectively. This was a joint project with the University of Brighton.
PIs (Sussex): David Weir, John Carroll
Research Fellow: Daniel Paiva
DPhil student: Eva Esteve Ferrer
PI (Brighton): Roger Evans
Vicarious Learning and Case-based Teaching of Clinical Reasoning Skills
The project aimed to investigate the effectiveness of vicarious learning in education. Vicarious learning is a term used to refer to learning through observing others. This research sought to test an interactive vicarious learning package on speech and language students. This approach can be used to help students who have problems in making correct clinical diagnoses. A vicarious learning system using virtual patients was found to help students reflect on the process of diagnosing patients better than traditional methods of teaching. Students observing groups of students and tutors discussing cases were more confident in their subsequent diagnoses. The project was funded under Phase III of the ESRC's Teaching and Learning Research Programme (TLRP).
PIs: Richard Cox, John Lee (Edinburgh), Julie Morris (Newcastle)
Research Fellows: Jianxiong Pang, Susan Rabold, Barbara Howarth
ROLLOUT: Innovative Representations for Scheduling for Quality and Training
The ROLLOUT project’s starting point was the fundamental theoretical idea in Cognitive Science that the representations used by problem solvers and learners dramatically influence how effectively they do complex tasks and how easily they learn difficult technical topics. Knowing how to design effective representations will therefore transform the nature of information-intensive and conceptually demanding tasks. The project extended and successfully evaluated a theory of representational systems design by applying it to the creation of graphical computer user-interface displays for the challenging problem of bakery production scheduling. The result was ROLLOUT, a novel production planning and scheduling system. Laboratory experiments showed that ROLLOUT provides superior support for perception, reasoning and problem solving compared to the conventional representations used. Trials in real working bakeries showed that ROLLOUT effectively supports novice and expert bakery managers in their work, allowing schedules with fewer production problems to be created dynamically in the midst of the production environment.
Project Manager: Linda Young (CCFRA)
PIs: Peter Cheng, John Wilson (Nottingham), Stan Cauvain (CCFRA)
Research Fellows: Rossano Barone, Nikoletta Pappa
Industrial collaborators: ten bakeries, supermarkets, equipment manufacturers
Representational Design Principles to Humanise Automated Scheduling Systems
The project aimed to evaluate and refine principles of Representational Epistemology (REEP) for the design of interactive graphical interfaces for knowledge-rich scheduling problems, and to investigate the scope for integrating the power of automated systems for complex scheduling tasks with the flexibility and creativity of humans. The project applied REEP to the design of representations for two complex scheduling domains: examination timetabling and personnel rostering. As well as interactive graphical interfaces to support visualisation and manual manipulation of schedules, a heuristic toolbox was developed to investigate the scope for semi-automated construction and refinement of schedules. Evaluation studies suggested that the new representations substantially enhanced the quality of solutions generated by users, by supporting more meaningful problem-solving strategies than conventional displays allow, and that it was possible to integrate human knowledge, flexibility and ability to learn with the computational power and capacity of automated systems. The project was funded under the ESRC/EPSRC PACCIT programme.
PIs: Peter Cheng, Peter Cowling (Bradford), Edmond Burke (Nottingham), B McCullum (Belfast)
Research Fellows: Rossano Barone, Nikoletta Pappa, Samad Ahamadi (Nottingham)
Industrial collaborator: Optime Ltd.
Natural Habitats
The pervasive computing environment of the future will provide a wide variety of networked services. The value of such services will be greatly enhanced if users are able to compose them, that is, to link them up in ways that are tailored to their own particular environment. This project investigated how NLP techniques can help make service composition a possibility for non-technical users, focusing on the development of an interactive service composition tool that uses a natural language interface.
PIs: David Weir, Bill Keller, Ian Wakeman
Research Fellows: Julie Weeds, Tim Owen
DPhil students: Thom Heslop, James Dowdall
MEANING: Developing Multilingual Web-scale Language Technologies
In this project we collected and analysed language data from the WWW on a large scale, in order to build more comprehensive multilingual lexical knowledge bases to support improved word sense disambiguation.
PI: John Carroll
Research Fellows: Rob Koeling, Diana McCarthy
DPhil student: Xinglong Wang
DEEP THOUGHT: Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction
This project investigated methods for combining robust shallow methods for language analysis with deep semantic processing. The approach was demonstrated in business intelligence, automated email processing and document production support applications.
PI: John Carroll
Research Fellow: Alex Fang
Visiting researchers: Stephan Oepen, Naoki Yoshinaga
RASP: Robust Accurate Statistical Parsing
This project was concerned with improving the accuracy and robustness of syntactic parsers. Particular areas worked on were automated grammar and lexicon induction, parser evaluation, and statistical models of disambiguation.
PI: John Carroll
Research Fellow: Diana McCarthy
DPhil student: Mark McLauchlan
LUCY
The project, sponsored by ESRC, developed an electronic database of structurally analysed modern written English, including not only the "polished" writing of published books and magazines but the writing of young children and teenagers.
PI: Geoffrey Sampson
Research Fellows: Anna Babarczy, Alan Morris
PSET: Practical Simplification of English Text
The project built a prototype system which took in English newspaper text across the WWW and output a simplified version with broadly similar meaning. The intended users were people suffering from aphasia, which impairs their comprehension of written English.
PI: John Carroll
Research Fellows: Diana McCarthy, Guido Minnen
DPhil student: Darren Pearce
CHRISTINE
The CHRISTINE Corpus comprises a socially-representative annotated sample of current spontaneous speech, applying the annotation standards devised in the SUSANNE project (see below) to create resources for studying structure in present-day British language. It includes various extensions of the annotation scheme to identify the many structural features particular to speech. The Corpus is freely available.
PI: Geoffrey Sampson
LEXSYS: Analysis of Naturally-occurring English Text with Stochastic Lexicalized Grammars
The project developed a robust wide-coverage parsing system for English text, exploiting a combination of statistical techniques involving online corpora, inheritance hierarchies for imposing structure on NLP data, and lexicalised grammars.
PIs: David Weir, John Carroll
POLYLEX
The project developed an inheritance-based trilingual lexicon for the core vocabulary of Dutch, English and German, using inheritance networks to share information across the languages at all levels of linguistic description.
PIs: Gerald Gazdar, Lynne Cahill
SPARKLE: Shallow Parsing for Acquisition of Lexical Knowledge
The project developed shallow parsing technology for English, together with corpus-based lexical acquisition techniques, for deployment by collaborators in prototype multilingual information retrieval and speech dialogue systems.
PI: John Carroll
SUSANNE: Surface and Underlying Structural Analysis of Natural English
The project designed an annotation scheme for English, and produced a 130,000-word corpus of written (American) English annotated in accordance with the scheme. The SUSANNE Corpus is freely available without formalities for use by researchers anywhere.
PI: Geoffrey Sampson
POETIC: POrtable Extendable Traffic Information Collator
The POETIC project involved the development of a research prototype software system, based on natural language processing and expert system technology, which accepts 'live' police reports about traffic incidents, recognises information of relevance to other motorists, formulates suitable advisory messages, and coordinates message delivery to motorists via media such as paging, cellular radio, and the Radio Data System.
