Research to unlock hidden health data in GPs’ notes


Researchers at the University of Sussex and Brighton and Sussex Medical School have begun work to analyse doctors’ notes for data that could revolutionise health care and treatments for conditions such as rheumatoid arthritis.

The unique collaboration brings together the public health data research of Professor of Primary Care Epidemiology Jackie Cassell (BSMS) and the natural language processing research of Professor of Computational Linguistics John Carroll (Informatics, University of Sussex), who leads the project.  

The GP data project will contribute to the work of the Centre for the Improvement of Population Health through e-Records Research (CIPHER; see note 1).

GPs usually log details of each patient consultation – symptoms, medical histories and other information – using a series of codes. But a lot of important information that could be used to benefit public health is hidden away in uncoded information (notes written by doctors, called “free text”), which to date remains untapped in public health research.

The aim of the collaboration is to analyse thousands of computerised, anonymised GPs’ notes from the past 15 years and then to create software that will allow researchers to extract “big data” to reveal patterns of disease, diagnosis, treatment and prognosis across the UK (see note 2).

Professor Carroll explains: “Databases of electronic patient records are used extensively in biomedical research, but little attention is paid to the information contained in other text, such as notes or letters, often using shorthand terms and individualised language that is difficult to manage. Our research offers something unique – combining medical knowledge, linguistic processing and computer coding software to help us mine medical records and extract and correlate information on disease, health and social or geographic factors affecting public health.”

Rheumatology and inflammation is a research specialism at BSMS, and one of the areas the team will focus on is rheumatoid arthritis (RA), a chronic progressive condition for which new treatments that can reduce joint damage are being developed very quickly. While many rheumatoid arthritis patients will see a specialist at some stage, most remain under the care of their GPs.

Professor Cassell explains how the big data that the team collects from the records of rheumatoid arthritis patients could be of real benefit in the future: “When a new treatment becomes available, perhaps through a research trial, it is important that specialists can quickly find all the people who might benefit. However, at the moment there isn’t a simple way of checking through GP records to see who could be invited to try the new medication.

“Some people with RA will be seeing their GP, but won’t have a code that clearly identifies that they have RA. But a systematic survey of free text could identify those patients by, for example, looking for the medications they are taking and noting that they complained of sore hands and feet, which might flag up that the patient has early RA.

“The information could then be used to alert GPs and patients that a new trial or medication is available.

“This is the innovative role of Sussex – to ‘read’ automatically the free text that doctors write and to find markers that the person has RA, even where this has not been coded.”
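
As a rough illustration of the kind of search Professor Cassell describes, the short Python sketch below flags a note when it mentions both a medication and a symptom. It is not the project's software: the term lists, the function names `find_terms` and `flag_possible_ra`, and the example note are all assumptions made for this illustration, and real GP notes use shorthand and individualised language that simple word matching of this kind would often miss.

```python
import re

# Hypothetical term lists for illustration only; not taken from the project.
MEDICATION_TERMS = ["methotrexate", "sulfasalazine", "hydroxychloroquine"]
SYMPTOM_TERMS = ["sore hands", "sore feet", "painful joints", "morning stiffness"]


def find_terms(note, terms):
    """Return the terms that occur in the note as whole words, ignoring case."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, note, flags=re.IGNORECASE):
            found.append(term)
    return found


def flag_possible_ra(note):
    """Flag a note only when it mentions both a medication and a symptom."""
    medications = find_terms(note, MEDICATION_TERMS)
    symptoms = find_terms(note, SYMPTOM_TERMS)
    return {
        "flag": bool(medications) and bool(symptoms),
        "medications": medications,
        "symptoms": symptoms,
    }


if __name__ == "__main__":
    # Invented example note, written in the style of GP shorthand.
    example = "Pt c/o sore hands and morning stiffness. Continues on Methotrexate."
    print(flag_possible_ra(example))
```

Requiring both kinds of evidence before flagging is a deliberate choice in this sketch: a medication or a symptom on its own is a much weaker signal than the two together.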

The Sussex/BSMS collaboration, which is enlisting medical students and computer science postgraduates to annotate anonymised GPs’ notes and build the software, follows on from an earlier collaborative project between the universities of Sussex and Brighton and BSMS, called PREP (see note 3) and led by Professor Cassell.

Aleksandar Savkov is a doctoral student working with Professors Cassell and Carroll, while medical students Lucy McCabe and Nancy O’Neill, who also took part, have recently qualified as doctors.

Professor Cassell says: “At the end of the five years, we expect to have a better understanding of how we can use medical big data in the real world to provide alerts and information that will benefit patients, and to be able to feed this back to patients and their doctors and nurses.”

Notes for Editors

1 The University of Sussex and BSMS research into GP records forms part of the work of the Swansea University-based Centre for the Improvement of Population Health through e-Records Research (CIPHER), one of four UK centres that comprise the newly endowed Farr Institute of health informatics, which was awarded £20m by the Medical Research Council to support the safe use of patient and research data for medical research across all diseases.

2 The data sets being used by the University of Sussex and BSMS come from the Clinical Practice Research Datalink, a £60 million service recently announced by the Medicines and Healthcare Products Regulatory Agency and the National Institute for Health Research. Patient records are anonymised so that individual patients cannot be identified and the use of any resulting data is strictly regulated.

3 PREP – the Patient Records Enhancement Programme: Led by Professor Jackie Cassell, PREP brought together research knowledge from the Universities of Sussex and Brighton and BSMS to explore the potential of information concealed within free text to answer key questions in biomedical, clinical and public health research.

Professor John Carroll is Professor of Computational Linguistics at the University of Sussex. He works in the area of intelligent computer processing of human language (natural language processing, or NLP). His current research is concerned with: automatic natural language parsing (syntactic analysis), acquisition of information about word usage and meaning from text, sentiment analysis, clinical text mining, and other applications of natural language processing to real-world tasks. Professor Carroll is head of the Department of Informatics and Deputy Head of the School of Engineering and Informatics.

Professor Jackie Cassell is based at Brighton and Sussex Medical School (BSMS). She is a clinically qualified consultant in Public Health and in sexually transmitted infections. In recent years she has developed a wide-ranging interest in electronic patient records, and currently leads a Wellcome Trust-funded programme, "The Ergonomics of Electronic Patient Records", an interdisciplinary research project involving academics from the universities of Sussex and Brighton and BSMS, which develops methodologies for understanding and exploiting free text to enhance the utility of primary care electronic patient records. Jackie has also published widely in the field of sexually transmitted infections, focusing on primary care and the impact of delayed care on disease transmission. She leads an NIHR-funded trial of partner notification for sexually transmitted infections in primary care and other community settings, working closely with the Health Protection Agency, University College London, and a GUM clinic in Eastbourne, where she holds honorary posts. Jackie is also the editor in chief of the journal 'Sexually Transmitted Infections'.

University of Sussex Press office contacts: Maggie Clune and Jacqui Bealing. Tel: 01273 678 888. Email:

View press releases online at:

Last updated: Monday, 23 September 2013