Who’s looking at your medical records?

The Environment and Health research theme is finding ways to enhance lifelong health and well-being by promoting illness prevention and improving the management of disease.

A multidisciplinary team of researchers from the Brighton and Sussex Medical School and the School of Informatics at the University of Sussex is finding new ways to extract greater amounts of information from ‘free text’ in patients’ medical records, with a view to developing strategies to improve healthcare systems.


Evaluating patient records for free-text content: how can it inform and improve patient diagnosis, referral and treatment?

Primary-care patient-record databases provide a unique and valuable resource for research into the epidemiology of disease, and how patient diagnoses, referrals and treatment are handled within our healthcare system.

Patient information is recorded electronically by GPs using a system of predefined codes for specific symptoms, diagnoses, prescribed treatments, etc.

Electronic records also contain a significant amount of additional information, written by the GP in the free text of patient notes.

Currently, most health-service research considers only coded data, ignoring information in the free text, which, for a variety of reasons, is much harder to access. Recent research by a multidisciplinary team at Sussex has focused on making data concealed in the free text more accessible and usable for research purposes.

Analysing free text in electronic patient records is difficult.

Firstly, there is the arbitrary way in which the information is recorded. A diagnosis may not be coded until long after a patient has actually been diagnosed, referred and treated, or it may not be coded at all for fear of ‘labelling’ the patient, which makes the timeline of diagnosis for a specific condition difficult to construct. Secondly, before patients’ free-text records can be made available to researchers, the data has to be manually ‘anonymised’ to protect patient identity and confidentiality, a process that is both costly and time-consuming.

The research team aims to understand what determines the balance between free text and coded data and the completeness of recording in primary care, how variations in this balance affect data accessibility for health-service research, how information about symptoms and diseases can be extracted from free text, and how this information can be best presented to health-service researchers.

Natural language processing and computer science techniques are used to extract information more efficiently and rapidly than traditional manual processes, thereby making data more readily available and usable for researchers.

The team has developed algorithms and machine-learning techniques to automatically search patient-record databases for recognisable information, such as specific symptoms or related synonyms and abbreviations, which is either not currently available to researchers or is difficult to extract.
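
As a rough illustration of the idea of searching free text for symptoms together with their synonyms and abbreviations, here is a minimal sketch in Python. It is not the team's actual method: the lexicon `SYMPTOM_TERMS`, the function names and the sample note are all hypothetical, and it uses plain keyword matching rather than machine learning.

```python
import re

# Hypothetical lexicon mapping a canonical symptom to synonyms and
# abbreviations a GP might type. Illustrative only, not a clinical resource.
SYMPTOM_TERMS = {
    "abdominal distension": ["abdominal distension", "abdo distension", "bloating", "bloated"],
    "abdominal pain": ["abdominal pain", "abdo pain", "stomach ache"],
}


def compile_patterns(lexicon):
    """Build one case-insensitive, whole-word regex per canonical symptom."""
    patterns = {}
    for symptom, terms in lexicon.items():
        # Longest terms first so longer phrases are tried before shorter ones.
        ordered = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(term) for term in ordered)
        patterns[symptom] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return patterns


def find_symptoms(note, patterns):
    """Return the sorted canonical symptoms mentioned in a free-text note.

    Only canonical labels are returned, never the note text itself.
    """
    return sorted(symptom for symptom, pattern in patterns.items() if pattern.search(note))


if __name__ == "__main__":
    patterns = compile_patterns(SYMPTOM_TERMS)
    # Made-up note for demonstration purposes.
    print(find_symptoms("Pt c/o abdo pain and feeling bloated for 3/52.", patterns))
```

A sketch like this ignores negation ("no abdo pain" would still match) and misspellings, which is part of why real free-text extraction needs more careful techniques.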

The beauty of automated searching is that, in addition to extracting information more rapidly, it does not need to be run on anonymised text, provided the results can be guaranteed to protect confidential, patient-identifying information.

Field studies are also being conducted to understand how and why GPs record data as they do, using simulated and real patients to learn how primary-care staff enter data and use electronic health records.

For a disease that relies on speed and hospital referral for successful treatment, free text may prove invaluable for accurate dating of diagnoses and referrals, and for identifying misclassified cases.

The ability to extract greater amounts of information from electronic patient records will hopefully provide a better understanding of how well particular diseases are recorded, diagnosed and treated, and inform improvements to our healthcare system.

Rosemary's perspective

"The great thing about working with electronic health records is that there is so much scope for really interesting research. Extracting and interpreting information from this “real-life” data can be difficult, but all the more rewarding when we can achieve it.

"So after two years of painstaking work by the team (including three clinical annotators who read through and annotated all 6,141 text records), we were thrilled to find that we can extract a large percentage of the extra information on ovarian cancer symptoms, such as abdominal distension or pain, using relatively simple techniques and without the need for anonymisation.

"We believe that our methods will allow many more users to have access to the information in the free-text records and that this will make a real impact for research.

"I have just embarked on a new and exciting project with Dataline Software and the GPRD. The project will provide an online portal and toolset to search, visualise and ultimately provide a comprehensive list of practices and patients for clinical trials across the UK. My role (assisted by Natalia Beloff, Informatics) is to produce quality scores for each practice using statistical detective work."