Data Science Research Group

Data science seminars

Seminars are held in Chichester 2R203 on Thursdays starting at 14:00, unless otherwise noted.

Directions to the University can be found here. The seminar room is located near label 25 on the campus map, on level 2.

Summer 2018


Machine Learning and Deep Learning in Biomedical Data: Images and Clinical Notes

Margaret Yann (Toronto)

This seminar will take place on Monday 9th July, 11am-12pm, in the CALPS lab.

Despite major breakthroughs in deep learning, many challenges for artificial intelligence (AI) in medicine still exist. In this talk, I will discuss some research and challenges in the medical domain.

a) Images - X-ray Protein Crystallization Images

Synthesizing structural proteins is a complex, expensive and failure-prone process, in which the chemical conditions for each protein's successful crystallization are unique and difficult to reproduce. Systematic high-throughput (HTP) approaches in crystallography have been widely attempted, and a large number of protein crystallization images have been generated. However, an efficient method for analyzing these images automatically remains a pressing need. We introduce a novel system, CrystalNet, a deep convolutional neural network that analyzes protein crystallization X-ray images efficiently, accurately and automatically in HTP pipelines.
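The abstract does not describe CrystalNet's architecture, but the general pattern it names, a convolutional network classifying crystallization images, can be illustrated with a minimal sketch. Everything below (the kernel, the two-class "crystal" vs "clear drop" labels, the pooling scheme) is a toy assumption for illustration, not the actual CrystalNet model:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def classify(image, kernel, weights, bias):
    """Toy forward pass: conv -> ReLU -> global average pool -> linear -> softmax."""
    feat = np.maximum(conv2d(image, kernel), 0).mean()  # pooled scalar feature
    logits = weights * feat + bias                      # one logit per class
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

rng = np.random.default_rng(0)
image = rng.random((16, 16))                 # stand-in for an X-ray image
kernel = rng.standard_normal((3, 3))         # one untrained filter
weights = np.array([1.0, -1.0])              # hypothetical classes: crystal / clear drop
bias = np.array([0.0, 0.0])
probs = classify(image, kernel, weights, bias)
print(probs)  # probability over the two classes
```

A real HTP pipeline would stack many learned filters and train end-to-end; the sketch only shows the shape of the computation.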

b) Electronic Medical Records - Clinical Notes

A number of challenges exist in analyzing the unstructured free-text data in electronic medical records (EMRs). EMRs generate gigabytes of free-text information every year, yet the text is difficult to represent and model due to its high dimensionality, heterogeneity, sparsity, incompleteness and random errors. Moreover, standard NLP tools make errors when applied to clinical notes because physicians use unconventional written language in medical notes; issues include polysemy, abbreviations, ambiguity, misspellings, variation, temporality and negation. This research presents a machine learning framework, Clinical Learning On Natural Expression (CLONE), which learns from large-scale EMR databases by analyzing free-text clinical notes from primary care practitioners. To demonstrate its performance, we evaluate the model in a case study identifying patients with Congestive Heart Failure.


NLP in Industry: a Personal Story

Miro Batchkarov (Teebly)

This seminar will take place on Wednesday 18th July, 11am-12pm, in the CALPS lab.

This talk describes several industrial NLP projects that I have been involved in, such as information extraction from HTML pages, postal address parsing and chat bots. I will also discuss general trends in commercial NLP. Finally, I will briefly reflect on my personal experiences as I transitioned from a PhD to industry and the lessons learned along the way.


Tense and Aspect in Distributional Semantic Vector Space Models

Thomas Kober (Edinburgh)

This seminar will take place on Wednesday 25th July, 11am-12pm, in the CALPS lab.

Tense and aspect are two of the most important contributing factors to the meaning of a verb, determining, for example, the temporal extent described by a predication as well as the entailments the verb gives rise to. For example, while a verb phrase describing an event such as "Thomas is visiting Brighton" entails the change of state "Thomas being in Brighton", the state of "being in Brighton" does not entail a "visit to Brighton". The reasoning becomes more complex when different tenses are involved: "Thomas has arrived in Brighton" entails "Thomas is in Brighton", whereas "Thomas will arrive in Brighton" does not.

Distributional semantic word representations are a ubiquitous part of a number of NLP tasks such as Sentiment Analysis, Question Answering, Recognising Textual Entailment, Machine Translation and Parsing. While their capacity for representing content words such as adjectives, nouns and verbs is well established, their ability to encode the semantics of closed class words has received considerably less attention.

In this talk I will show how composition can be used to leverage distributional representations for closed class words such as auxiliaries, prepositions and pronouns to model tense and aspect of verbs in context. I will furthermore analyse how and why closed class words are effective at disambiguating fine-grained distinctions in verb meaning. Lastly, I will demonstrate that a distributional semantic vector space model is able to capture a substantial amount of temporality in a novel tensed entailment task.
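The talk's core idea, composing a closed-class word's vector with a verb's vector to get a tensed representation, can be sketched with pointwise addition, one simple composition function from the distributional semantics literature. The vectors below are random stand-ins, not corpus-derived representations, so the similarities they produce carry no linguistic meaning; the sketch only shows the mechanics:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy distributional vectors (random stand-ins for real corpus-derived ones).
rng = np.random.default_rng(42)
vocab = ["has", "will", "arrive", "is", "in"]
vectors = {w: rng.standard_normal(50) for w in vocab}

def compose(words):
    """Pointwise addition: one simple way to build a phrase vector in context."""
    return np.sum([vectors[w] for w in words], axis=0)

perfect = compose(["has", "arrive"])    # "has arrived"
future  = compose(["will", "arrive"])   # "will arrive"
state   = compose(["is", "in"])         # "is in (Brighton)"

# With real vectors, one could test whether the perfect form sits closer to
# the entailed state than the future form does.
print(cosine(perfect, state), cosine(future, state))
```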


Deep Embedder: A journey from an Average Word Vector to an RNN

Aleksander Savkov (Babylon Health)

In this talk I will describe an industry project in which we learn a strong sentence embedding model from unlabelled data.
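The two endpoints named in the title can be sketched side by side: averaging word vectors (order-insensitive) versus a recurrent network whose final hidden state serves as the sentence embedding (order-sensitive). The vocabulary, dimensions and random weights below are illustrative assumptions, not Babylon Health's model:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 100
# Toy pre-trained word vectors (random stand-ins for real embeddings).
embeddings = {w: rng.standard_normal(dim) for w in
              "the patient reports chest pain a strong sentence model".split()}

def average_embed(sentence):
    """Baseline sentence embedding: mean of word vectors (ignores word order)."""
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    return np.mean(vecs, axis=0)

def rnn_embed(sentence, Wx, Wh):
    """Minimal vanilla RNN: the last hidden state is the sentence embedding."""
    h = np.zeros(Wh.shape[0])
    for w in sentence.split():
        if w in embeddings:
            h = np.tanh(Wx @ embeddings[w] + Wh @ h)
    return h

Wx = rng.standard_normal((64, dim)) * 0.1   # input-to-hidden weights (untrained)
Wh = rng.standard_normal((64, 64)) * 0.1    # hidden-to-hidden weights (untrained)
s1 = "the patient reports chest pain"
print(average_embed(s1).shape, rnn_embed(s1, Wx, Wh).shape)
```

Note that `average_embed("chest pain")` and `average_embed("pain chest")` are identical, while the RNN processes the words sequentially and so can, once trained, distinguish them.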

The archive of previous seminars goes back to late 1996.