The Data Science Research Group addresses the challenges associated with deriving knowledge from large heterogeneous datasets, with a particular focus on machine learning and natural language processing.
The group (based in the Department of Informatics) studies methods for gathering large datasets, extracting information from the data, analysing the data in order to find patterns, and visualising the results of this analysis.
In the area of machine learning, we are developing learning methods that address the dynamic evolution of data over long time periods, inconsistencies of information between or within data sources, and the integration of domain-specific knowledge with data. We are applying these methods to image analytics, video analytics and time series analysis.
In the area of natural language processing, we are developing techniques for understanding people’s social media activity, investigating healthcare issues by analysing electronic patient records, acquiring information about word meaning from raw text, and searching and classifying large collections of documents. Also in this area, we are devising computational models to study the ways in which people, particularly people with autism, use language to communicate.
Here is a sample of recent research highlights:
Joe Taylor (Data Science group PhD student) is the lead author of a paper on feature selection at IJCAI 2016, the leading international AI conference:
- Learning using Unselected Features (LUFe). Joseph Taylor, Viktoriia Sharmanska, Kristian Kersting, David Weir and Novi Quadrianto.
The machine learning lab is organising the First Workshop on Human is More Than a Labeler (BeyondLabeler), co-located with IJCAI 2016 in New York City. The invited speakers include Vladimir Vapnik, Alan Fern, Rogerio Feris, Michael Littman, and Rich Caruana.
Novi Quadrianto and Viktoriia Sharmanska have had two papers accepted for presentation at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), to be held from 26 June to 1 July in Las Vegas, USA. CVPR is the only conference among Google Scholar's 100 most-cited venues, a list that also includes renowned journals such as Nature, Science, and Cell. The accepted papers are:
- Learning from the Mistakes of Others: Matching Errors in Cross Dataset Learning. Viktoriia Sharmanska and Novi Quadrianto.
- Ambiguity Helps: Classification with Disagreements in Crowdsourced Annotations. Viktoriia Sharmanska, Daniel Hernández-Lobato, Jose Miguel Hernández-Lobato, and Novi Quadrianto.