Computing

Data Science Research Methods

Module code: 970G5
Level 7 (Masters)
15 credits in autumn semester
Teaching method: Lecture, Laboratory
Assessment modes: Coursework

This module will provide you with the practical tools and techniques required to build, analyse and interpret 'big data' datasets.

It covers every stage of the data science process: collection, munging (wrangling), cleaning, exploratory data analysis, visualisation, statistical inference, model building, and the implications for real-world applications.

During the module, you'll be taught how to scrape data from the Internet, develop and test hypotheses, use principal component analysis (PCA) to reduce dimensionality, prepare actionable plans and present your findings.
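As an illustration of the dimensionality-reduction step mentioned above, the following is a minimal sketch of PCA computed with NumPy's singular value decomposition. It is not taken from the module materials: the function name `pca_reduce` and the synthetic dataset are assumptions made for the example, and in practice a library implementation such as scikit-learn's `PCA` class would normally be used.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD).

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # PCA operates on mean-centred data.
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each direction (sample variance, n - 1).
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centred @ components.T, components, ratio


# Synthetic data (an assumption): three features, two of them strongly correlated.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.1, size=200),
])

Z, components, ratio = pca_reduce(X, n_components=1)
print(Z.shape)  # one column per retained component
print(ratio)    # share of total variance kept by that component
```

Because two of the three synthetic features are nearly collinear, a single component retains most of the variance, which is the situation in which reducing dimensionality loses little information.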

In the laboratory, you'll develop your Python programming skills and be introduced to several core Python libraries for data scientists, including NumPy, SciPy, pandas and scikit-learn.

In these sessions and your coursework, you'll work with real-world datasets and apply the techniques covered in lectures to that data.

Module learning outcomes

  • Analyse real-world 'big data' datasets using appropriate state-of-the-art tools and techniques.
  • Design testable hypotheses and apply suitable experimental methods to determine whether those hypotheses are supported by the data.
  • Evaluate the applicability of different tools and techniques for data analysis and visualisation in different scenarios.
  • Summarise an analysis of big data and apply data visualisation tools and techniques to present data in an appropriate format.
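
To make the second outcome above concrete, here is a minimal sketch of one way to test a hypothesis against data: a two-sided permutation test for a difference in group means. The choice of test, the function name `permutation_test` and the synthetic groups are all assumptions made for illustration; the module itself may teach different methods.

```python
import numpy as np


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled values and split them again.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic groups (an assumption): the true means differ by one unit.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=1.0, scale=1.0, size=50)

observed, p_value = permutation_test(group_a, group_b, n_permutations=2000)
print(observed, p_value)
```

A small p-value indicates that a difference at least as large as the observed one would rarely arise if the group labels were interchangeable, so the data do not support the null hypothesis of equal means.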