Computing

Data Science Research Methods

Module code: 970G5
Level 7 (Masters)
15 credits in autumn teaching
Teaching method: Lecture, Laboratory
Assessment modes: Coursework

This module provides you with the practical tools and techniques required to build, analyse and interpret ‘big data’ datasets. It covers all aspects of the data science process including:

  • collection
  • munging or wrangling
  • cleaning
  • exploratory data analysis
  • visualisation
  • statistical inference and model building
  • implications for applications in the real world.

In the laboratory, you'll develop your Python programming skills and be introduced to the standard Python libraries and toolkits used by data scientists, including NumPy, SciPy, pandas and scikit-learn. In these sessions and in the coursework, you'll work with real-world datasets and apply the techniques covered in lectures.

During the module, you're taught how to:

  • scrape data from the internet
  • develop and test hypotheses
  • use principal component analysis (PCA) to reduce dimensionality
  • prepare actionable plans and present your findings.
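
As an illustration of the PCA item above, here is a minimal sketch of dimensionality reduction using NumPy alone. It is not taken from the module materials: the function name `pca_reduce` and the toy dataset are hypothetical, and in practice a library implementation such as the one in scikit-learn would likely be used instead.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal directions.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # Centre each feature so the components describe variance, not the mean.
    X_centred = X - X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centred @ components.T, explained[:n_components]


# Made-up toy data: 200 samples and 3 features, where the third feature
# is almost a copy of the first, so two components should suffice.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

Z, ratio = pca_reduce(X, 2)
print(Z.shape)      # shape of the reduced dataset
print(ratio.sum())  # share of variance kept by the two components
```

Because one feature is nearly redundant, two components retain almost all of the variance in this toy dataset. With real data, the explained-variance ratios help decide how many components to keep.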

Module learning outcomes

  • Analyse real-world ‘big data’ datasets using appropriate state-of-the-art tools and techniques.
  • Design testable hypotheses and apply suitable experimental methods to determine whether those hypotheses are supported by the data.
  • Evaluate the applicability of different tools and techniques for data analysis and visualisation in different scenarios.
  • Summarise an analysis of big data and apply data visualisation tools and techniques to present data in an appropriate format.