Clinical Data Science / Course details

Year of entry: 2025

Course unit details:
Data Engineering

Course unit fact file
Unit code IIDS69011
Credit rating 15
Unit level FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s) Semester 1
Available as a free choice unit? No

Overview

Clinical Data Scientists need to be able to create data pipelines and merge data sets from different sources before the data can be used for onward analysis such as machine learning. They will also be required to 'wrangle' (pre-process) data into different formats and subsets for subsequent analysis. This requires an understanding of structured and unstructured data formats (e.g. tabular data, JSON and XML), of how data is modelled in commonly used database systems, and an awareness of data and cyber security. They will be required to access data in a variety of formats and engineer pipelines for data analysis while adhering to data protection and privacy regulations and information governance. This module introduces these concepts with applied examples.
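As an illustration of the kind of pipeline described above, the following minimal sketch reads one tabular (CSV) source and one JSON source and merges them on a shared identifier. It uses only the Python standard library; the data, the column names (patient_id, age, sex, test, value) and the function names are hypothetical and are not taken from the unit materials. In practice a dataframe library such as pandas would usually express the same steps more concisely.

```python
import csv
import io
import json

# Hypothetical tabular source: patient demographics as CSV text.
DEMOGRAPHICS_CSV = """patient_id,age,sex
P001,54,F
P002,61,M
P003,47,F
"""

# Hypothetical semi-structured source: lab results as JSON text.
LAB_RESULTS_JSON = """[
  {"patient_id": "P001", "test": "HbA1c", "value": 48},
  {"patient_id": "P003", "test": "HbA1c", "value": 53}
]"""


def load_demographics(text):
    """Parse CSV text into a dict keyed by patient_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["patient_id"]: {"age": int(row["age"]), "sex": row["sex"]}
        for row in reader
    }


def load_lab_results(text):
    """Parse JSON text into a list of result records."""
    return json.loads(text)


def merge_sources(demographics, lab_results):
    """Inner join: keep lab results that have a matching demographics row."""
    merged = []
    for result in lab_results:
        person = demographics.get(result["patient_id"])
        if person is not None:
            merged.append({**result, **person})
    return merged


merged_rows = merge_sources(
    load_demographics(DEMOGRAPHICS_CSV),
    load_lab_results(LAB_RESULTS_JSON),
)
for row in merged_rows:
    print(row)
```

Patient P002 has no lab result, so the inner join leaves that patient out of the merged output. Whether to keep or drop unmatched records is a design decision that depends on the analysis being planned.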

Aims

The unit aims to:

  • Give students hands-on experience of applying tools and techniques to access data in common formats, and of transforming and combining this data into a form suitable for subsequent analysis (e.g. statistical methods or machine learning algorithms) by creating data processing pipelines
  • Give students experience of using, accessing and querying data in different database storage systems (e.g. relational and NoSQL database systems)
  • Develop an understanding of the importance of data security from both a technical and a legislative perspective
  • Explore the benefits and challenges of accessing health/clinical data
  • Develop an understanding of, and practice in, data cleaning, including the impact of data provenance and of altering data (e.g. variable encoding, missing values, inconsistently entered data and data validation)
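The data cleaning issues named in the last aim (variable encoding, missing values, inconsistently entered data and data validation) can be sketched in a few lines of Python. This is only an illustrative example under stated assumptions: the records, the coding map SEX_CODES and the plausibility range BP_RANGE are hypothetical, and median imputation is just one of several possible strategies. The bp_imputed flag records which values were altered, so that the provenance of each value is not lost.

```python
from statistics import median

# Hypothetical raw records with inconsistent coding, a missing value
# and an implausible value.
raw_records = [
    {"patient_id": "P001", "sex": "F", "systolic_bp": "128"},
    {"patient_id": "P002", "sex": "male", "systolic_bp": ""},
    {"patient_id": "P003", "sex": " Female ", "systolic_bp": "141"},
    {"patient_id": "P004", "sex": "m", "systolic_bp": "900"},
]

# Assumed mapping from free-text entries to a single agreed coding.
SEX_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

# Assumed plausibility range for validation (illustrative, not clinical guidance).
BP_RANGE = (50, 300)


def clean_records(records):
    """Normalise coding, validate values and impute missing ones."""
    cleaned = []
    for rec in records:
        # Inconsistent entry: trim and lower-case before looking up the code.
        sex = SEX_CODES.get(rec["sex"].strip().lower())  # None if unrecognised
        # Missing values: empty or non-numeric strings become None.
        text = rec["systolic_bp"].strip()
        bp = int(text) if text.isdigit() else None
        # Validation: treat out-of-range values as missing.
        if bp is not None and not (BP_RANGE[0] <= bp <= BP_RANGE[1]):
            bp = None
        cleaned.append(
            {
                "patient_id": rec["patient_id"],
                "sex": sex,
                "systolic_bp": bp,
                "bp_imputed": False,
            }
        )

    # Imputation: fill gaps with the median of the observed values and
    # flag each altered row.
    observed = [r["systolic_bp"] for r in cleaned if r["systolic_bp"] is not None]
    fill = median(observed) if observed else None
    for row in cleaned:
        if row["systolic_bp"] is None and fill is not None:
            row["systolic_bp"] = fill
            row["bp_imputed"] = True
    return cleaned


cleaned = clean_records(raw_records)
for row in cleaned:
    print(row)
```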

Learning outcomes

On completion of this unit, successful students should be able to:

A: Knowledge and understanding

LO1: Describe the difference between structured and unstructured data, citing relevant examples of each

LO2: Discuss the consequences of cyber-attacks/data breaches and mitigation strategies

LO3: Discuss principles involved in data sharing and information governance with reference to appropriate guidelines and legislation

LO4: Critique common data standards depending on intended usage

LO5: Explain the challenges and opportunities of big data and approaches for processing such data

B: Intellectual Skills

Syllabus

This unit will cover the following indicative content:

  • Fundamental data types and structures
  • Structured and unstructured data
  • The fundamentals of using Python for data science and associated libraries/modules 
  • How data is modelled in different database systems
  • Querying and filtering data
  • Representing data using dataframes
  • Data cleaning (e.g. imputing missing values, encoding variables)
  • Data transformations (wide/long, feature engineering)
  • Combining datasets (data linkage)
  • Data sharing agreements/plans
  • Data and patients
  • Data representation in diagrams (e.g. ERM, Data flow and UML)
  • Common data standards

The unit will be delivered online, making use of workshops, lectures, labs and self-directed learning material delivered through interactive digital (Jupyter) notebooks to impart core knowledge and skills. A series of synchronous labs using a variety of datasets and formats will be used to foster group work and collaborative working with problem-based learning. Case studies and data will be drawn from The University of Manchester and its affiliates, as well as from NHS and open-source projects where possible.
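Several of the syllabus topics listed above (how data is modelled in a database system, querying and filtering, combining datasets and wide/long transformations) can be illustrated with a short sketch. The example below uses Python's built-in sqlite3 module as a stand-in for a relational database; the table names, column names and values are hypothetical, and the wide_to_long helper is an illustrative function rather than part of any library.

```python
import sqlite3

# Model two related entities in an in-memory relational database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient (
        patient_id TEXT PRIMARY KEY,
        birth_year INTEGER
    );
    CREATE TABLE observation (
        obs_id INTEGER PRIMARY KEY,
        patient_id TEXT REFERENCES patient(patient_id),
        code TEXT,
        value REAL
    );
    """
)
conn.executemany(
    "INSERT INTO patient VALUES (?, ?)",
    [("P001", 1970), ("P002", 1985)],
)
conn.executemany(
    "INSERT INTO observation (patient_id, code, value) VALUES (?, ?, ?)",
    [
        ("P001", "heart_rate", 72.0),
        ("P001", "temp_c", 36.8),
        ("P002", "heart_rate", 95.0),
    ],
)

# Query, filter and combine: join the two tables and keep heart rates above 80.
rows = conn.execute(
    """
    SELECT p.patient_id, p.birth_year, o.value
    FROM patient AS p
    JOIN observation AS o ON o.patient_id = p.patient_id
    WHERE o.code = ? AND o.value > ?
    ORDER BY p.patient_id
    """,
    ("heart_rate", 80),
).fetchall()
conn.close()
print(rows)


def wide_to_long(wide_rows, id_key):
    """Reshape one-row-per-patient data into one row per measurement."""
    long_rows = []
    for row in wide_rows:
        for name, value in row.items():
            if name != id_key:
                long_rows.append(
                    {id_key: row[id_key], "variable": name, "value": value}
                )
    return long_rows


wide = [{"patient_id": "P001", "heart_rate": 72.0, "temp_c": 36.8}]
long_rows = wide_to_long(wide, "patient_id")
print(long_rows)
```

The observation table is already in long form (one row per measurement), which is why the same patient can appear in it more than once. The wide_to_long helper shows how a wide, one-row-per-patient record maps onto that shape.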

Assessment methods

Assessment task: Data Management Plan

You will create an authentic data management plan for a fictional scenario or real-world project that you would like to implement in your organisation.

Length: not stated

Weighting within unit: 100%

Feedback methods

Formative assessment and feedback are key features of the online learning materials for this unit and are provided through self-directed learning activities in the interactive notebooks.

Recommended reading

  • McKinney, W. (2017) Python for Data Analysis. Beijing: O'Reilly.
  • Molin, S. (2019) Hands-On Data Analysis with Pandas. Birmingham: Packt.
  • Medium (2021) Towards Data Science: A Medium publication sharing concepts, ideas and codes. https://towardsdatascience.com/about

Study hours

Independent study hours
Independent study 150

Teaching staff

Staff member Role
Alan Davies Unit coordinator
Iliada Eleftheriou Unit coordinator
