Clinical Data Science / Course details

Year of entry: 2025

Course unit details:
Data Engineering

Course unit fact file
Unit code	IIDS69011
Credit rating	15
Unit level	FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s)	Semester 1
Available as a free choice unit?	No

Overview

Clinical Data Scientists need to be able to create data pipelines and merge data sets from different sources before the data can be used for onward analysis such as machine learning. They will also be required to 'wrangle' (pre-process) data into different formats and subsets for subsequent analysis. This includes an understanding of structured and unstructured data formats (e.g. tabular form, JSON, XML etc.), how data is modelled in various commonly used database systems, and an awareness of data/cyber security. They will be required to access data in a variety of formats and engineer pipelines for data analysis whilst adhering to wider concepts of data protection/privacy regulations and information governance. This module introduces these concepts with applied examples.
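
As an illustration of the kind of pipeline described above, the following is a minimal sketch that reads one tabular (CSV) source and one JSON source and merges them on a shared identifier. It is not taken from the unit materials: the data, the field names (patient_id, age, sex, test, value) and the helper functions are all hypothetical, and only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical source 1: tabular (CSV) demographics keyed by patient_id.
DEMOGRAPHICS_CSV = """patient_id,age,sex
P001,54,F
P002,67,M
P003,41,F
"""

# Hypothetical source 2: JSON lab results, one record per test.
LAB_RESULTS_JSON = """[
  {"patient_id": "P001", "test": "hba1c", "value": 48},
  {"patient_id": "P002", "test": "hba1c", "value": 61},
  {"patient_id": "P004", "test": "hba1c", "value": 39}
]"""


def load_demographics(text):
    """Parse CSV text into a dict keyed by patient_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["patient_id"]: {"age": int(row["age"]), "sex": row["sex"]}
        for row in reader
    }


def merge_sources(demographics, lab_results):
    """Inner-join lab results to demographics on patient_id."""
    merged = []
    for result in lab_results:
        person = demographics.get(result["patient_id"])
        if person is None:
            continue  # no matching demographic record, so the row is dropped
        merged.append({**result, **person})
    return merged


demographics = load_demographics(DEMOGRAPHICS_CSV)
lab_results = json.loads(LAB_RESULTS_JSON)
merged = merge_sources(demographics, lab_results)
for row in merged:
    print(row)
```

The inner join silently drops the lab result for P004, which has no demographic record. In practice, a pipeline would usually record such unmatched rows so that the effect on the analysis data set can be audited.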

Aims

The unit aims to:

Give students hands-on experience of applying tools and techniques used to access data in different common formats, and of transforming and combining this data into a format suitable for subsequent data analysis (e.g. application of statistical methods/machine learning algorithms) by creating data processing pipelines
Give students experience of using, accessing and querying data in different database storage systems (e.g. relational and NoSQL database systems)
Help students understand the importance of data security issues from both a technical and a legislative perspective
Explore the benefits and challenges of accessing health/clinical data
Help students understand and practise data cleaning, and understand the impact of data provenance and of altering data (e.g. variable encoding, missing values, inconsistently entered data and data validation)
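
One of the aims above concerns querying data held in relational database systems. The following is a minimal sketch of that idea using an in-memory SQLite database from the Python standard library. The admissions table, its columns and its rows are hypothetical and are not drawn from the unit's datasets.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of hospital admissions.
conn.execute(
    "CREATE TABLE admissions (patient_id TEXT, ward TEXT, length_of_stay INTEGER)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?)",
    [
        ("P001", "cardiology", 3),
        ("P002", "cardiology", 8),
        ("P003", "renal", 5),
    ],
)

# Parameterised query: placeholders keep user-supplied values out of the SQL text.
rows = conn.execute(
    "SELECT patient_id, length_of_stay FROM admissions "
    "WHERE ward = ? AND length_of_stay > ? ORDER BY patient_id",
    ("cardiology", 5),
).fetchall()
print(rows)

conn.close()
```

Using placeholders rather than string formatting is a small example of where querying and data security meet, since it avoids SQL injection.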

Learning outcomes

On completion of this unit, successful students should be able to:

Category of outcome

Students should be able to:

A: Knowledge and understanding

LO1: Describe the difference between structured and unstructured data, citing relevant examples of each

LO2: Discuss the consequences of cyber-attacks/data breaches and mitigation strategies

LO3: Discuss principles involved in data sharing and information governance with reference to appropriate guidelines and legislation

LO4: Critique common data standards depending on intended usage

LO5: Explain the challenges and opportunities of big data and approaches for processing such data

B: Intellectual Skills

Syllabus

This unit will cover the following indicative content:

Fundamental data types and structures
Structured and unstructured data
The fundamentals of using Python for data science and associated libraries/modules
How data is modelled in different database systems
Querying and filtering data
Representing data using dataframes
Data cleaning (imputing missing values, encoding variables)
Data transformations (wide/long, feature engineering)
Combining datasets (data linkage)
Data sharing agreements/plans
Data and patients
Data representation in diagrams (e.g. ERM, Data flow and UML)
Common data standards
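
To illustrate two of the topics listed above, data cleaning and wide/long transformation, the following is a minimal sketch using only the Python standard library. The records, the column names (sbp_visit1, sbp_visit2), the sex coding and the helper functions are hypothetical. Mean imputation is used purely as one simple strategy and is not presented as the unit's recommended method.

```python
from statistics import mean

# Hypothetical wide-format records: one row per patient, one column per visit.
wide = [
    {"patient_id": "P001", "sex": "F", "sbp_visit1": 120, "sbp_visit2": 130},
    {"patient_id": "P002", "sex": "M", "sbp_visit1": None, "sbp_visit2": 140},
]
SEX_CODES = {"F": 0, "M": 1}  # assumed encoding, for illustration only
VISIT_COLUMNS = ["sbp_visit1", "sbp_visit2"]


def impute_mean(rows, column):
    """Replace missing values in a column with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def to_long(rows, id_column, value_columns):
    """Reshape wide rows into one row per (id, variable) pair."""
    return [
        {id_column: r[id_column], "variable": c, "value": r[c]}
        for r in rows
        for c in value_columns
    ]


# Step 1: impute missing measurements column by column.
cleaned = wide
for column in VISIT_COLUMNS:
    cleaned = impute_mean(cleaned, column)

# Step 2: encode the categorical variable as an integer code.
cleaned = [{**r, "sex": SEX_CODES[r["sex"]]} for r in cleaned]

# Step 3: reshape from wide to long.
long_rows = to_long(cleaned, "patient_id", VISIT_COLUMNS)
for row in long_rows:
    print(row)
```

Because imputation changes the recorded values, a real pipeline would normally keep the original data and log which values were filled, so that the provenance of each value remains traceable.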
The unit will be delivered online, making use of workshops, lectures, labs and self-directed learning material delivered through interactive digital (Jupyter) notebooks to impart core knowledge and skills. A series of synchronous labs using a variety of datasets and formats will be used to foster group work and collaborative working with problem-based learning. Case studies and data will be drawn from The University of Manchester and its affiliates as well as NHS and open-source projects where possible.

Assessment methods

Assessment task	Length	Weighting within unit
Data Management Plan	You will create an authentic data management plan for a fictional scenario or real-world project that you would like to implement in your organisation.	100%

Feedback methods

Formative assessment and feedback to students is a key feature of the online learning materials for this unit and is provided through self-directed learning activities in the interactive notebooks.

Study hours

Independent study hours
Independent study	150

Teaching staff

Staff member	Role
Alan Davies	Unit coordinator
Iliada Eleftheriou	Unit coordinator
