MSc Data Science (Applied Urban Analytics) / Course details

Year of entry: 2023

Course unit details:
Understanding Data and their Environment

Course unit fact file
Unit code DATA71011
Credit rating 15
Unit level FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Offered by
Available as a free choice unit? No

Overview

This module combines technical and non-technical topics, all related to critical externalities to the data analytics process. The primary aim of the module is to demonstrate that data science cannot be carried out in a vacuum: a whole range of extrinsic considerations affects our ability to carry out the research that we wish to do. However, appropriate management of these externalities can lead to higher-quality as well as more responsible research. The course has four components:

1. Ethics and the law: data protection, anonymisation, statistical disclosure, understanding consent.

2. Information about data: metadata and paradata; provenance and data-generating processes; issues of data quality and the impact on inference; accessing and finding data.

3. Pre-processing: understanding data quality and divergence and the impact on inference; cleaning data; editing and imputation models.

4. Combining and enhancing data: basics of data linkage/integration.
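
As a rough illustration of the statistical disclosure topic in component 1, the sketch below counts how many records share each combination of quasi-identifiers, which is the idea behind a k-anonymity check. It is not taken from the unit materials: the records, the field names (`age_band`, `postcode_area`, `diagnosis`) and the function names are all hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    # An empty dataset has no groups; report 0 rather than raising.
    return min(groups.values(), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical toy records, invented for this example.
records = [
    {"age_band": "20-29", "postcode_area": "M13", "diagnosis": "A"},
    {"age_band": "20-29", "postcode_area": "M13", "diagnosis": "B"},
    {"age_band": "30-39", "postcode_area": "M14", "diagnosis": "A"},
]

unique_risk = smallest_group_size(records, ["age_band", "postcode_area"])
```

Here the third record is unique on its quasi-identifiers, so the data would not pass a check with k = 2; coarsening the age bands or postcode areas is one way such a record might be protected.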

Aims

The unit aims to:

  • Develop a basic understanding of the technical processes of anonymisation, disclosure control and data linkage.
  • Develop an awareness of the issues around the use of data in research.
  • Develop fundamental skills in data husbandry.

Learning outcomes

Intended Learning Outcomes

Students should be able to:

  • Understand the ethical issues surrounding the use of data in research.
  • Understand the concepts and technical vocabulary of anonymisation and statistical disclosure.
  • Demonstrate a basic understanding of data provenance.
  • Make informed decisions about linkage/integration of data and carry out a basic data linkage.
  • Conduct a basic anonymisation process with a dataset.
  • Identify an appropriate collection of data sources for a project and identify the issues in using those data sources.
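
To give a flavour of what "a basic data linkage" might involve, the sketch below performs a simple deterministic (exact-match) linkage of two small datasets on normalised key fields. It is an illustrative assumption, not the method taught in the unit: the datasets, field names and the `normalise` and `link_records` helpers are hypothetical.

```python
def normalise(value):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def link_records(left, right, keys):
    """Return (left_record, right_record) pairs that agree on all normalised keys."""
    # Index the right-hand dataset by its normalised key values.
    index = {}
    for r in right:
        index.setdefault(tuple(normalise(r[k]) for k in keys), []).append(r)
    # Look up each left-hand record in that index.
    pairs = []
    for l in left:
        for r in index.get(tuple(normalise(l[k]) for k in keys), []):
            pairs.append((l, r))
    return pairs


# Hypothetical toy datasets, invented for this example.
survey = [
    {"name": "Ada  Lovelace", "dob": "1815-12-10", "score": 7},
    {"name": "Alan Turing", "dob": "1912-06-23", "score": 9},
]
register = [
    {"name": "ada lovelace", "dob": "1815-12-10", "ward": "W1"},
    {"name": "Grace Hopper", "dob": "1906-12-09", "ward": "W2"},
]

links = link_records(survey, register, ["name", "dob"])
```

Exact matching of this kind misses records with spelling variations or missing values; probabilistic approaches to data matching are designed to handle such cases.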

Teaching and learning methods

Lectures will introduce specific ideas about data management, the ethics and disclosure of data, and linkage in relation to research. Interactive exercises will involve a mixture of solo and group work. Computer-based practicals will allow students to apply those ideas: to manage data, make informed decisions about linkage/integration of data, and apply anonymisation processes to data.

Assessment methods

Assessment task | Length | How and when feedback is provided | Weighting
Essay | 2000 words | With the marking | 35%
Provenance Exercise | 500 words + code | With the marking |

Recommended reading

  • Christen, P. (2012). Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer Science & Business Media.
  • Duncan, G. T., Elliot, M., & Salazar-González, J. J. (2011). Statistical Confidentiality. Springer New York.
  • Elliot, M., Mackey, E., O'Hara, K., & Tudor, C. (2016). The Anonymisation Decision-Making Framework. UKAN Publications, Manchester.
  • García, S., Luengo, J., & Herrera, F. (2015). Data preprocessing in data mining. Springer.
  • Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
  • Moreau, L., & Groth, P. (2013). Provenance: An Introduction to PROV. Available at https://tinyurl.com/PROV-BOOK [accessed 25/9/2019].
  • Runkler, T. A. (2012). Data Analytics: Models and Algorithms for Intelligent Data Analysis. Springer.

Teaching staff

Staff member Role
Mark Elliot Unit coordinator
