MSc Data Science (Earth and Environmental Analytics) / Course details
Year of entry: 2025
Course unit details:
Understanding Data and their Environment
Unit code | DATA71011 |
---|---|
Credit rating | 15 |
Unit level | FHEQ level 7 – master's degree or fourth year of an integrated master's degree |
Teaching period(s) | Semester 1 |
Available as a free choice unit? | No |
Overview
This module combines technical and non-technical topics, all relating to critical externalities of the data analytics process.
The course covers a suite of topics related to the representation, pre-processing and processing of data: metadata, paradata and data provenance; understanding data quality and its impact on inference; cleaning data; edit and imputation models; the basics of data linkage/integration; and data visualisation.
Aims
The unit aims to:
• Develop an awareness of the issues around the use of data in research.
• Develop fundamental skills in data pre-processing.
Learning outcomes
Students should be able to:
• Demonstrate a basic understanding of metadata, paradata and data provenance.
• Prepare a dataset for analysis.
• Make informed decisions about linkage/integration of data and carry out a basic data linkage.
• Produce basic data visualisations.
Teaching and learning methods
Lectures will introduce specific ideas relating to data management, the ethics and disclosure of data, and data linkage in research. Interactive exercises will involve a mixture of solo and group work. Laptop-based practicals will allow students to apply those ideas: managing data, making informed decisions about linkage/integration of data, and applying anonymisation processes to data.
Assessment methods
Group provenance exercise (600 words and code) 20%
Online test on information about data and reproducibility (1 hour) 20%
Group presentation of analysis plan (video) (5 minutes) 10%
Pre-processing and analysis report (1,500 words) 50%
Recommended reading
Christen, P. (2012). Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer Science & Business Media.
García, S., Luengo, J., & Herrera, F. (2015). Data preprocessing in data mining. Springer.
Moreau, L., & Groth, P. (2013). Provenance: An Introduction to PROV. Available at https://tinyurl.com/PROV-BOOK [accessed 25/9/2019]
Teaching staff
Staff member | Role |
---|---|
Pradyumn Shukla | Unit coordinator |
Mark Elliot | Unit coordinator |
Stian Soiland-Reyes | Unit coordinator |
Nuno Pinto | Unit coordinator |